Getting started: the what and the why

1. Change *is* inevitable.

2. Change *may* be painful.

Some people address point #2 by trying to avoid point #1. Much like skating uphill, it doesn't take long for the natural way of things to regain control and you're back where you started.

Address point #2 by accepting and embracing point #1. Design your environments so they adapt to change, take it in stride, and even use it to their advantage (much like the use of an opponent's energy against him in judo). This design is what we call "systems architecture" ...

Why am I reading this?

Most pieces of this sort open with a definition or two to set the pace. We'll get to that in a moment. Before we get too far into my own rambling, though, I'd like to ask you a few questions to help you gauge your interest. Let's say that you have some involvement in day-to-day technology operations:

  • You get a call to check out a performance lag on a given application. Without reaching for documentation, do you have any idea how or where to begin?
  • You log in to what you think is a stray machine in the datacenter. Do you have a general idea of what the machine does or where it fits in your infrastructure, without having to look it up in your team's wiki?
  • One of your senior team members leaves on holiday, or leaves the company altogether in a huff. Does his absence create a tremendous void in your team's brain trust?
  • Generally speaking, do you have to check your shop's long-term folklore to understand just what the hell is going on?

If any of your answers to those questions made you wince, then chances are your shop's architecture is lacking, or your shop lacks an architecture altogether. Read on ...

What is systems architecture?

Having outlined how to identify a shoddy architecture, I think it's fair that I explain what an architecture is.

When we speak of buildings, an architecture is the underlying design. The architect translates ideas into blueprints, which guide construction crews on what sorts of materials to use and where to place them to create a home, a warehouse, or a skyscraper. While the architect has a general idea of how the building will eventually be used, a worthwhile design is one that combines stability with some flexibility. (For example: it should be possible to convert certain spaces from one type to another, or merge spaces, without causing the building to disintegrate.) At the same time, it helps identify what changes would be problematic before the demolition crews arrive. (You can tear out some walls, but not that one because it's a load-bearing wall.)

We can draw some parallels and apply this definition to a technology infrastructure: a systems architecture is a set of conventions and policies that define how a shop's hardware, software, and end-users interact. In more detail, a systems architecture describes how you name your machines, where you install applications, and how applications share data.

Just as a building's architecture is stable yet flexible, a systems architecture defines a stable environment that can (within reason) not just survive future changes but support them.

Why do we need it?

Defining and implementing an architecture demands time and effort, so it's fair to ask why it's worth the trouble.

In a word: simplicity. In another word: serenity.

In more detail:

It simplifies support

I have what I call the Three AM Rule. That is: whatever you're doing, ask yourself whether it will make sense at 3AM when the pager sounds (or the front-line support calls, or whatever). If not, it fails and you should re-think what you're doing to make it cleaner.

A worthy systems architecture helps your entire shop pass the 3AM rule.

It defines naming conventions, so you can readily identify hosts, DNS names, and (application-based) user accounts without having to go wake up the poor slob who set it all up in the first place. (Why would you want two people up in the middle of the night working on a problem?)
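To make that concrete, here is a minimal sketch of what a naming convention buys you. The convention itself (a three-letter site code, a role, and a two-digit index, as in "chi-db-01") is purely hypothetical, as are the names HOSTNAME_PATTERN, HostInfo, and parse_hostname. Your shop's convention will differ; the point is only that a conforming name can be decoded without waking anyone up.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical convention: <site>-<role>-<two-digit index>, e.g. "chi-db-01".
HOSTNAME_PATTERN = re.compile(r"(?P<site>[a-z]{3})-(?P<role>[a-z]+)-(?P<index>\d{2})")


class HostInfo(NamedTuple):
    site: str
    role: str
    index: int


def parse_hostname(name: str) -> Optional[HostInfo]:
    """Return the parts of a conforming hostname, or None if it breaks the convention."""
    match = HOSTNAME_PATTERN.fullmatch(name.lower())
    if match is None:
        return None
    return HostInfo(match["site"], match["role"], int(match["index"]))
```

Under this assumed convention, parse_hostname("chi-db-01") tells you the machine is database host number 1 at the "chi" site, while a name like "bobs-old-box" comes back as None and flags itself as something to investigate.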

Having a solid, clearly-defined architecture actually reduces the amount of documentation you need to maintain your shop. Document the architecture itself, then document the exceptions. Anything else can be safely assumed to conform to standard. (And when you uncover something that is both undocumented and non-standard, find the dope who put it in and give them the what-for.)
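The "document the standard, then document the exceptions" idea can be sketched as a small audit. Everything here is an assumption made for illustration: the audit function, the shape of the exceptions list (a mapping of host name to the reason it is exempt), and the conforms check, which stands in for whatever rule your architecture defines.

```python
from typing import Callable, Dict, Iterable, List


def audit(
    hosts: Iterable[str],
    conforms: Callable[[str], bool],
    documented_exceptions: Dict[str, str],
) -> Dict[str, List[str]]:
    """Sort hosts into standard, documented exceptions, and undocumented strays."""
    report: Dict[str, List[str]] = {
        "standard": [],
        "documented_exception": [],
        "undocumented": [],
    }
    for host in hosts:
        if conforms(host):
            report["standard"].append(host)
        elif host in documented_exceptions:
            report["documented_exception"].append(host)
        else:
            # Both non-standard and undocumented: go find out who put it in.
            report["undocumented"].append(host)
    return report
```

Anything that lands in the "undocumented" bucket is exactly the case described above: non-standard and unexplained.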

It limits the (negative) impact of change

Put another way, a systems architecture sets boundaries.

It's tough to enforce nonexistent standards. A solid systems architecture helps outline how applications are installed and how they may interact in your environment. You can use this as a litmus test for any new applications that come your way. Especially useful for applications built in-house, this test encourages the developers to adhere to some rules to make sure they play well with other pieces of your environment. That leads back to the first point of making your environment predictable and easier to support.

(As a side note: encouraging application developers and designers to stick with such standards, believe it or not, may make them better at what they do. They don't always realize it up-front, but hard-coded ports, hostnames, or paths make their lives more difficult, too.)
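For the developers in the audience, here is a rough sketch of the alternative to hard-coding. The variable names (APP_DB_HOST, APP_DB_PORT, APP_DATA_DIR), the defaults, and the AppConfig class are all made up for illustration; the idea is simply that the environment supplies the specifics and the code does not care where it has been installed.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class AppConfig:
    db_host: str
    db_port: int
    data_dir: str


def load_config(env: Mapping[str, str] = os.environ) -> AppConfig:
    """Read host, port, and path from the environment instead of hard-coding them."""
    return AppConfig(
        db_host=env.get("APP_DB_HOST", "localhost"),
        db_port=int(env.get("APP_DB_PORT", "5432")),
        data_dir=env.get("APP_DATA_DIR", "/var/lib/app"),
    )
```

Moving the application to a new host, port, or directory then means changing its environment, not rebuilding and redeploying the code.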

It embraces other change

Sooner or later, something will come along that fails your litmus test but must go in anyway. A solid systems architecture gives you a way to isolate and identify those rogue pieces so they don't needlessly trouble anyone.

Why is systems architecture such a tough sell?

Most technical people I've met agree with the points above. Then they ask how to convince the Powers That Be to let them uproot and redesign their infrastructure.

Difficult to quantify

Architecture is a tough sell because it's easy to quantify its up-front costs but difficult to assess its value. Put another way, it's difficult to correlate its implementation to its impact. The results of poor (or complete lack of) architecture can take months or even years to notice. At the same time the results of a solid, robust systems architecture are difficult for management to appreciate because they only see sysadmins not having a tough time holding the place together.

Given that mindset, many shops skimp on infrastructure planning and let the applications run wild. (Don't feel bad. Working on application teams, I've seen shops do the same thing with their custom code, letting unrealistic deadlines steer them away from solid, robust applications that then become tougher to maintain.)

To sell this one to management, keep track of the issues you encounter day-to-day, especially those that keep people up at night or cause the business to lose revenue. (A prime example would be that oddly-named machine that crashed last week. You know, the one running that customer-facing website that no one knew about.) Then demonstrate how the scenario could have played out in the context of a predictable, well-planned environment. That will help management put numbers on, and see value in, a proper architecture.

Restraints? Why?

Architecture also gets in the way. Order and restraint tend to clash with deadlines, and the (quite valid) case of "business need" is often misused to cover poor decisions. When implementing a systems architecture, expect resistance from end-users, project managers, or application teams who insist it will completely derail their day-to-day efforts.

Similar to the previous point, it may help to show these people how a proper architecture can make their lives easier. Application teams are sometimes on the hook for support just like sysadmins, and chances are they've struggled to troubleshoot an issue because the alarms went off and they had no idea where to start looking. Once they understand, you've made a new ally who can help you sell the idea to others.

What about security?

People occasionally resist adopting a systems architecture precisely because it makes their shops predictable. That, they contend, makes them easier to infiltrate.

Easier? Yes, but just mildly so. Easy? No. At least I hope not. Security through obscurity isn't much security at all: a would-be hacker won't need predictable hostnames to find your database hosts or your DNS servers when they can let a port scanner do the work for them. They take a tea break while nmap runs and, voilà, paydirt! Compare that to the hell your sysadmins endure trying to march through a quagmire of an obscure infrastructure.

The trick here is to explain to your security staff how a proper architecture makes their job easier: they now know where to apply their talents to keep the shop airtight. "Sure, I can find Fort Knox; but hell if I'm getting in."