Getting started: the what and the why

Posted August 24th, 2008 by Administrator

1. Change is inevitable.

2. Change may be painful.

Some people address point #2 by trying to avoid point #1. That is much like skating uphill: it doesn't take long for the natural way of things to regain control, and you're back where you started.

Address point #2 by accepting and embracing point #1. Design your environments so they adapt to change, take it in stride, and even use it to their advantage. (Consider judo, which uses an opponent's energy against him in order to take him down.) This design is what we call "systems architecture" ...

Why am I reading this?

Most pieces of this sort open with a definition or two to set the pace. We'll get to that in a moment. Before we get too far into my own rambling, though, I'd like to ask you a few questions to help you gauge your interest. Let's say that you have some involvement in day-to-day technology operations:

  • You get a call to check out a performance lag on a given application. Without reaching for documentation, do you have any idea how or where to begin?
  • You log in to what you think is a stray machine in the datacenter. Do you have a general idea of what the machine does or what its place is in your infrastructure, without having to look it up in your team's wiki?
  • One of your senior team members leaves on holiday, or leaves the company altogether in a huff. Does his absence create a tremendous void in your team's brain trust?
  • Generally speaking, do you have to check your shop's long-term folklore to understand just what the hell is going on?

If you answered "no" to either of the first two questions, or "yes" to either of the last two, then chances are your shop's architecture is lacking. Or your shop lacks an architecture altogether. Read on ...

What is systems architecture?

Having outlined how to identify a shoddy architecture, I think it's fair that I explain what an architecture is.

When we speak of buildings, an architecture is the underlying design. The architect translates ideas into blueprints, which guide construction crews on what sorts of materials to use and where to place them to create a home, a warehouse, or a skyscraper. While the architect has a general idea of how the building will eventually be used, a worthwhile design is one that combines stability with some flexibility. (For example: it should be possible to convert certain spaces from one type to another, or merge spaces, without causing the building to disintegrate.) At the same time, it helps identify what changes would be problematic before the demolition crews arrive. (You can tear out some walls, but not that one because it's a load-bearing wall.)

We can draw some parallels and apply this definition to a technology infrastructure: a systems architecture is a set of conventions and policies that define how a shop's hardware, software, and end-users interact. In more detail, a systems architecture describes how you name your machines, where you install applications, and how applications share data. These conventions and policies yield consistency and predictability, which in turn simplify support and growth. Just as a building's architecture is stable yet flexible, a systems architecture defines a stable environment that can (within reason) not just survive future changes but support them.

Why do we need it?

Defining and implementing an architecture demands time and effort, so it's fair to ask why it's worth the trouble.

In a word: simplicity. In another word: serenity.

In more detail:

It simplifies support

I have what I call the Three AM Rule. That is: whatever you're doing, ask yourself whether it will make sense at 3AM when the pager sounds (or the front-line support calls, or whatever). If not, it fails and you should re-think what you're doing to make it cleaner.

A worthy systems architecture helps your entire shop pass the 3AM rule. It defines naming conventions, so you can readily identify hosts, DNS names, and (application-based) user accounts without having to go wake up the poor slob who set it all up in the first place.

(Why would you want two people up in the middle of the night working on the same problem?)
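
To make the naming-convention point concrete, here is a minimal sketch. It assumes a made-up scheme of site, role, environment, and a two-digit number (for example "chi-db-prod-03"). The scheme, the field names, and the function parse_hostname are all hypothetical; your shop's convention will differ.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical convention: <site>-<role>-<env>-<two-digit number>
HOST_PATTERN = re.compile(
    r"^(?P<site>[a-z]{3})-(?P<role>[a-z]+)-(?P<env>prod|qa|dev)-(?P<num>\d{2})$"
)


class HostInfo(NamedTuple):
    site: str
    role: str
    env: str
    num: int


def parse_hostname(name: str) -> Optional[HostInfo]:
    """Return the fields encoded in a conforming hostname, or None."""
    match = HOST_PATTERN.match(name.lower())
    if match is None:
        # The name does not follow the (assumed) convention.
        return None
    return HostInfo(match["site"], match["role"], match["env"], int(match["num"]))
```

With something like this, a name such as "chi-db-prod-03" tells you at 3AM that you are looking at a production database host, while a name like "gandalf" tells you nothing and comes back as None.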

Having a solid, clearly-defined architecture actually reduces the amount of documentation you need to maintain your shop. Document the architecture itself, then document the exceptions. Anything else can be safely assumed to conform to standard. And when you uncover something that is both undocumented and non-standard, find the dope who put it in and give them the what-for.
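
The "document the standard, then document the exceptions" idea can be sketched as a three-way sort. This is only an illustration: the function audit, the report keys, and the idea of passing in a conformance check are my assumptions, not an established tool.

```python
from typing import Callable, Dict, Iterable, List, Set


def audit(
    hosts: Iterable[str],
    conforms: Callable[[str], bool],
    documented_exceptions: Set[str],
) -> Dict[str, List[str]]:
    """Sort hosts into standard, documented exception, or needs follow-up."""
    report: Dict[str, List[str]] = {
        "standard": [],
        "documented_exception": [],
        "needs_followup": [],
    }
    for host in hosts:
        if conforms(host):
            # Conforms to the architecture: no extra documentation needed.
            report["standard"].append(host)
        elif host in documented_exceptions:
            # Non-standard, but someone wrote it down.
            report["documented_exception"].append(host)
        else:
            # Undocumented and non-standard: go find out who put it in.
            report["needs_followup"].append(host)
    return report
```

Only the last bucket should ever demand anyone's attention, which is the point: the longer the first list, the less you have to write down.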

It limits the (negative) impact of change

Put another way, a systems architecture sets boundaries.

It's tough to enforce nonexistent standards. A solid systems architecture helps outline how applications are installed and interact in your environment. You can use this as a litmus test for any new applications that come your way. Especially useful for applications built in-house, this test encourages the developers to adhere to some rules to make sure they play well with other pieces of your environment. That leads back to the first point of making your environment predictable and easier to support.
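
A litmus test of this sort can be as simple as a checklist in code. The sketch below assumes a hypothetical application manifest with "install_path", "port", and "owner" fields, and hypothetical house rules (a standard install root and an assigned port range). None of these names or values come from a real standard.

```python
from typing import Dict, List

ALLOWED_PREFIX = "/opt/apps/"      # hypothetical standard install root
ALLOWED_PORTS = range(8000, 9000)  # hypothetical range for in-house apps


def litmus_test(manifest: Dict[str, object]) -> List[str]:
    """Return a list of ways the application breaks the house rules."""
    problems = []
    path = str(manifest.get("install_path", ""))
    if not path.startswith(ALLOWED_PREFIX):
        problems.append(f"install_path {path!r} is outside {ALLOWED_PREFIX}")
    port = manifest.get("port")
    if not isinstance(port, int) or port not in ALLOWED_PORTS:
        problems.append(
            f"port {port!r} is outside "
            f"{ALLOWED_PORTS.start}-{ALLOWED_PORTS.stop - 1}"
        )
    if not manifest.get("owner"):
        problems.append("no owning team listed")
    return problems
```

An empty list means the application passes. Anything else is a conversation to have with the developers before it goes in, not after.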

As a side note: encouraging application developers and designers to stick with such standards, believe it or not, helps them do their jobs as well. Some of them don't realize it up-front, but hard-coded ports, hostnames, or paths make their lives more difficult, too.
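
As one way to avoid hard-coded ports, hostnames, and paths, an application can read them from its environment. This is a sketch under assumptions: the variable names (APP_DB_HOST and friends), the defaults, and the AppSettings class are all invented for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class AppSettings:
    db_host: str
    db_port: int
    data_dir: str


def load_settings(env: Mapping[str, str] = os.environ) -> AppSettings:
    """Build settings from environment variables, with fallback defaults."""
    return AppSettings(
        # Variable names and defaults are hypothetical placeholders.
        db_host=env.get("APP_DB_HOST", "localhost"),
        db_port=int(env.get("APP_DB_PORT", "5432")),
        data_dir=env.get("APP_DATA_DIR", "/var/lib/app"),
    )
```

The same build can then move from a developer's laptop to QA to production without a code change, which is exactly the kind of thing that makes the developers' lives easier too.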

It embraces other change

Sooner or later, something will come along that fails your litmus test but must go in anyway. A solid systems architecture gives you a way to isolate and identify those rogue pieces so they don't needlessly trouble anyone.

It reduces costs

This one is a big sell for management. If you work in a small enough company, chances are budget numbers are your concern as well. Having a systems architecture reduces costs because it reduces headcount. A clean, predictable shop has fewer problems than a messy one. In turn, a shop that has fewer problems needs fewer people to maintain it. There are two reasons for this.

For one, such a shop requires fewer experienced, senior-level people to keep it going. These people tend to be the most expensive on the team. (FOOTNOTE: True, an inexperienced person can cause lots of damage. But senior-level people take home the biggest paychecks, and that is an easy number to quantify.) With a clearly documented systems architecture your shop can focus on hiring fresh talent that is eager to learn.

For the fresh young talent, consider an idea borrowed from the business world. In his book The E-Myth Revisited, Michael Gerber explains that a critical element of the turn-key franchise -- that is, a franchise that can operate without its creator involved in the day-to-day activities -- is the operations manual. This manual describes precisely how the place works. Franchisees and their employees follow it to provide a consistent experience for the franchise's clients without having to think about it too much.

Reading between the lines, and applying it to running a technology shop, I see the operations manual (here, the architecture) makes your senior staff's decisions repeatable by those who don't have their depth of experience. That is tremendous leverage, and it makes your senior people happy as well because it gives them an opportunity to move on to higher-level, more interesting work with little risk of being pulled into the day-to-day.

The second cost savings is in reduced human intervention. An architecture defines standards, and standards are the core of automation. Consider the benefits of using an automated build system, like Red Hat Kickstart, versus building by hand: you get the same build-out, every time. Now imagine that level of predictability going into some home-grown systems management tools. More automation leads to even fewer people required to manage the shop, and gives the less-experienced staff even less room to err.
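
Here is a rough sketch of what "standards are the core of automation" can look like in a home-grown tool. It is not Kickstart and does not use Kickstart's file format; the base profile, the role names, the package names, and the function build_profile are all assumptions made up for the example.

```python
from typing import Dict

# Hypothetical shop-wide standard that applies to every machine.
BASE_PROFILE = {
    "timezone": "UTC",
    "packages": ("openssh-server", "ntp"),
    "app_root": "/opt/apps",
}

# Hypothetical extra packages per machine role.
ROLE_PACKAGES = {
    "web": ("httpd",),
    "db": ("postgresql-server",),
}


def build_profile(hostname: str, role: str) -> Dict[str, object]:
    """Derive a machine's build profile from the standard plus its role."""
    if role not in ROLE_PACKAGES:
        raise ValueError(f"unknown role: {role}")
    profile = dict(BASE_PROFILE)
    profile["hostname"] = hostname
    # Sort so the same inputs always produce the same package list.
    profile["packages"] = tuple(sorted(BASE_PROFILE["packages"] + ROLE_PACKAGES[role]))
    return profile
```

The same hostname and role produce the same profile every time, no matter who runs the tool. An unknown role is rejected outright instead of being improvised by hand.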

The architecture, both as an operations manual and a standard of in-house automation, takes far-reaching decisions out of the hands of those unfit to make them. While this sounds harsh it is painfully true, especially if you have inherited or are otherwise responsible for an environment built by untrained hands and a wandering mind.

It helps with M&A

That is, mergers and acquisitions. If your company is very large, chances are it will eventually buy another company. Having a clear systems architecture will help you plan how to integrate the purchased company's systems into your own. You certainly can't absorb and manage someone else's shop if your own house is not in order.

Likewise, if your company is very small, there is a chance it will be bought by a larger player. Expect the buyers to integrate your systems into theirs, in which case having a clean, ready-to-move shop may help woo or at least placate a buyer. ("Really, I don't have to spend mid six figures to integrate your systems into ours?")

Why is systems architecture such a tough sell?

Most technical people I've met agree with the points above. Then they ask how to convince the Powers That Be to let them uproot and redesign their infrastructure. To that I say, you have to know why people would resist.

Difficult to quantify

Architecture is a tough sell because it's easy to quantify its up-front costs but difficult to assess its value. Put another way, it's difficult to correlate its implementation to its impact. The results of poor (or complete lack of) architecture can take months or even years to notice. At the same time the results of a solid, robust systems architecture are difficult for management to appreciate because they only see sysadmins not having a tough time holding the place together.

Given that mindset, many shops skimp on infrastructure planning and let the applications run wild. (Don't feel bad. Working on application teams, I've seen shops do the same thing with their custom code, letting unrealistic deadlines steer them away from solid, robust applications that then become tougher to maintain.)

To sell this one to management, keep track of the issues you encounter day-to-day, especially those that keep people up at night or cause the business to lose revenue. Then demonstrate how the scenario could have played out in the context of a predictable, well-planned environment. That will help management put numbers on, and see value in, a proper architecture. ("Remember that oddly-named machine that crashed last week? You know, the one running that customer-facing website? Apparently, only the people in the accounting department knew about its existence until that crash.")

Say your staff spends 10% of its troubleshooting time just finding things and understanding how they're configured, before they even get to solving the problem. Not only does your business lose roughly 10% more revenue for each client-facing downtime episode, but any hourly-wage contractors just got an invisible 10% pay bump.
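
To put numbers in front of management, the arithmetic can be sketched like this. Every figure you feed it is your own estimate. The function cost_of_searching and its parameters are hypothetical, and it assumes revenue is lost at a steady hourly rate during an outage.

```python
from typing import Dict


def cost_of_searching(
    outage_hours: float,
    search_fraction: float,
    revenue_per_hour: float,
    contractor_rate: float,
    contractors: int,
) -> Dict[str, float]:
    """Estimate what the 'just finding things' share of an outage costs."""
    # Hours of the outage spent locating and understanding systems.
    search_hours = outage_hours * search_fraction
    return {
        "search_hours": search_hours,
        # Revenue lost while nobody was actually fixing anything yet.
        "lost_revenue": search_hours * revenue_per_hour,
        # Hourly contractor pay that bought no troubleshooting.
        "contractor_cost": search_hours * contractor_rate * contractors,
    }
```

For example, a 10-hour outage with 10% search time, 5,000 in revenue per hour, and two contractors at 100 per hour works out to one hour of searching, 5,000 in lost revenue, and 200 in contractor pay. The inputs are invented, but it is the kind of figure that gets a manager's attention.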

Adds restraints

Architecture also gets in the way. Order and restraint tend to clash with deadlines. When implementing a systems architecture, expect resistance from end-users, project managers, or application teams who insist the extra boundaries will completely derail their day-to-day efforts.

Similar to the previous point, it may help to show these people how a proper architecture can make their lives easier. Application teams are sometimes on the hook for support just like sysadmins, and chances are they've had problems troubleshooting an issue because the alarms went off and they had no idea where to start looking. Once they understand, you've made a new ally who can help you sell the idea to others.

What about security?

People sometimes resist defining a systems architecture precisely because it makes their shops predictable. That, they contend, makes them easier to infiltrate.

Easier? Yes, but just mildly so. Easy? No. At least I hope not. Security through obscurity isn't much security at all: a would-be hacker won't need predictable hostnames to find your database hosts or your DNS servers when they can let a port scanner do the work for them. They take a tea break while nmap runs and, voilà, paydirt! Compare that to the hell your sysadmins endure trying to march through a quagmire of an obscure infrastructure.

The trick here is to explain to your security staff how a proper architecture will make their job easier: they'll know precisely where to apply their talents to keep the shop airtight.

Sure, I can find Fort Knox; but hell if I'm getting in.

Moving forward

I'll get to technical details in the next chapter. But first, some soft skills.

Is my shop too small for an architecture?

The payoff for having an architecture grows geometrically with the size of the shop and the scale of the project. If you're a two-person team banging out a proof-of-concept then you certainly don't want to get too caught up in defining a systems architecture. As you start to target deployment, and as you bring more people in (especially if they are geographically dispersed), having an architecture becomes much more important. It reduces needless conversation ("where's the QA instance of the app running?") when you are very short on time.

As your product builds momentum, though, it's all too easy to postpone implementing an architecture because other matters call for your attention: building a client base, managing sales, and so on. The sweet spot here is to try to establish your systems architecture just as the product starts to take off. Do this too soon and you risk never taking off. Do it too late and you risk embarrassing public failures when you're popular. Bad press is as plentiful as it is free.

This "architecture" concept sounds familiar. Have I heard of it before?

More than likely, you've heard some of this before. I certainly don't claim the idea of a systems architecture as my own -- I borrowed it from other work I have done:

Having worked in or around software teams for so long, I came to adapt some of their practices to systems administration. One of these ideas was the notion of The Technical Lead, a role that somewhat overlaps with (or even morphed into) The Architect.

The architect's responsibility was to make sure all the pieces fit together so that, when the product was set to launch, it wouldn't collapse under its own weight or fizzle. This person interacted with both the business side (because why would the technology be there, if not for some business purpose?) and the technology side, making sure the former's needs were met in a way that didn't induce undue chaos in the latter's. Software teams had this down pat early on because they treated their projects as products with measurable goals for which they would need direction.

In my experience the sysadmins in these same shops didn't have that approach to technical leadership. Sure, there may have been a well-respected senior-level person who was called when all the chips were down. Maybe that person would talk to vendors about getting their products in the door. But that was it. The idea of having one person set direction and standards took a while to catch on, and until then there was some rampant individualism in the shops I visited. Each team member set things up "their way," so that machines were at best mildly consistent and at worst completely different ... much to the consternation of the machines' end-users who were trying to do their own jobs. Also, much to the frustration of fellow sysadmins who were called to address problems on machines they hadn't set up.

SIDEBAR: Real-world examples

For some real-world examples of architecture and planning, consider two tourist attractions: the Seattle Public Library's Central location and the infamous Winchester Mystery House in San Jose, California.

First, the Winchester Mystery House. This is a sprawling compound that, according to legend (well, my tour guide), was built according to the whims of Mrs Winchester. She had no up-front plan, and every day gave the hired builders instructions on what she wanted. There are narrow, winding passageways, the structure seems to change with each passing space, and there is even a door that leads nowhere. The building plan changed frequently, as did the builders: Mrs Winchester reportedly fired anyone who offered advice contrary to her ideas, even those experienced in construction and design.

Compare that to the Seattle Public Library: while wowed by its aesthetics and open spaces, I was also intrigued by something beneath my feet. As I wound my way through each floor of the Books Spiral, I noticed numbered floor mats. The numbers referred to the Dewey Decimal collection numbers of the books housed on that floor. The perk? These mats are secured in place but not permanently affixed. That makes it less costly for the library to expand or move its collections, since relocating the mats doesn't require tearing up the floors. Pick up the numbered floor mat, move it, put it in place, back to business as usual.

Both the library and the Winchester house are tourist attractions in their own right. The difference? A trip to the Winchester House includes a tour that mocks its lack of planning and design. Visiting the library, you can see that it was designed from the start to embrace future changes.

Winchester Mystery House:
http://www.winchestermysteryhouse.com/

Seattle Public Library, Central location:
http://www.spl.org/default.asp?pageID=branch_central_building_interior&branchID=1
