Defining an architecture

In the previous section I reviewed the concept of a systems architecture and outlined some of its benefits. Here, I'll offer some guidance on defining a systems architecture.

I often say that there are a million ways to run a shop, half of which are right. In other words, each shop has its own way of doing things and there are many situational variables, so there are no hard and fast rules on what should and should not be part of a systems architecture. With that in mind I hope to at least offer some food for thought in this section. Even a debate on why these ideas are rubbish is better than no discussion at all.

Preparing for the future

An architecture should prepare your shop for growth and movement. Growth involves adding more end-users, applications, and machines. Movement involves shuffling applications between machines, or re-tasking machines for other work. Both are inevitable unless your company plans to shut down next month.

A combination of standards and abstraction will help your architecture gracefully handle growth and movement. Standards help you identify something without prior knowledge of that specific piece. For example, you could include the word "prod" or just the letter "p" in all of your production machine hostnames, as well as some agreed-upon description. When someone announces that "nytradep07" is down you'll know that this is a production trading host and respond accordingly.

Abstraction, on the other hand, determines how people access their services. People often say they're connecting to a given host when they're really connecting to a service running on that host, such as a networked reporting tool or database instance. If people instead connect to DNS aliases (CNAMEs), then they're blissfully oblivious when that reporting tool moves from "nyrptp10" to "ldnrpt12" over the weekend. All they know is that the name they use ("rpt-acct-p") works just like it did last Friday. (Please refer to the sidebar, "CNAMEs as a connectivity abstraction," for more details on this technique.)

Does this seem like overkill? At first, yes. Since you're planning for the future, the architecture should feel at least a little over-engineered for the present day. Think of it as an oversized coat: as you grow, it will fit better. Try to imagine how it will feel after one or two years' use. Will it gracefully handle new ideas and applications that weren't installed at the time you designed it? Or will these additional pieces feel like tack-ons, leaving your entire shop a group of special cases?

SIDEBAR: CNAMEs as a connectivity abstraction

People tend to get attached to hostnames, and connect to those hostnames when they're really just accessing services hosted on those machines, such as networked applications or database instances. Sure, it's easy to use hostnames ... until something moves. Maybe the database instance has outgrown its current hardware and is shuffling to a new machine. This causes disruption for people who have been using the hostname to connect to that database instance.

A little DNS magic can head off this problem. I typically use DNS aliases (CNAMEs), paired with virtual network interfaces, to abstract the physical machine from the services it hosts. Each service gets its own CNAME that points to a virtual interface on the host machine. In turn, this creates several logical entry points to one physical host. As services move, only DNS changes. End users' bookmarks, application config files, and the like are none the wiser.
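To make the chain concrete, here is a minimal sketch in Python. It is a toy model of the DNS data, not a real resolver: the interface names and the addresses (taken from the 192.0.2.0/24 documentation range) are made up for illustration, and only the service alias and hostnames come from the example earlier in this section.

```python
# Toy model of the DNS records involved; every name and address is illustrative.
# A records: virtual interface name -> IP address.
A_RECORDS = {
    "nyrptp10-eth0-01": "192.0.2.10",
    "ldnrpt12-eth0-01": "192.0.2.112",
}

# CNAMEs: service alias -> the virtual interface that currently hosts it.
CNAMES = {
    "rpt-acct-p": "nyrptp10-eth0-01",
}


def resolve(name: str) -> str:
    """Follow CNAMEs until an A record is reached, then return its address."""
    seen = set()
    while name in CNAMES:
        if name in seen:
            raise ValueError(f"CNAME loop at {name!r}")
        seen.add(name)
        name = CNAMES[name]
    return A_RECORDS[name]


def move_service(alias: str, new_interface: str) -> None:
    """Repoint a service alias; clients keep using the same alias."""
    if new_interface not in A_RECORDS:
        raise KeyError(f"no A record for {new_interface!r}")
    CNAMES[alias] = new_interface
```

Calling move_service("rpt-acct-p", "ldnrpt12-eth0-01") changes what resolve("rpt-acct-p") returns, yet nothing on the client side has to change. That is the whole point of the abstraction.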

TODO: image/diagram showing how this works

(This technique works just as well for failover or disaster recovery, as well as for upgrades.)

Naming standards

Overall, a naming standard should convey something about the entity in question. A person familiar with the standards defined in the systems architecture should be able to understand the entity's purpose based on the name alone.

Naming guidelines vary by the type of entity, so I've broken them down into categories:

  • network subdomains
  • hosts
  • network interfaces
  • filesystems (logical volumes)
  • user accounts - application / daemon
  • user accounts - individual
  • daemon/app users
  • application components (web server, app server, DB server)

network subdomains

TODO
  • internal vs external domains
  • location-based subdomains
  • environment-related subdomains (dev, qa, production)

hosts

Hostnames are a funny lot. The way people get attached, you'd think they were naming children.

I've seen various naming conventions over the years: city names, country names, oceans and continents. Birds versus trees versus comic book characters. Let's not forget my personal favourite, the free-for-all.

From a support perspective it's easy to poke holes in the free-for-all: names are random, so you really have to know the deep details of the environment to get around. (This, by the by, fails the 3AM Rule I mentioned in the previous section.) So unless your shop has only three or four machines, well, the free-for-all is out.

What about the places-and-things conventions? They sound reasonable at first, right? Take a moment to consider what happens over time:

Names get scarce. No, really, I've seen this one happen. As the shop grows, you start reaching for more and more obscure names of birds or cartoon characters, to the point that the naming convention suffers under its own weight. Worse still is when your naming convention is based on some finite quantity, such as "US states." (For those readers unfamiliar with US geography, there are only fifty states.)

Support becomes a trivia contest. Say you're no expert on trees or geography, or maybe English isn't your native language. Every alert sends you racing for a dictionary or world atlas. "Hm, is Griffen a bird or something else?"

Alerts become needlessly funny, and perhaps cryptic. As do the follow-up e-mails describing the problems. "Today at 4PM, someone accidentally unplugged Snoopy, which took down the production accounting database."

These points lead me to my preferred naming convention: role-based and descriptive. That is to say, describe the machine's purpose and tack on extra info for good measure. If it's a production machine, put the letter "p" or the abbreviation "prod" in there, and use "d" or "dev" for development hosts. Location? Sure, add some designated city or state identifier. Add a number, too, because only the smallest of shops have just one of each type of machine.

When you're done you'll end up with names such as:

  • nycpwx001 - New York City, production, web server, externally facing, number 1
  • ldndap12 - London, development, app server, number 12
  • chifarmq25 - Chicago, compute farm, QA, host number 25
  • sfpc01n05 - San Francisco, production, cluster number 1, node number 5

(Of course, these are just examples and not hard-and-fast rules. It's up to you to determine your preferred hostname length and how to arrange the information conveyed therein.)
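One payoff of a role-based convention is that tools can read hostnames as easily as people can. The following Python sketch assumes one particular layout, {location}{environment}{role}{number}, which matches the first two examples above but not the last two. The code tables and the function name are my own inventions for illustration; substitute whatever your shop agrees upon.

```python
import re
from typing import NamedTuple

# Hypothetical code tables for this sketch; a real shop would define its own.
LOCATIONS = {"nyc": "New York City", "ldn": "London", "chi": "Chicago", "sf": "San Francisco"}
ENVIRONMENTS = {"p": "production", "d": "development", "q": "QA"}
ROLES = {"wx": "external web server", "ap": "app server", "db": "database server"}


class HostInfo(NamedTuple):
    location: str
    environment: str
    role: str
    number: int


def _alternation(codes):
    # Longest codes first, so a longer code is never shadowed by a shorter prefix.
    return "|".join(sorted(map(re.escape, codes), key=len, reverse=True))


# Assumed layout: {location}{environment}{role}{number}
HOSTNAME_RE = re.compile(
    rf"^(?P<loc>{_alternation(LOCATIONS)})"
    rf"(?P<env>{_alternation(ENVIRONMENTS)})"
    rf"(?P<role>{_alternation(ROLES)})"
    r"(?P<num>\d+)$"
)


def parse_hostname(hostname: str) -> HostInfo:
    """Decode a hostname that follows the assumed layout, or raise ValueError."""
    match = HOSTNAME_RE.match(hostname.lower())
    if match is None:
        raise ValueError(f"{hostname!r} does not follow the naming standard")
    return HostInfo(
        LOCATIONS[match["loc"]],
        ENVIRONMENTS[match["env"]],
        ROLES[match["role"]],
        int(match["num"]),
    )
```

A name such as "snoopy" raises ValueError, which is a cheap way for a build or inventory script to flag hosts that drift from the standard.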

Such naming conventions can be a little sterile, perhaps even a little cold. As the shop grows beyond those initial ten or twelve machines, though, ask yourself whether you really want to keep track of the more "creative" names.

What about workstation names? I'll concede that, for a small group of technical staff, it's possible to get away with a near free-for-all on workstation names. A little customization won't hurt here. Just let me convince you to add some identifier to the name, such as a leading "wk-" for workstation or "sa-" for sysadmin, to make the hosts easily identifiable.

End-user workstations are another deal. Those fall under a support umbrella, so you'll want to standardize those names.

Should the hostname indicate the operating system? If there is a formal separation between different OS teams, then it's helpful to let the hostname reflect the OS. Still, keep this as simple as possible -- maybe include a "w" for Windows-based machines and a "u" or "x" for the *nix variants. To be more specific -- indicating whether it's Solaris, AIX, or Linux; discerning between Fedora, Red Hat, Ubuntu, Debian -- will just get messy.

How specific should names be? (read: single-role machines versus multipurpose machines) In my experience, shops tend to single-task Windows machines -- "this is an Exchange server," "this is a file server" -- so it makes sense to include the role in the name: "fs" for file servers, "dc" for domain controllers, and so on.

For the Unix and Linux family, on the other hand, shops vary on whether to single-task machines or make them multi-purpose. Larger shops tend to have deeper pockets and can trade hardware dollars for simplicity. Smaller shops are more likely to let a single machine host different types of services to get the most value out of their hardware. It's not unlike the theatre realm: a larger, well-funded production can afford to have one actor or actress for each role, whereas smaller productions may have a single person play multiple roles based on what characters have to be onstage at a given time.

There's no right or wrong here, but it helps to know the pros and cons. To single-task a machine makes the naming more straightforward but means you need more hardware to keep the machines pigeonholed. To multi-purpose a machine means you have to make the name more generic and therefore less informative. In this latter case, so long as you are using service-based CNAMEs (described above) you can still determine what runs on the machine without having to log in; but your troubleshooting still requires that extra step.

network interfaces

Earlier, I described a connectivity abstraction that mixes DNS CNAMEs (aliases) and virtual network interfaces. Using such a solution means you'll need some way to name the machines' network interfaces. I use the following template:

{hostname}-{device}-{interface number}

For example, given a host "ldnpdb01":

  • ldnpdb01-eth0-03: device eth0, third virtual interface
  • ldnpdb01-eth2-01: device eth2, first virtual interface

(This convention assumes that virtual interfaces are numbered from one; there is no zero-numbered virtual interface.)
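The template is simple enough to generate and check by script. Here is a small Python sketch; the function names are hypothetical, and it assumes that hostnames contain no hyphens and that interface numbers are zero-padded to two digits, as in the examples above.

```python
import re
from typing import Tuple

# {hostname}-{device}-{interface number}; assumes hostnames contain no hyphens.
INTERFACE_RE = re.compile(
    r"^(?P<host>[a-z0-9]+)-(?P<device>[a-z]+\d+)-(?P<number>\d{2,})$"
)


def interface_name(hostname: str, device: str, number: int) -> str:
    """Build an interface name from its three parts."""
    if number < 1:
        raise ValueError("virtual interfaces are numbered from 1")
    return f"{hostname}-{device}-{number:02d}"


def parse_interface_name(name: str) -> Tuple[str, str, int]:
    """Split an interface name back into (hostname, device, number)."""
    match = INTERFACE_RE.match(name)
    if match is None or int(match["number"]) < 1:
        raise ValueError(f"{name!r} does not follow the interface naming template")
    return match["host"], match["device"], int(match["number"])
```

For instance, interface_name("ldnpdb01", "eth0", 3) produces the first example above, and parse_interface_name reverses the process.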

disks, filesystems, and mount points

In traditional Unix-like operating systems, people carve disk devices into partitions (slices) on which they build mountable filesystems. Disk devices have machine-friendly names, such as c0t4d3 or sdc. Partition names are alphanumeric identifiers such as s2 or simply the number 2, which means a unique name of a partition may be c0t4d3s2 or sdc2.

These names tell you about how the disk is attached, but little else. Faced with inspecting an anonymous disk or, even worse, deciding which partition to assign to a database for raw access, your best bet is to pass the work to someone less risk-averse and conveniently turn off your mobile phone that night.

By comparison, many modern operating systems include a storage abstraction called a logical volume manager (LVM) or just volume manager. (FOOTNOTE: For the remaining OSs, there may exist a commercial volume manager add-on.) Most volume managers do away with the notion of semi-anonymous disk devices and partitions in favor of disk groups (or volume groups, depending on the product) and volumes. A disk group represents a pool of disk devices, and you create volumes out of those pools. You then build mountable filesystems on top of those volumes.

Volume managers offer several benefits over traditional disk allocation. I'll describe some of those in a future section. For now I'll focus on the idea that you can assign meaningful, human-readable names to volumes and volume groups. In spite of this gift, people sometimes cave in to nostalgia and assign semi-anonymous names such as vg01 for a volume group and Logvol07 for a volume. (FOOTNOTE: it doesn't help when the OS installer chooses similar default names.)

Since LVM offers us a chance to assign more meaningful names, who are we to refuse? Prefer to name volumes for their purpose, and use a modifier to distinguish between mounted filesystems and raw volumes (used for swap and some database engines). Consider the following template:

{purpose}{separator}{type}

where:

  • {purpose} describes what this volume does. Perhaps it will be swap, or attached to a mount point. In the latter case the purpose could be some agreed-upon abbreviation of the mount point's path.
  • {separator} separates the purpose from the type. A period (".") is one option, though newer OS installers and other GUI tools may forbid that character. In that case, an underscore ("_") or hyphen ("-") will do.
  • {type} explains whether this is a mountable filesystem (say, "fs") or a raw volume ("vol")

Examples:

num  mount point or purpose  type        volume name
1    /usr                    filesystem  usr.fs
2    second swap area        raw volume  swap-02.vol
3    /home/local             filesystem  home-local.fs
Of course, this isn't the only possible template. Perhaps you'll prefer to make the type a prefix ("fs-usr," "fs-home_local"). It's more important that you define your own standard and stick with it than that you follow my examples verbatim.

One potential complaint here is that these names can get rather long, which makes it difficult to mount them manually. This is true. On the other hand, when I have to manually mount a filesystem that usually means I'm trying to inspect an anonymous disk or trying to remove unused volumes to liberate space. In either case I'm quite thankful to see names such as vgora01/ora-data-acct-01.fs or vgdata05/vmware-install-data-12.fs.

That said, I also acknowledge that longer names can still be painful, especially when you're in a situation where you have no copy/paste ability. (Think: reviving a crashed machine and you're in limited console mode.) In that case, see whether you can define a lexicon of abbreviations to keep names short. Again, so long as you standardize on the abbreviations you won't lose any information.

TODO: research Fedora 9 using UUID for /etc/fstab instead of mount points. Is there a way to use volume names instead of UUIDs in fstab?

(This works well for raw disks, the order of which may change on each boot (e.g. sdc becomes sdd or even sde), but hurts the usefulness of naming volume groups and volumes in LVM.)

user accounts - application / daemon

TODO length (character limits), purpose, sticking with default names for packaged products

user accounts - individual

TODO

daemon/app users

TODO

application components (web server, app server, DB server)

TODO

directories and filesystems

TODO

network topology

TODO

host build strategies

TODO

application strategies

TODO