Interpreting WWW statistics

I wrote this little article for an internal newsletter at the U.S. National Archives in 1994. Today it is laughably incomplete and rather outdated (for example, Prodigy still existed then :), but the basic principles are still true, and there are a lot of links to it, so here it is for your edification and/or amusement.

Just as a note, "CLIO" is what NARA called its web site back then.

Doug Linder


Interpreting WWW statistics

Doug Linder, Synetics

January, 1994


It's been said, rightly, that trying to draw conclusions from web server statistics is like trying to nail Jell-o to the wall.

As you look at the statistics, I cannot stress strongly enough that they should be read only as indications of very general trends and not as gospel truth. These numbers could easily be off (on the low side) by significant percentages. They are not, by any means, "hard numbers."

Is that a fault with the software? The hardware? The operation of the system? No. The inaccuracy of the numbers is simply a byproduct of the way the Web functions. Even the most technically advanced sites have only a general idea of the amount and nature of the traffic on their servers.

Here's what can be truthfully said for sure:

Everything else is conjecture. Beyond lies the Twilight Zone.

General things to consider:

The story behind this page (if anyone cares):

Way back in 1993, when the web was very young, I was a very young novice UNIX administrator working at my first UNIX job, at the National Archives and Records Administration (NARA). This was back in the days when having a separate person as "webmaster" practically hadn't been invented yet. There were maybe a couple hundred webpages in the world, and Netscape was still something distributed by college students, not even named Netscape yet (it was called Mosaic).

At that time I wrote a short piece about web statistics, because I had been getting many frantic inquiries about stats from higher-ups in the agency, each wanting to know exactly how many people were visiting our site, where they were from, what their favorite colors were, their average height, and so on. I got tired of giving the same speech over and over again, so like any good sysadmin I decided to do it right, once, and then point people to it. So I wrote the essay (or whatever you want to call it) below. It was never really intended for the public, only for internal NARA use. But it was put in the public area with the statistics anyway, as I figured no one would know it was there, and anyway, it was no secret.

I mentioned it in the internal newsletter and forgot about it for years. I left NARA and had several other jobs. One day in 1996 or so, when the web had obviously gotten much bigger, someone mentioned "egosurfing" to me. "Egosurfing" means doing web searches on your own name to see what turns up. It sounded like fun, so I tried it. There was a lot of dull stuff I expected to find - old messages in email-list archives, a FAQ I once wrote, and so on. After all, I've been using the internet since 1985 when I was a Freshman Comp. Sci. major, when Men Were Men and we didn't have no stinkin' "@" signs in our email addresses - if you wanted to send an email, you had to know every machine between yours and theirs, and write a "bang path"! You kids these days got it easy...

Oh, ah, anyways, where was I? Oh yes, the article. Surprisingly, several people had either referenced or directly quoted "Interpreting WWW statistics" in their web pages. I have no idea how they found out about it, whether through word of mouth or perhaps webcrawlers. Or maybe researchers, doing what researchers often do, had shared their sources with friends - because if there's one thing NARA has a lot of, it's academics and research types.

Amazed, bemused, and not a little flattered to see myself quoted, cited, and duplicated in full in quite a few places on the web, I chuckled and forgot about it. But every year or so since then I've gone on an "ego-surfing" expedition, just to see what stuff I've written and sent out into the net has washed up on which shores, been archived where, and what has disappeared, sunk to the bottom, never to be seen again. Eventually the article itself was discovered in some dusty corner of NARA's web site by whoever the current webmaster was, and either removed or radically changed in location.

Even so, though, this one silly little thing I wrote seven years ago - when 99.5% of Americans didn't even know what the "World Wide Web" was, and AOL was just starting to seem like a cool idea - kept popping up. Sometimes references would disappear, but more often new ones would crop up - in passing references in other articles, as quoted sections in other people's work on the subject, as a footnoted source, and sometimes even reprinted more or less in full (and it's from one of those places that I got this copy).

Naturally, I haven't worked at NARA in many years, and the addresses and links are all wrong, but I thought it would be fun to finally preserve it on my own site just the way it was, for "old times' sake" as a bit of nostalgia for myself, and to provide a permanent link which I know will still be working if anyone is ever looking for this again. Please don't send email to 'webmaster@nara.gov' asking about it. And if you want to link to the article, please link to it here.

So here it is, one of those weird pieces of net.flotsam that eventually will surface and float around after you if you spend enough time on the Internet and occasionally say something at least slightly useful. Today it's laughably incomplete and dated, but the general concepts are still the same.

Uh... enjoy. :) Doug Linder, May 9, 2000