Web Stats & What You Should Be Aware of Before Believing Webalizer, AWStats, Google Analytics, Alexa, Compete or Quantcast

With a new year here, webmasters are likely interested in how their sites did last year, and they may be thinking about how to measure things in the new one.

Executive summary: You must understand the tools.  You cannot compare the measurement of one tool to another because they measure differently.  They are all flawed, but some are more flawed than others.

Things are always dicey in SEO land.  Yet, how many spout out numbers as though they are the truth, the whole truth and nothing but the truth?  Well, the reality is that the numbers produced by some of these tools are so bad that TechCrunch declared in August 2011 that “If You Cite Compete Or Alexa For Anything Besides Making Fun Of Them, You’re A Moron”.  Quite a rude statement, but it is actually polite in comparison to some comments.

First, however, let’s consider the server side tools available.  Generally, a hosting company will provide tools using either Webalizer or AWStats.  As devon.web.designers in “AWStats Vs Webalizer Vs Google Analytics Visitor Numbers” makes clear, Webalizer really is the weakest of the breed.

Webalizer works in a similar way to AWStats – in that it interprets server logfiles. In my tests it consistently reported more visitors than either Google Analytics or AWStats. The main reason for this is that it doesn’t try, by default, to differentiate between robot and human visits. So when people have been used to Webalizer, and then get Google Analytics, they are often perplexed by the steep (apparent) drop in traffic.

In English: whenever a search engine crawls a website for indexing, Webalizer will not even attempt to discount it.  AWStats and Google Analytics, however, will not count bot visits when they recognize them.
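To see why this matters, here is a minimal sketch of the difference between counting every log line (Webalizer's default) and discounting recognized bots (what AWStats and Google Analytics attempt).  The log lines, the bot-signature list, and the function names are all illustrative, not taken from any tool's actual code:

```python
import re

# Two hypothetical lines in Apache "combined" log format: one human
# browser visit and one Googlebot crawl (the user-agent is the last
# quoted field on each line).
LOG_LINES = [
    '1.2.3.4 - - [05/Jan/2012:10:00:00 +0000] "GET / HTTP/1.1" 200 512 '
    '"-" "Mozilla/5.0 (Windows NT 6.1) Firefox/9.0"',
    '66.249.66.1 - - [05/Jan/2012:10:00:05 +0000] "GET / HTTP/1.1" 200 512 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

# A tiny, illustrative list of bot signatures -- real tools ship a
# database of hundreds of these, and miss any bot not in the list.
BOT_SIGNATURES = ("googlebot", "bingbot", "slurp", "crawler", "spider")

def is_bot(log_line):
    """Pull the last quoted field (the user-agent) and match signatures."""
    agent = re.findall(r'"([^"]*)"', log_line)[-1].lower()
    return any(sig in agent for sig in BOT_SIGNATURES)

raw_hits = len(LOG_LINES)                            # Webalizer-style count
human_hits = sum(not is_bot(line) for line in LOG_LINES)  # bot-filtered count
print(raw_hits, human_hits)  # 2 1
```

Note that the filtered count is only as good as the signature list: any crawler that does not announce itself slips through, which is one reason even AWStats and Google Analytics disagree with each other.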

When I looked at my Webalizer stats, the traffic shown was almost double what Google Analytics reported (I had recently reconfigured Google Analytics, as I hadn’t been running it for some time).

Now, both AWStats and Webalizer run on the server, reading its logfiles.  Google Analytics is a different animal: it works by placing a code snippet on each page, which runs in the visitor’s browser and reports back to Google Analytics.  The mechanics are being changed somewhat (and the details are hazy to me), but that is still how many do it today.  Google does a better job of recognizing search engine bots, so for my money it is more accurate.

However, GA isn’t perfect.  If the user’s browser blocks the script (say, with JavaScript disabled or a blocking extension), it will underreport visits.  If someone clears their cookies, it will overstate unique visitors.
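The cookie problem is easy to see in miniature.  Below is a sketch of cookie-based “unique visitor” counting, the general approach tools like GA used; the class and method names are made up for illustration, not GA’s actual implementation:

```python
import uuid

class Tracker:
    """Toy cookie-based unique-visitor counter (illustrative only)."""

    def __init__(self):
        self.seen_ids = set()

    def record_visit(self, cookie_id=None):
        """Record a hit; issue a fresh cookie if the browser sent none."""
        if cookie_id is None:
            cookie_id = str(uuid.uuid4())  # looks like a brand-new visitor
        self.seen_ids.add(cookie_id)
        return cookie_id

    @property
    def unique_visitors(self):
        return len(self.seen_ids)

tracker = Tracker()
cookie = tracker.record_visit()   # first visit: new cookie issued
tracker.record_visit(cookie)      # return visit: same cookie, count unchanged
tracker.record_visit(None)        # same person after clearing cookies:
                                  # indistinguishable from a second visitor
print(tracker.unique_visitors)  # 2
```

One person generates two “uniques” simply by clearing cookies between visits, which is exactly the overstatement described above.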

In comparison, I run SiteMeter on my sites.  SiteMeter showed slightly more traffic than GA, so these measurement differences do have some effect.  When comparing either to Webalizer, however, it was obvious that Webalizer’s numbers were inflated.  I suspect it must be counting plugin hits (perhaps even SiteMeter’s own hits).

Now, Google used to publish a public PageRank score (via its Toolbar), but they finally took that away.  People were using it as a metric for far more than Google ever intended.  Still, as the above devon article points out, Google Analytics is a marketing tool as much as a website analyzer.  That’s good, though: you would hope that if it is good enough for marketers, it is accurate enough for other purposes.

However, if you want to compare your site to other sites, you need some outside ranking.  This is where sites like Alexa, Compete and Quantcast come in.

The unfortunate truth is that many advertisers use Alexa to gauge whether or not to advertise on a particular website.  The really sad truth is that Alexa rankings are barely worth the electrons that display the results.  Alexa gets the majority of its data from users who install its toolbar.  Just how many people are going to do this?  Not only that, but some antivirus programs don’t like the toolbar and remove it.  To say Alexa’s accuracy is questionable is like saying Niagara Falls has some water flowing over it.

When Compete came out, it looked like hopeful competition for Alexa.  Unfortunately, it too gets much of its data from a toolbar, according to the Hongkiat.com article “A Look Into: Popular (Public) Site Ranking Services”.

In contrast, Quantcast, like Google Analytics, relies upon data supplied by code snippets placed onto the site by the webmaster.  This makes a lot more sense than relying mostly upon data supplied by users who may or may not install a toolbar or even know what the toolbar is for.

While the downsides of users not waiting for code to load, clearing cookies, etc., are still in play, this is far less variable than relying upon users to install a toolbar.  In addition, the ranking can only be as accurate as the data coming in, and that means enough sites have to implement the code.  That is probably not as big a concern as it sounds: the likelihood that a website wanting accurate data will implement the code is a lot higher than the likelihood that users will install toolbars.

So, what can we say about all of this?  It strikes me that:

  1. Different tools measure different things in different ways.  Comparing the numbers of one tool to another may be comparing apples and oranges, or in some cases apples to bananas.
  2. Each tool has its weaknesses, and not many have real strengths.  Still, some are weaker than others.
  3. Tools that rely upon user actions and the type of browser they are using will most likely be the least accurate.
  4. You must understand the tool.  Tools like Webalizer count all traffic, which can be misleading if you are looking for how many humans are interacting with the site (but it still is valid to identify trends).