Archive for May, 2007

Making sure the spiders can get in

Friday, May 25th, 2007

One of the most common causes of a site having poor search engine rankings is the search spiders not being able to penetrate part or all of the site. If the site isn’t fully indexed then it can’t work cohesively to provide relevance via its links and the most important pages may be missed altogether.

There are various things you need to make sure are in place to enable search engine spiders to properly index the site.

1. Link to the site

Make sure they can find it - you need a link from a site which is itself regularly spidered and preferably has some degree of rankings for its own keywords. Forget about submitting the site, and particularly ignore all those offers to auto submit it to ‘5000 search engines’ or the like. Even if there were 5000 search engines it would do more harm than good.

2. Provide useful content

Usually the home page will be the first page a spider will find. Make sure it finds something worthwhile. Don’t give it a splash page with no content or it may decide there’s nothing worth looking at. Flash looks like no content, and so does an image or some other kind of animation. So does a frameset containing another site. Show it good quality text saying what the site is about, and provide appropriate meta information about that page - not the rest of the site.

3. Spider-friendly navigation

Now that it’s found some content, make sure it can get to the rest of it. Good text-based navigation is a must. Text links aren’t just easy to follow, they also pass relevance. Image-based buttons can be followed if the link is formed correctly, but they pass no relevance by themselves and very little through the alt text compared to a text link. That doesn’t mean the links can’t look like buttons: some of the best ‘buttons’ are CSS-styled text links.

If your site uses forms extensively then make sure there is also a way of bypassing them for the spiders to follow. They can’t fill in forms or select from a drop-down list, so those jump menus are no use to them either.
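
To see roughly what a simple spider can and can’t follow, here is a minimal sketch in Python using the standard html.parser module. It collects only ordinary links and their anchor text, so anything reachable only through a form or jump menu never shows up. The sample page and its URLs are invented for illustration, and real search spiders are of course far more sophisticated than this.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href and anchor text of every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []      # list of (href, anchor_text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self._href = attrs["href"]
            self._text = []
        elif tag == "img" and self._href is not None:
            # An image link contributes only its alt text (if it has any).
            self._text.append(attrs.get("alt") or "")

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def crawlable_links(html):
    """Return the (href, anchor_text) pairs a link-following spider could use."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page: a text link, an image link, and a jump menu inside a form.
SAMPLE_PAGE = """
<a href="/widgets.html">Blue widgets</a>
<a href="/about.html"><img src="about.gif" alt="About us"></a>
<form action="/jump"><select name="page">
  <option value="/hidden.html">Hidden page</option>
</select></form>
"""

for href, text in crawlable_links(SAMPLE_PAGE):
    print(href, "->", text)
```

Run against the sample page, this lists /widgets.html and /about.html but never /hidden.html, because that page is only reachable through the drop-down.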

4. Provide friendly URLs and filenames

If your site is based on one of the common ecommerce packages or is database driven then you need to ensure that the page URLs that are produced are friendly for the search engines. Session IDs and long, complex query strings with multiple parameters can cause the spiders to avoid the very pages you want indexed. If you have an existing site with these problems then you need to consult a developer or an SEO with experience of these issues.
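
As a rough illustration of the kind of check a developer might start with, the Python sketch below flags URLs that carry a session id or a long query string, and shows one way of stripping the session parameter. The parameter names in SESSION_PARAMS and the limit of two parameters are assumptions made for the example, not rules published by any search engine; substitute whatever your own package actually produces.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed session parameter names (lower case); adjust for your own software.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}


def strip_session_params(url):
    """Return url with any session-style query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def looks_spider_unfriendly(url, max_params=2):
    """Flag URLs with a session id or more than max_params query parameters.

    max_params is an arbitrary threshold chosen for this sketch.
    """
    params = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    has_session = any(key.lower() in SESSION_PARAMS for key, _ in params)
    return has_session or len(params) > max_params


url = "http://example.com/shop.php?cat=5&PHPSESSID=abc123"
print(looks_spider_unfriendly(url))   # True
print(strip_session_params(url))      # http://example.com/shop.php?cat=5
```

Stripping the parameter in a report like this only identifies the problem; the real fix is to stop the site putting session ids into its links in the first place.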

Cover these basic requirements and you will already be well on the way to achieving good results - assuming of course that you have good content!

Google changes search results but is it for the better?

Friday, May 18th, 2007

Yesterday word started filtering through from America of some fairly fundamental changes in the way Google display search results. This morning it hit the UK datacentres and we’re getting our first proper look at them. They look like being a massive change!

Google intend mixing into the main body of the search results the kind of results we’ve occasionally seen for local search in separate additional boxes at the top of pages. However these aren’t additional any more but form part of the normal 10 results (for anyone using the default setup). That’s not all: it won’t just be local search; there may be book search, image search, video search, and others. Essentially Google has decided to move all the additional search features that have been sitting mostly unused on the menu bar above the search box into the main search results.

This change could have a profound impact on the usefulness of results, and depending on what you’re looking for it could be either much better or much worse. So far there is no sign of a way to avoid these changes in your preferences. Nor is there any indication of how often such results will appear.

For many companies who depend on first page/top ten results for traffic it could be a disaster if these new results push them off that vital first page. One example I’ve seen shows a search with local results where the three locals take up three places out of the ten. If you were in positions 8-10 the chances are you’ve been pushed down to the second page.

Only time will tell if this proves popular with users or not and what the ramifications will be for the SEO industry.

If users like it then it will cement Google’s position as the pre-eminent search engine and may consign the struggling MSN/Live Search to oblivion. However if users find their searches producing poor or irrelevant results then we might see a swing back towards anyone who can provide a good alternative. In the meantime I expect to see the conspiracy theorists out in force saying that it’s all a scheme to increase AdWords spending by the folk whose rankings get pushed down.

Absence of competition in the search market

Thursday, May 10th, 2007

A few short years ago there was still a fair amount of competition for your questions and searches. Even when Google had become the major player there was still a good percentage of the market who used Yahoo and MSN, and a reasonable number who used Alltheweb, Hotbot, Alta Vista, and a few others, while the major ISPs and portals such as AOL, Netscape, Blueyonder etc. also had search results that were somewhat independent even if they were based on one of the major players.

Now however we are seeing almost total domination by Google and a lot of people are getting worried. A recent analysis of search engine share showed that both Yahoo and MSN/Live Search are down in single figures on the percentages and the rest are nowhere. Google UK alone beats all except Yahoo and is not far behind them. Google Canada alone has almost as much share as MSN. Google Germany alone beats Ask.

Meanwhile, reading the webmaster and SEO forums shows a lot of people who are totally dependent on Google for their traffic and therefore their business survival, and there is no longer an alternative strategy for them. They are worried, and with good reason. Even discounting the ones who proclaim innocence of unethical methods but are shown to be riddled with them when you look at their sites, there are still many who find themselves suddenly dropping out of sight when Google make a change to their algorithm. Indeed it happened to me when the last really major change took place about a year and a half ago - some terms that I had been clearly the most relevant site for dropped completely out of the results, yet once the BigDaddy update was resolved they suddenly reappeared as high as before and are still there. If I had been dependent on them for my income then their loss for 3 months could have had serious consequences. Yet throughout this time there was stony silence from Google other than the general platitudes about creating good content, and many webmasters and small businesses will have wasted a lot of time making unnecessary changes to their sites in the vain hope of making a difference.

Such a situation makes people jittery and it hasn’t been helped at all by the recent utterances about paid links and suggestions from Google that you should snitch on your competitors. If this had been confined to the more blatant attempts to manipulate PageRank then that would have been one thing, but in fact they appeared to be declaring that all paid links are bad unless they carry nofollow attributes, and that has really got a lot of backs up.

Suddenly Google, the company that everyone liked because of their simple, clean and advert-free interface, and their “don’t be evil” ethos that harked back to a more innocent and somewhat hippy-inspired internet, is being cast in the role of villain and their monopoly seen as a threat.

In fact the unthinkable has happened - Microsoft, the company that everyone loved to hate for their own monopolistic practices, is being seen as a last hope for competition and is being urged to follow through on the rumoured merger with Yahoo.

Google need to be very careful if they are to retain/regain their previous blissful public relations position. Trotting out cuddly Matt Cutts and cute Vanessa may not be enough any more. With worry about the ever increasing amount of private data being collected, people naturally think about Big Brother (and I don’t mean the atrocious reality TV programme). They might just start thinking about Lord Acton’s well known line as well - “power tends to corrupt, and absolute power corrupts absolutely”. If that happens then instead of us worrying about how much Google trusts our websites, Google might have to start worrying about how much we trust them!

Keeping out the spiders

Thursday, May 3rd, 2007

I spend most of my time telling people how best to let spiders into a website; however, it can sometimes be just as important to know how to keep them out.

Why would you want your content kept out of the search results? There could be a number of reasons:

  • Unwanted duplication - if you provide printer-friendly versions of some pages for people who need a hard copy of your content without extraneous images or navigation buttons, then you don’t want those spidered as they would appear as duplicate content. The search engines don’t like showing their customers two versions of the same material so make it easy for them to tell which one they should use by excluding the printer version.
  • If you keep any sensitive data on your site such as wholesale prices for trade customers then you want to make sure that isn’t made public.
  • If you have any large images or large numbers of moderately sized images then you may wish to avoid high bandwidth usage or high server loads by stopping search engines from indexing these images.
  • If you find bandwidth is being taken up by spiders which are no good to you - link checking spiders, or academic plagiarism spiders for instance - then you may want to keep them out altogether.
  • If you’re troubled by scraper spiders that simply come to steal your content then you will want to shut them out.

The remedies depend on the situation and your site structure. An individual page can be kept out of the indexes using a simple robots meta-tag set to noindex (the major search spiders obey this but more specialist ones may not). Larger sets of files in directories can be isolated using the robots.txt file and specific spiders excluded from part or all of a site by the same method.
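
To make the robots.txt approach a little more concrete, here is a small sketch that uses Python’s standard urllib.robotparser module to test a sample file before it goes live. The directory names /print/ and /trade/ and the spider name LinkCheckerBot are invented for the example; your own paths and the user-agent strings you want to exclude will differ.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut one spider out entirely, and keep every
# other spider away from the printer-friendly and trade-only directories.
ROBOTS_TXT = """\
User-agent: LinkCheckerBot
Disallow: /

User-agent: *
Disallow: /print/
Disallow: /trade/
"""


def build_rules(robots_txt):
    """Parse robots.txt text and return a RobotFileParser ready for queries."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


rules = build_rules(ROBOTS_TXT)
print(rules.can_fetch("Googlebot", "/products/widget.html"))       # True
print(rules.can_fetch("Googlebot", "/print/widget.html"))          # False
print(rules.can_fetch("LinkCheckerBot", "/products/widget.html"))  # False
```

Checking the rules this way only tells you what a well-behaved spider should do. It does nothing to stop one that ignores robots.txt, and because the file is publicly readable it is not a safe way to hide genuinely sensitive data.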

Rogue spiders can be more of a problem since by their very nature they usually don’t abide by robots.txt instructions, so you may need to detect them in your log files, identify their IP addresses, and then ban them in your server settings. This is a very specialist area and we recommend that you research it thoroughly before embarking on action. It’s all too easy to ban a wider range of people than intended and it’s also easy to spend an enormous amount of time chasing down rogues who then change IP addresses and reappear.
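
As a starting point for the detection step only, the Python sketch below counts how often each IP address requests paths that robots.txt has placed off limits. It assumes your server writes logs in the Common Log Format; the sample log lines, the disallowed prefixes and the threshold of three hits are all invented for illustration. Treat the output as a list of candidates to investigate, not a ban list.

```python
import re
from collections import Counter

# Start of a Common Log Format line:
# IP, identity, user, [timestamp], "METHOD path ...
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:GET|HEAD|POST) (\S+)')


def suspect_ips(log_lines, disallowed_prefixes, min_hits=3):
    """Return {ip: hits} for clients that requested disallowed paths
    at least min_hits times. min_hits is an arbitrary threshold."""
    prefixes = tuple(disallowed_prefixes)
    hits = Counter()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines in a format we do not recognise
        ip, path = match.groups()
        if path.startswith(prefixes):
            hits[ip] += 1
    return {ip: count for ip, count in hits.items() if count >= min_hits}


# Hypothetical log excerpt using documentation-only IP addresses.
SAMPLE_LOG = [
    '203.0.113.9 - - [03/May/2007:10:00:01 +0100] "GET /print/a.html HTTP/1.1" 200 512',
    '203.0.113.9 - - [03/May/2007:10:00:02 +0100] "GET /print/b.html HTTP/1.1" 200 498',
    '203.0.113.9 - - [03/May/2007:10:00:03 +0100] "GET /trade/prices.html HTTP/1.1" 200 730',
    '198.51.100.4 - - [03/May/2007:10:05:00 +0100] "GET /index.html HTTP/1.1" 200 1024',
    '198.51.100.4 - - [03/May/2007:10:05:09 +0100] "GET /print/a.html HTTP/1.1" 200 512',
]

print(suspect_ips(SAMPLE_LOG, ["/print/", "/trade/"]))  # {'203.0.113.9': 3}
```

The threshold is there because an ordinary visitor may legitimately follow a link to a printer-friendly page; a single hit on a disallowed path proves nothing, which is one reason it is so easy to ban more people than you intended.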