I really enjoy working on SEO for classified sites, as even minor tweaks can produce a step change in organic performance across hundreds of thousands, if not millions, of URLs.
Classified sites, however, generally suffer from three common SEO issues:
- Inefficient bot crawl and indexation management through inconsistent technical architecture,
- Lack of a retirement plan for expired listings, and
- Duplicate content across listings, as they are usually also published elsewhere.
In this particular case study, there were major crawl and indexation issues to resolve… ultimately leading to a huge black hole.
The Issue
Server log file analysis identified that bots, mostly Googlebot, were running wild, crawling several different permutations of category URLs created when users selected two different categories for their search.
Say, for example, we’re a clothing store. Shoppers are generally able to choose to see clothes from within the /hats or the /shoes category… and, in this instance, both. The CMS created the URL /hats-or-shoes, which then displayed all listings from within those categories.
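To make the mechanism concrete, here’s a hypothetical reconstruction of how a CMS might build that combined slug. The function name and logic are illustrative assumptions based on the URLs described above, not the client’s actual code.

```python
def category_url(selected_categories):
    # Two selected categories collapse into a single "-or-" slug,
    # e.g. ["shoes", "hats"] -> "/hats-or-shoes"
    return "/" + "-or-".join(sorted(selected_categories))

print(category_url(["shoes", "hats"]))  # /hats-or-shoes
```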
Whilst these combinations could be useful for users, and you could argue there is some search volume in ‘hats and shoes’ keywords, the URLs were left open for search engines to follow every internal link, creating all possible combinations and including them in the Google index.
Whilst hats and shoes may be useful… do we really want a URL like /mens-red-shoes-or-womens-bras crawled and indexed?
Probably not.
For this client, there were 40 different categories to choose from. That’s 780 different two-category URL combinations (40 × 39 ÷ 2).
780 isn’t that bad. However, throw into the mix that each combination was available across 800 locations in Australia and it gets a little larger, e.g. /mens-red-shoes-or-womens-bras/in-melbourne
624,000 URLs. Eeek.
Oh, and just to put the cherry on top: how about we throw some pagination into the mix? An average of 4 pages per category combination gives us 2,496,000 URLs.
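As a quick back-of-envelope check of those numbers (pure arithmetic, nothing client-specific):

```python
from math import comb

categories = 40
locations = 800
pages_per_listing_set = 4            # average length of each paginated series

pairs = comb(categories, 2)          # two-category combinations: 40 * 39 / 2
print(pairs)                                        # 780
print(pairs * locations)                            # 624,000 once a location is appended
print(pairs * locations * pages_per_listing_set)    # 2,496,000 with pagination on top
```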
The Impact
One of the first things we want to confirm when looking at server logs is that the pages we want crawled are being crawled, and that those we don’t want crawled, or that offer little value, are not.
A 48-hour snapshot of the server logs provided the following insight:
Yep, zero requests for our canonical URLs. Essentially none of the intended URLs on the site were crawled, just permutations and other random URLs.
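If you’d like to run the same kind of check yourself, a rough sketch is below. It assumes a standard combined-format access log at access.log and uses a simple ‘-or-’ / ‘?pg=’ test to bucket the combination and paginated URLs; the file name and bucketing rules are assumptions for illustration.

```python
from collections import Counter

buckets = Counter()

with open("access.log") as log:            # assumed log location, combined log format
    for line in log:
        if "Googlebot" not in line:        # crude user-agent check, fine for a sketch
            continue
        parts = line.split()
        if len(parts) < 7:
            continue
        path = parts[6]                    # request path is the 7th field in combined format
        if "-or-" in path or "?pg=" in path:
            buckets["combination / paginated URLs"] += 1
        else:
            buckets["other URLs"] += 1

for bucket, hits in buckets.most_common():
    print(f"{bucket}: {hits} Googlebot requests")
```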
So that’s crawling. What about indexation?
Google Search Console’s indexation report highlighted the severity of the issue:
That doesn’t look like it’s slowing down to me.
The Solution
We want to:
- Remove the unwanted URLs from Google’s index
- Prevent Googlebot from crawling and indexing the URLs
So that:
- We only have higher-quality pages in Google’s index that actually get visited
- Googlebot spends its time (aka ‘crawl budget’) discovering the actual pages that make us money
We do it by:
Managing which URLs are crawled, and keeping the ones we don’t want indexed out of the index:
- Adding a ‘noindex, follow’ robots meta tag to the ‘-or-’ combination URLs and to URLs within the paginated series. This instructs Google not to index these pages.
- Adding a ‘nofollow’ link attribute to category links once a category has already been selected. This means users are still able to select multiple categories, but search engine crawlers are told not to follow those links. We do the same for links to the paginated results, e.g. ?pg=2. A sketch of both rules follows below.
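Here’s a rough sketch of how both rules might look at template level. The helper names, the ?pg= parameter handling and the HTML they return are assumptions for illustration; the real change was made inside the client’s CMS templates.

```python
def robots_meta_tag(path, query_params):
    """Emit a robots meta tag for '-or-' combination and paginated URLs (hypothetical helper)."""
    is_combination = "-or-" in path
    is_paginated = int(query_params.get("pg", 1)) > 1
    if is_combination or is_paginated:
        # Keep the page out of the index, but allow its links to be followed
        return '<meta name="robots" content="noindex, follow">'
    return ""  # canonical category pages stay indexable

def category_link(href, text, category_already_selected):
    """Render a category link, nofollowed once a first category has been chosen."""
    if category_already_selected:
        # Users can still click through to the combination; crawlers are told not to follow
        return f'<a href="{href}" rel="nofollow">{text}</a>'
    return f'<a href="{href}">{text}</a>'

print(robots_meta_tag("/hats-or-shoes/in-melbourne", {"pg": "2"}))
print(category_link("/hats-or-shoes", "Shoes", category_already_selected=True))
```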
Importantly, you generally can’t add the nofollow links and the noindex instruction at the same time, as the noindex tag struggles to be discovered once bots can no longer follow links to those pages.
The options here are to either:
- Wait some time between the two steps above, or
- Submit a new XML sitemap including all the pages you want to be crawled (in this case, the -or- URLs and the paginated URLs) so the bots are able to discover and act on the ‘noindex’ robots meta tag (a sketch follows below).
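For the sitemap option, a sketch of generating that temporary file is below. The itertools approach mirrors how the combination URLs were formed; the domain, file name and example data are placeholders, not the client’s.

```python
from itertools import combinations

# Placeholder inputs: the real lists held 40 categories and 800 locations
categories = ["hats", "shoes", "bras"]
locations = ["melbourne", "sydney"]
base = "https://www.example.com.au"

urls = [
    f"{base}/{a}-or-{b}/in-{loc}"
    for a, b in combinations(sorted(categories), 2)
    for loc in locations
]

# A temporary sitemap listing the URLs we want re-crawled so the 'noindex'
# tag gets seen; it can be removed once the URLs drop out of the index.
with open("sitemap-noindex-cleanup.xml", "w") as f:
    f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    f.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
    for url in urls:
        f.write(f"  <url><loc>{url}</loc></url>\n")
    f.write("</urlset>\n")
```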
The Outcome
Once the fixes were in place, we could see Google gradually removing hundreds of thousands of URLs from the index.
Google Search Console Indexed URLs
Interestingly, now that Googlebot wasn’t spending time crawling all of these useless URLs, the site began to rank for a lot more keywords. In this case, 50% more.
ahrefs Organic Keyword count
And thanks to the increased number of keywords the site was ranking for, organic visibility doubled from 3,000 to 6,000.
Searchmetrics Organic Visibility
In Summary
Classified sites are big, sometimes monstrous. The good news is that patterns can be found, and minor fixes can impact millions of pages, improving ranking positions and/or increasing the number of different keywords the site ranks for.
It’s imperative that the key category pages are crawled most often so that listings, especially new ones, are discovered. In this instance, we can see that by ensuring Google crawls and indexes only our intended content, we are rewarded handsomely.