Thin Content - how to find it, and what to do about it

SanityCheck has a Content Quality report. Find out why it's useful to find thin and low quality content, and then what to do about those pages on your site.

What is thin content?

At a basic level, thin content is a page that doesn't have a high word count. Things have progressed since SEOs started worrying about thin content - and the term to consider now is low-quality content.

Back around 2009/10 sites started creating thousands of pages (even hundreds of thousands) where each one targeted an individual keyword to try and get clicks from search engines. These pages often only had 1 or 2 sentences on them, but were able to rank because of the backlink profile and strength of the domain as a whole. As well as a sentence or two of content, these pages would be covered in ads - the aim being to get traffic from Google, and hope the visitors would click the ads and generate some revenue for the site.

This business model was what sites such as wikiHow and mahalo.com were built upon, and the SERPs (Search Engine Results Pages) were littered with results from thin content sites in 2010.

What is Google Panda?

The Google Panda update was released by Google to combat these thin content sites. The first release of the Panda algorithm was in February 2011. Google viewed these sites and pages as low-quality as they didn't help users answer the questions they were searching for, and the pages only existed to get search traffic and hopefully get users to click on their ads. By their very nature of not having much content on them, thin content pages also potentially caused duplicate content issues across a site.

The algorithm was rolled out in a single event and entire sites were hit. Sites like mahalo.com went from hundreds of thousands of visits a day to hardly anything, and their business models collapsed overnight. While this accomplished Google's goal, a huge number of other sites were impacted by Panda - and without an explicit message from Google to the website owners explaining why the traffic drop had occurred, people were left scrambling around for answers and fixes.

Google now says that pages are evaluated on an individual, real-time basis, rather than through an algorithm update which happens every so often.

You can no longer rely on word count alone.

When Google Panda was first released, word count and the ratio of text to HTML were shared around as the important metrics to consider when auditing a page or site. In 2019 searcher intent is often discussed when talking about page quality. Google expects different types of queries to present certain types of results - so if you were to search for 'women's slippers', Google assumes you want to buy a pair of women's slippers and the search results will reflect this with many product pages listed. Writing a 3,000 word essay on the history of women's slippers, no matter how good and thorough it is, will not get you anywhere near the front page of Google for this search term. A product page for women's slippers with a 250 word description will have more chance of ranking.

So check the search results for the query and keyword you are targeting. Create the content Google (and the user) is looking for, and make it the best content out there. After this - you need links!

Having said we can no longer rely just on word count, what can we do?

How to find out if your site has thin content?

With SanityCheck we use a simple hypothesis: if a page is deemed low-quality, it won't be getting any impressions in Google.

You can request the Content Quality report in SanityCheck and submit your site's sitemap.xml file. SanityCheck then processes each url in this sitemap and checks to see how many impressions and clicks it has had from Google over the last 90 days.

If a url has had zero impressions over the last 90 days you can consider the page thin content or low-quality.
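
The check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not SanityCheck's actual implementation: it assumes you already have the sitemap contents as a string, plus a dictionary of impressions per url for the last 90 days (for example, built from a Search Console export). The function names and sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap.xml files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(sitemap_xml):
    """Return every <loc> url listed in a sitemap.xml document."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def zero_impression_urls(sitemap_xml, impressions_by_url):
    """Return sitemap urls with no impressions in the reporting window.

    impressions_by_url is an assumed input: a dict mapping url -> impressions
    over the last 90 days. Urls missing from the dict are treated as zero.
    """
    return [
        url
        for url in urls_from_sitemap(sitemap_xml)
        if impressions_by_url.get(url, 0) == 0
    ]


# Hypothetical example data.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/old-news</loc></url>
  <url><loc>https://example.com/slippers</loc></url>
</urlset>"""

SAMPLE_IMPRESSIONS = {
    "https://example.com/": 1200,
    "https://example.com/slippers": 45,
}

if __name__ == "__main__":
    for url in zero_impression_urls(SAMPLE_SITEMAP, SAMPLE_IMPRESSIONS):
        print(url)
```

Treating urls that are missing from the impressions data as zero is a deliberate choice in this sketch, since pages that never appear in search results usually do not show up in an export at all.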

What to do with thin content pages?

Once you have found urls/pages you have a number of options:

  1. Improve the page
  2. Redirect the page
  3. Delete the page
  4. NoIndex the page

1. Improve the page

If this page is targeting a keyword - re-evaluate the content on the page and get working on improving it.

The first thing to check is the type of pages and content Google is currently serving up for the keyword you are targeting. Google maps intent to queries, so if you search for something and the results are a list of long form blog posts, and you are trying to rank an ecommerce product page - it's unlikely that page will be able to rank.

If however the content type is similar, and especially if it is article based - update and improve that article. Use tools such as www.clearscope.io to improve the text, and answer the questions contained in the 'People also ask' sections of the search results.

2. Redirect the page

Before deciding upon simply deleting a page, check in a tool such as www.ahrefs.com whether this url has any backlinks pointing to it.

A more rudimentary test, but free, is to check in Google Analytics to see if the page has ever had any referral traffic and where from. If it has, check whether those links still exist.

If not - delete the page.

If the page does have backlinks - consider redirecting the page to another relevant url.

3. Delete the page

If the url has been checked and it has no external backlinks pointing to it - you can consider deleting it.

This is often a good suggestion for old news type articles.

4. Set pages as NoIndex

You may want to keep the page on the site for reasons other than getting search traffic - for example for sending PPC and Social Traffic to. If this is the case you could mark the page as NoIndex so that Google will not include it in its search results index. This can often be the case for landing pages and lead capture pages that are very similar, except a change in headline.

If you do this - make sure the url is removed from sitemap.xml, and potentially not internally linked from elsewhere on the site in any way.
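
The four options above can be summarised as a small decision helper. This is a rough sketch rather than a rule: the function name and its three inputs are hypothetical, and the order in which the checks are applied is an assumption made for illustration. You would answer the three questions yourself from your keyword research, a backlink tool and your own knowledge of how the page is used.

```python
def suggest_action(targets_keyword, has_backlinks, keep_for_other_traffic):
    """Suggest one of the four options for a page with zero impressions.

    All three arguments are booleans you supply yourself. The priority
    order below is an assumption of this sketch, not a fixed rule.
    """
    if targets_keyword:
        # The page is meant to rank, so the content needs work.
        return "improve"
    if keep_for_other_traffic:
        # For example a landing page used for PPC or social traffic.
        return "noindex"
    if has_backlinks:
        # Keep the value of the links by pointing them at a relevant url.
        return "redirect"
    # No ranking goal, no other use and no backlinks.
    return "delete"


if __name__ == "__main__":
    # Hypothetical example: an old news article with no links pointing to it.
    print(suggest_action(targets_keyword=False,
                         has_backlinks=False,
                         keep_for_other_traffic=False))
```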

Type of pages that cause low-quality and thin content issues

If your site has any of these types of pages, they could be particularly at risk of Panda-type penalties.

Landing pages or Doorway Pages

Doorway pages are one type of page that can be considered low-quality or thin.

These are often landing pages where the page title and h1 are the only things changed to try and target a specific keyword.
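
One way to spot pages like this is to compare the body text of your landing pages with the title and h1 left out. The sketch below uses Python's standard difflib module for the comparison. It assumes you have already extracted the body text for each url into a dictionary; the function name, the sample pages and the 0.9 threshold are all hypothetical choices for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def near_duplicate_pairs(body_text_by_url, threshold=0.9):
    """Return (url_a, url_b, similarity) for pages with very similar body text.

    body_text_by_url is an assumed input: a dict mapping url -> body text
    with the title and h1 removed. threshold is an arbitrary cut-off.
    """
    pairs = []
    # Sort so the output order is stable from run to run.
    for (url_a, text_a), (url_b, text_b) in combinations(
        sorted(body_text_by_url.items()), 2
    ):
        similarity = SequenceMatcher(None, text_a, text_b).ratio()
        if similarity >= threshold:
            pairs.append((url_a, url_b, round(similarity, 2)))
    return pairs


# Hypothetical example data.
SAMPLE_PAGES = {
    "/plumber-london": "Call us today for a free quote on our services.",
    "/plumber-leeds": "Call us today for a free quote on our services.",
    "/guide": "Read our guide to choosing the right walking boots for winter.",
}

if __name__ == "__main__":
    for url_a, url_b, similarity in near_duplicate_pairs(SAMPLE_PAGES):
        print(url_a, url_b, similarity)
```

Comparing every pair of pages gets slow on large sites, so this approach is best suited to a shortlist of suspected landing pages rather than a full crawl.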

User Generated Content

If discussion forums or wiki based sites aren't well moderated they can become full of content added by users that is both thin and low quality. Discussion forums also by design create a lot of pages, such as profile pages, that do not have a lot of content on them. Be sure to check if these exist, and see if they can be set to noindex.

Automatically generated pages

Your site may have taken the approach of creating a page to target every keyword it has ever ranked for. This could be done by reviewing the keywords the site and its pages appear for in Search Console, and automatically generating these pages through code.

While Machine Learning and Natural Language Processing have dramatically improved over recent years - be careful with this approach as you may end up with lots of auto generated, low quality, spammy looking pages.

Affiliate Pages

Depending on the type of affiliate site you run - if you have lists of voucher codes or coupons you may come across a thin content penalty. This is a tricky one, as people who are searching for voucher codes simply want a list of the deals and codes they could use - so how can you improve the page? In this particular niche you may find that it is fine to leave coupon pages as they are, as long as the rest of the site has enough content to back it up.

Product Pages

A lot of product pages are very similar because they use the same template, and only have a unique h1 (the product name), description (often quite short) and product image. While you do not want to end up with product pages that have content that is not relevant - adding things such as a longer description, key product features and user reviews, can improve the content on a product page. For large ecommerce stores however this does present an issue as you may have hundreds of thousands of product pages you need to do this for.

Tag, Category and Author pages in WordPress

WordPress by default creates a lot of page types that do not particularly add value to a site - especially if tags and categories are used a lot by post authors. Typically a tag or category page will display the summary of each blog post it is assigned to. This can be troublesome if a tag or category only has a single post assigned to it, as there will only be a sentence or two displayed. This is more common than you might think! Tags and categories are often used in an ad hoc manner without much thought.
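
To get a feel for how many tags fall into this trap, you can count the posts assigned to each tag. The sketch below assumes you have exported your posts and their tags into a dictionary; the function name, the sample data and the minimum of two posts are hypothetical. It only produces a list of tags to review, not a list of pages to noindex automatically.

```python
from collections import Counter


def sparse_tags(tags_by_post, min_posts=2):
    """Return tags used on fewer than min_posts posts, sorted by name.

    tags_by_post is an assumed input: a dict mapping post url -> list of tags.
    """
    counts = Counter(
        tag
        for tags in tags_by_post.values()
        for tag in set(tags)  # count each tag once per post
    )
    return sorted(tag for tag, count in counts.items() if count < min_posts)


# Hypothetical example data.
SAMPLE_POSTS = {
    "/blog/panda-update": ["seo", "google"],
    "/blog/thin-content": ["seo", "content"],
    "/blog/office-party": ["news"],
}

if __name__ == "__main__":
    print(sparse_tags(SAMPLE_POSTS))
```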

Before you noindex all category, tag and author pages - you do need to check whether any of them are driving traffic and clicks to your site. A well used category page with its own description can be a valuable asset to a site and can rank really well. This is why it's important to check Google Analytics, Search Console or use the Content Quality report within SanityCheck before making blanket decisions.

Thin and low quality content summary

If your site has been around for a number of years, with content being added to it, it is worth investigating if Google judges any of these pages to be low-quality. You can do this manually by checking each url in Search Console or Google Analytics, or use the Content Quality report in SanityCheck to speed up this whole process.