
6 content and technical SEO priorities for effective site crawls


Search engines discover your site by crawling its pages and indexing the content they find, so you want to make sure everything you have optimised is discoverable.

If you’re not prioritising and aligning your content and technical SEO strategy, you run the risk of crawlers never finding a page – or giving up on trying to understand it and choosing not to index it as a result.

To ensure your site crawls are as effective as possible, consider these 6 priorities and stay on top of them to keep your pages visible and indexed.

1. Optimise your page loading speed

As a general rule, pages should not take more than 2 seconds to load. It is commonly accepted that a user will not wait longer than 3 seconds for a page to load – and the same applies to web spiders.

Just as a user will give up on visiting your page, web spiders will leave your site if it doesn’t load within this timeframe. This means your pages will not be crawled or indexed, and you are sabotaging your chances of appearing in search results.

There are tools to check your website speed, such as Screaming Frog and Google Search Console. Once you have identified any pages which are not loading within an optimal timeframe, it’s time to take steps to fix them.

Fixes that could be applicable include: 

  • Optimising image size and format 
  • Avoiding multiple tracking scripts 
  • Checking for CMS updates 
  • Optimising caching 
  • Avoiding unnecessary redirects 
  • Reducing HTTP requests 
  • Minifying JavaScript and CSS 

Google has provided guidance on the main areas to focus on through the Core Web Vitals report in Search Console.
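
If you want a rough first-pass check outside of those tools, a short script can time server responses for a list of URLs. Below is a minimal sketch in Python, assuming the requests library is installed and that the example URLs are swapped for pages from your own site; it only measures server response time, not full rendering in a browser, so treat it as an indicator rather than a substitute for the Core Web Vitals report.

import requests

# Hypothetical example URLs - replace with pages from your own site
urls = [
    "https://www.example.com/",
    "https://www.example.com/blog/",
]

for url in urls:
    response = requests.get(url, timeout=10)
    # response.elapsed covers the time from sending the request to receiving
    # the response, not the time a browser needs to render the page
    seconds = response.elapsed.total_seconds()
    flag = "SLOW" if seconds > 2 else "OK"
    print(f"{flag}\t{seconds:.2f}s\t{url}")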

2. Update your robots.txt files 

Your robots.txt file will tell search engine crawlers how you would like your site to be crawled.  

The primary benefit of robots.txt is the ability to block individual pages or directory paths which don’t need to be crawled at all – for example shopping carts, wish lists, or date archives where content is primarily accessible through categories.

That being said, a robots.txt file can also hinder crawlers when it is misused or written incorrectly, so it’s worth reviewing it – or having a professional do so – if you are not sure. Issues to check for include:

  • The root directory being blocked 
  • Wildcards used in the wrong places, causing disallow rules to match URLs they shouldn’t 
  • Invalid syntax such as a “noindex” instruction, which search engines do not support in robots.txt 
  • Access to JavaScript and CSS files being blocked 
  • A Sitemap directive pointing to a sitemap on another domain, or to a development site 
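
One simple way to sanity-check these rules is Python’s built-in urllib.robotparser, which reads a robots.txt file and reports whether a given crawler is allowed to fetch a given URL. Here is a minimal sketch, assuming example.com stands in for your own domain and Googlebot is the user agent you care about:

from urllib import robotparser

# Hypothetical domain - swap in your own site
parser = robotparser.RobotFileParser()
parser.set_url("https://www.example.com/robots.txt")
parser.read()

# URLs you expect to be crawlable, plus one you expect to be blocked
checks = [
    "https://www.example.com/products/blue-widget",
    "https://www.example.com/cart/",
]

for url in checks:
    allowed = parser.can_fetch("Googlebot", url)
    print(f"{'ALLOWED' if allowed else 'BLOCKED'}\t{url}")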

3. Check for erroneous canonical tags

While canonical tags are helpful for instructing Google on which version of a set of duplicate pages you want it to index, issues can arise when incorrect canonical tags are used on your site.

For example, if you have a canonical tag pointing to a page which no longer exists, search engines are likely to ignore all versions sharing the incorrect canonical.  

You can use the URL inspection tool in Search Console to find out whether any erroneous canonical tags are in place, so they can be removed or pointed at the correct pages.
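
As a supplementary check, you can also fetch a page, read its canonical tag, and confirm that the canonical target still resolves. Here is a minimal sketch in Python, assuming the requests and beautifulsoup4 libraries are installed and that the example URL stands in for one of your own pages:

import requests
from bs4 import BeautifulSoup

# Hypothetical page - replace with a URL from your own site
page_url = "https://www.example.com/page"

html = requests.get(page_url, timeout=10).text
soup = BeautifulSoup(html, "html.parser")
canonical = soup.find("link", rel="canonical")

if canonical is None:
    print("No canonical tag found")
else:
    target = canonical["href"]
    status = requests.get(target, timeout=10).status_code
    # A canonical pointing at a 404 (or into a redirect chain) is likely to be ignored
    print(f"Canonical: {target} -> HTTP {status}")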

Canonical tags also need to be implemented correctly on sites targeting international traffic. If users are directed to different pages based on their country and language, each language version will need its own canonical tag to ensure the pages are indexed in every language your site uses.

Hreflang annotations should also be used to indicate the relationship between different language versions of a webpage. Here is an example for a multilingual website:

<link rel="alternate" href="https://www.example.com/page" hreflang="en" />
<link rel="alternate" href="https://fr.example.com/page" hreflang="fr" />
<link rel="alternate" href="https://es.example.com/page" hreflang="es" />

You will most likely be using canonical tags and hreflang annotations together, so ensure they are kept up to date and consistent with each other for effective crawling.

4. Improve your internal linking

Internal linking strategy is an important element for SEO in general – and it should also be a priority for ensuring effective site crawling.  

While site structure provides a top-level overview of where content sits, internal linking within body copy gives crawlers additional paths to content they might not otherwise discover.

Improving your site architecture and implementing a good breadcrumb strategy are also important, but targeted internal linking reduces the risk of orphaned pages and stops the sitemap from being the only way for search engines to find less visible pages.
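
One way to spot potential orphans is to compare the URLs listed in your XML sitemap against the internal links your pages actually expose. The sketch below is a simplified illustration of that idea in Python, assuming requests and beautifulsoup4 are installed, that your sitemap is a single flat file at /sitemap.xml rather than a sitemap index, and that example.com stands in for your own domain:

import requests
import xml.etree.ElementTree as ET
from bs4 import BeautifulSoup
from urllib.parse import urljoin

SITE = "https://www.example.com"  # hypothetical domain

# Collect the URLs declared in the sitemap
sitemap_xml = requests.get(f"{SITE}/sitemap.xml", timeout=10).text
ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
sitemap_urls = {loc.text.strip() for loc in ET.fromstring(sitemap_xml).findall(".//sm:loc", ns)}

# Collect the internal links found on those pages
linked_urls = set()
for url in sitemap_urls:
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    for a in soup.find_all("a", href=True):
        href = urljoin(url, a["href"]).split("#")[0]
        if href.startswith(SITE):
            linked_urls.add(href)

# Sitemap URLs that no other page links to are potential orphans
for orphan in sorted(sitemap_urls - linked_urls):
    print("Potential orphan:", orphan)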

5. Check for thin or duplicate content  

It can be easy to forget about pages with thin or duplicate content, but when your site is crawled and these poor-quality pages are discovered, search engines may conclude your content is not as valuable to searchers – and spend less time exploring your site and indexing pages as a result.

Thin content could refer to copy which is minimal, or it could be poorly written, with grammar mistakes and spelling errors. It could also be content which does not add significant value for readers. 

To identify the pages which need improved content, the first step is to determine which ones are not being indexed. You can then review the keywords the content has been optimised for and assess whether the pages are providing high quality answers for searchers. 

Duplicate content can also be flagged in Google Search Console. You may be alerted that Google is crawling more URLs than expected, which happens when the search engine is not sure which version of the content should be indexed.

If nothing has been flagged in Search Console, you can check your own crawl results for issues such as missing tags, duplication errors and overly long URLs which could be making crawlers work harder than they have to.
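
As a rough first pass of your own, you can estimate word counts and hash the main text of each page to surface suspiciously short or identical copy. Here is a minimal sketch in Python, again assuming requests and beautifulsoup4 are installed, with hypothetical example URLs and an arbitrary 300-word threshold for thin pages:

import hashlib
import requests
from bs4 import BeautifulSoup

# Hypothetical pages to audit - replace with URLs from your own crawl
urls = [
    "https://www.example.com/page-a",
    "https://www.example.com/page-b",
]

seen = {}
for url in urls:
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    text = soup.get_text(" ", strip=True)
    words = len(text.split())
    if words < 300:  # arbitrary threshold for thin copy
        print(f"THIN ({words} words): {url}")
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if digest in seen:
        print(f"DUPLICATE of {seen[digest]}: {url}")
    else:
        seen[digest] = url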

6. Address redirect issues 

There are several redirect issues which can result in crawlers giving up on indexing your site – sabotaging your SEO efforts in the process. Here are some of the most common: 

Redirect loops

If page 1 redirects to page 2, page 2 to page 3, and page 3 back to page 1, search engines will halt the loop and none of the pages will be indexed by the confused crawler.

Redirect chains

When page URLs are changed multiple times and each old URL redirects to the next, load times increase, server strain grows, and link equity is diluted. Keep track of your redirects and make sure that no chains are in place – each old URL should point directly at its final destination.

302 redirects

A 302 tells search engines that the page has only moved temporarily, so the original URL will remain indexed and the new one will not receive the original’s link equity. If your redirect should be permanent, implement a 301 instead.
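
A short script can follow redirects hop by hop to expose chains, loops and lingering 302s before a crawler runs into them. Here is a minimal sketch in Python, assuming requests is installed and that the example URL stands in for an old, redirected URL on your own site:

import requests
from urllib.parse import urljoin

def trace_redirects(url, max_hops=10):
    """Follow redirects one hop at a time, flagging loops, chains and 302s."""
    seen = set()
    for _ in range(max_hops):
        if url in seen:
            print("Redirect loop detected at:", url)
            return
        seen.add(url)
        response = requests.get(url, allow_redirects=False, timeout=10)
        if response.status_code in (301, 302, 307, 308):
            target = urljoin(url, response.headers["Location"])
            note = " (temporary - consider a 301 if the move is permanent)" if response.status_code == 302 else ""
            print(f"{response.status_code} {url} -> {target}{note}")
            url = target
        else:
            print(f"Final destination: {url} (HTTP {response.status_code})")
            return
    print("Too many hops - likely a redirect chain worth flattening")

# Hypothetical old URL - replace with one of your own redirected pages
trace_redirects("https://www.example.com/old-page")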

Need our help? 

We can assist you with troubleshooting site crawling issues you may be encountering and set you up with strategies to ensure your pages will consistently be indexed by search engines.  

To find out more about your site’s current position in the search market for your industry, why not book in for a Free Acquisitions Workshop? We will help you uncover the hidden weaknesses in your current SEO strategy and get you back on track to achieving your goals.  

AUTHOR

Imogen Groome

Content Lead

Imogen is the SEO Content Lead at Skittle Digital and has worked in SEO since 2016. She helped define SEO strategy at Metro.co.uk before guiding the newsroom at The Sun Online as SEO Editor. She has more than 5 years’ experience scaling content strategies that drive revenue for brands through organic search. In her spare time, Imogen writes books, watches poor-quality reality TV and hangs out with her cats.
