When working with an enterprise-sized website, managing your crawl budget can be the difference between new and existing content being propelled up the rankings, and huge sections of your site dropping like a lead balloon.
There are many ways to improve the utilisation of your crawl budget.
Here, we’ve assembled 7 quick tips that you can apply today. These are:
- Define robots.txt rules
- Break redirect chains
- Fix broken links
- Manage dynamic content
- Set sitemap priorities
- Improve caching
- Review thin content
1. Define crawl paths with robots.txt rules
Use your robots.txt file to block URL paths that Google is crawling but doesn’t need to.
For instance, you might have seen in your log file analysis that Google is hitting hundreds of internal site search pages. These don’t need to be crawled because all your content is discoverable elsewhere. Therefore, blocking all URLs with the search parameter makes sense.
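For example, a couple of robots.txt rules like the ones below would stop Google crawling internal search results. This is only a sketch – it assumes your search pages live under /search/ or use an s= parameter, so swap in whatever your own site actually uses:

```
User-agent: *
# Internal site search results add no unique content, so block them from crawling
Disallow: /search/
Disallow: /*?s=
Disallow: /*&s=
```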
2. Identify & break redirect chains
Find multiple redirect status codes (3XX) that chain together before finally resolving. This is where page A redirects to page B, which redirects to page C, and so on. Every hop in the chain counts against your crawl budget, so point the start URL straight at the final destination to save those requests and let Google spend that budget elsewhere. Then update all internal links so they go directly to the final URL and never trigger the redirects in the first place.
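As a rough sketch of what breaking the chain looks like at server level – assuming an Nginx server and placeholder URLs, so translate this to your own stack and paths:

```
# Before: /page-a -> /page-b -> /page-c (two wasted requests per crawl)
# After: every legacy URL points straight at the final destination
location = /page-a { return 301 /page-c; }
location = /page-b { return 301 /page-c; }
```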
3. Fix broken internal links
Pick out client errors (4XX) from your log files in the same way as you did the redirects. Here, you want to put 301 redirects in place to capture the benefit of any inbound links pointing at those dead URLs, as well as repair the internal links that trigger them.
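If you need a quick way to confirm which logged URLs are actually returning client errors, a short script along these lines will do it – a minimal sketch, assuming you’ve exported the URLs from your logs or crawl into a urls.txt file:

```
import requests

# Minimal sketch: flag any URL from your log export that returns a 4XX,
# so it can be redirected or have its internal links repaired.
with open("urls.txt") as f:
    urls = [line.strip() for line in f if line.strip()]

for url in urls:
    response = requests.head(url, allow_redirects=True, timeout=10)
    if 400 <= response.status_code < 500:
        print(response.status_code, url)
```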
4. Manage your dynamic URLs
If you’re noticing lots of dynamic URLs (those using parameters) in your log files, then you need to decide whether those URLs are necessary. They’re most often generated by internal search pages (as covered above) or by filtering and sorting systems that narrow or reorder page content. Whilst you can handle any duplicate content issues with the canonical tag, or keep these URLs out of the index with the meta robots tag, Google still has to crawl each URL to read either instruction – so neither is the best trade-off if your crawl budget is still being consumed as a result.
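For reference, those two options look like this on a parameterised page – the URL and parameter names here are purely illustrative:

```
<!-- On a filtered variant such as /shoes/?colour=red&sort=price -->
<!-- Canonical consolidates ranking signals to the main category page -->
<link rel="canonical" href="https://www.example.com/shoes/">
<!-- Meta robots keeps the variant out of the index (but it still gets crawled) -->
<meta name="robots" content="noindex, follow">
```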
5. Set your sitemap priorities
XML sitemaps allow you to set different crawl priorities and update frequencies on a URL-by-URL basis. Use these to signal to Google which URLs you need crawled most often. Normally, that’s the homepage and top-level product or news categories. De-prioritise pages within sections that don’t update often.
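A snippet like the one below shows the idea – the URLs are placeholders, and bear in mind that Google treats these values as hints rather than directives:

```
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <changefreq>daily</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://www.example.com/news/2015-archive/</loc>
    <changefreq>yearly</changefreq>
    <priority>0.2</priority>
  </url>
</urlset>
```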
6. Improve your caching
As we’ve said, every request Google makes to your server eats up crawl budget. This means that pages that rely on lots of additional resources to load can take up more of your budget than is necessary. Identify which resources are shared across multiple URLs, then make sure that Google can cache these and only request them once.
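In practice, that usually means setting long-lived cache headers on the static files (CSS, JavaScript, images, fonts) that every page shares. A sketch of what that might look like, assuming an Nginx server – adjust the file types and lifetimes to suit your own site:

```
# Long cache lifetimes for static assets shared across many URLs
location ~* \.(css|js|png|jpg|svg|woff2)$ {
    add_header Cache-Control "public, max-age=31536000, immutable";
}
```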
7. Review thin content
Sometimes the number of pages we publish gets out of hand, especially on large enterprise sites. Going through your content pages and identifying those that could be combined into a single page reduces the number of crawl requests search engines need to reach that content. As a bonus, you’re also reducing the amount of competition in the index for the terms those pages target.
Need some guidance?
If you’re not sure where to start in optimising your crawl budget, get in touch with us today. Skittle Digital offers a limited number of free Acquisitions Workshops, which will uncover the strengths, weaknesses and opportunities for your website – including recommendations for crawl budget handling and actionable insights.