6 content and technical SEO priorities for effective site crawls

Search engines discover your site by crawling its pages and indexing the content they find, so you want to make sure everything you have optimised is discoverable.

If your content and technical SEO strategies are not prioritised and aligned, you run the risk of crawlers never finding a page, or giving up on trying to understand it and choosing not to index it as a result.

To make your site crawls as effective as possible, stay on top of these 6 priorities to keep your pages visible and indexed.

1. Optimise your page loading speed

As a general rule, pages should not take more than 2 seconds to load. It is commonly accepted that a user will not wait longer than 3 seconds for a page to load, and slow pages are a problem for web spiders too.

Just as a user will give up on visiting your page, web spiders tend to crawl less of a site that responds slowly. This means some pages may not be crawled or indexed, and you are sabotaging your chances of appearing in search results.

There are tools to check your website speed, such as Screaming Frog and Google Search Console. Once you have identified any pages that are not loading within this timeframe, it’s time to take steps to fix them.

Fixes that could be applicable include:

Optimise image size and format
Avoid running multiple tracking tools
Check for CMS updates
Optimise caching
Avoid redirects
Reduce HTTP requests
Minify JavaScript and CSS

Google provides guidance on the main areas to focus on through the Core Web Vitals report in Search Console.
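Once you have load times for each page, flagging the ones that miss the 2 second guideline is simple to automate. The following is a minimal Python sketch rather than a replacement for a proper speed audit: the `time_fetch` and `slow_pages` helpers and the sample timings are hypothetical, and timing a single request only approximates what a visitor or crawler experiences.

```python
import time
from typing import Callable, Dict, List

# Guideline used in this article: pages should load within 2 seconds.
MAX_LOAD_SECONDS = 2.0


def time_fetch(fetch: Callable[[str], object], url: str) -> float:
    """Return how many seconds the supplied fetch function takes for one URL."""
    start = time.perf_counter()
    fetch(url)
    return time.perf_counter() - start


def slow_pages(timings: Dict[str, float], limit: float = MAX_LOAD_SECONDS) -> List[str]:
    """Return the URLs whose measured time exceeds the limit, slowest first."""
    over_limit = [url for url, seconds in timings.items() if seconds > limit]
    return sorted(over_limit, key=lambda url: timings[url], reverse=True)


if __name__ == "__main__":
    # Hypothetical measurements in seconds, for example exported from a crawl tool.
    sample_timings = {"/": 0.8, "/blog/": 2.4, "/shop/": 3.1}
    print(slow_pages(sample_timings))
```

Passing the fetch function in as an argument keeps the sketch independent of any particular HTTP library, so you can swap in whichever client your team already uses.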

2. Update your robots.txt files

Your robots.txt file will tell search engine crawlers how you would like your site to be crawled.

The primary benefit of robots.txt is the ability to block individual pages or directory paths which don’t need to be crawled at all, such as shopping carts, wish lists, or date archives when content is primarily accessible through categories.

That being said, a robots.txt file can also have negative impacts for crawlers when misused or coded erroneously, so it’s worth reviewing it or having a professional do so if you are not sure. Issues to check for include:

Root directory blocked
Wildcards used in the wrong places causing erroneous matches on disallow rules
File contains unsupported instructions such as “noindex”, which search engines can ignore or misread
Access to JavaScript and CSS is blocked
Sitemap line points to a sitemap on another domain or on a development site
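Some of the checks above can be scripted before a robots.txt file goes live. This is a minimal sketch using Python's standard `urllib.robotparser` module (the `site_maps()` call needs Python 3.8 or later). The robots.txt content, the example.com URLs and the helper names `blocked_urls` and `foreign_sitemaps` are made up for illustration. Bear in mind that the standard-library parser matches plain path prefixes, so it may not interpret wildcard rules the way a given search engine does.

```python
from typing import List
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt used only for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
Disallow: /wishlist/
Sitemap: https://www.example.com/sitemap.xml
"""


def blocked_urls(robots_txt: str, urls: List[str], agent: str = "*") -> List[str]:
    """Return the URLs that the robots.txt rules disallow for the agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]


def foreign_sitemaps(robots_txt: str, site_host: str) -> List[str]:
    """Return Sitemap entries whose host differs from the live site's host."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    sitemaps = parser.site_maps() or []  # site_maps() returns None when empty
    return [url for url in sitemaps if urlparse(url).netloc != site_host]


if __name__ == "__main__":
    pages = [
        "https://www.example.com/",
        "https://www.example.com/cart/",
        "https://www.example.com/products/shoes",
    ]
    print(blocked_urls(ROBOTS_TXT, pages))
    print(foreign_sitemaps(ROBOTS_TXT, "www.example.com"))
```

If the homepage URL ever appears in the `blocked_urls` result, the root directory is blocked and the file needs attention before anything else.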

3. Check for erroneous canonical tags

While canonical tags are helpful for instructing Google on which versions of duplicate pages you want it to index, issues can arise when incorrect canonical tags are used on your site.

For example, if you have a canonical tag pointing to a page which no longer exists, search engines are likely to ignore all versions sharing the incorrect canonical.

You can use the URL Inspection tool in Search Console to compare the canonical you have declared for a page with the one Google has selected, so any erroneous canonical tags can be found and pointed at the correct pages.
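For a quick check across many pages, canonical tags can also be read straight from the HTML. The sketch below uses Python's standard `html.parser` module. The `canonical_issue` helper, its messages and the idea of comparing against a set of known live URLs are assumptions made for illustration, not a feature of Search Console.

```python
from html.parser import HTMLParser
from typing import List, Optional, Set


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.canonicals: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values and attributes.get("href"):
            self.canonicals.append(attributes["href"])


def canonical_issue(html: str, live_urls: Set[str]) -> Optional[str]:
    """Return a short description of a canonical problem, or None if none found."""
    finder = CanonicalFinder()
    finder.feed(html)
    if not finder.canonicals:
        return "missing canonical"
    if len(set(finder.canonicals)) > 1:
        return "conflicting canonicals"
    if finder.canonicals[0] not in live_urls:
        return "canonical points to a URL that is not live"
    return None


if __name__ == "__main__":
    # Hypothetical page whose canonical points at a removed URL.
    page = '<head><link rel="canonical" href="https://www.example.com/old/"></head>'
    print(canonical_issue(page, {"https://www.example.com/new/"}))
```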

Canonical tags also need to be implemented correctly for sites targeting international traffic. If users are directed to different pages based on their country and language, each language version needs its own canonical tag, pointing to a page in that same language, to ensure the pages are indexed in each of the languages used by your site.

Hreflang annotations should also be used to indicate the relationship between different language versions of a webpage.
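As a rough illustration of how the two work together, this Python sketch builds the `<head>` tags for one language version of a page: a self-referencing canonical plus an hreflang annotation for every version, including an `x-default` fallback. The `head_tags` helper, the language codes and the example.com URLs are hypothetical, so adapt them to your own site structure.

```python
from html import escape
from typing import Dict, List


def head_tags(versions: Dict[str, str], current_lang: str, default_lang: str) -> List[str]:
    """Build canonical and hreflang <link> tags for one language version.

    versions maps an hreflang code (for example "en-gb") to that version's URL.
    """
    # Each language version declares itself as canonical.
    tags = [f'<link rel="canonical" href="{escape(versions[current_lang])}" />']
    # Every version lists all versions, including itself.
    for lang, url in versions.items():
        tags.append(
            f'<link rel="alternate" hreflang="{escape(lang)}" href="{escape(url)}" />'
        )
    # x-default tells search engines which version to use when no language matches.
    tags.append(
        f'<link rel="alternate" hreflang="x-default" href="{escape(versions[default_lang])}" />'
    )
    return tags


if __name__ == "__main__":
    site_versions = {
        "en-gb": "https://www.example.com/en-gb/",
        "fr-fr": "https://www.example.com/fr-fr/",
    }
    for tag in head_tags(site_versions, current_lang="fr-fr", default_lang="en-gb"):
        print(tag)
```

Generating the tags from one shared mapping helps keep every language version listing the same set of alternates, which is where hand-edited annotations tend to drift out of date.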

You will most likely be using canonical tags and hreflang annotations together, so ensure that they are up to date for effective crawling.

4. Improve your internal linking

Internal linking strategy is an important element for SEO in general – and it should also be a priority for ensuring effective site crawling.

While site structure provides a top-level overview of where content sits, internal linking within body copy provides additional insight for site crawlers to identify content which may not have been visible otherwise.

Improving your site architecture and implementing a good breadcrumb strategy is also important, but targeted internal linking reduces the risk of orphaned pages. It also means the sitemap is no longer the only way for search engines to find less visible pages.
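Orphaned pages can be found by comparing the URLs in your sitemap with the pages a crawler can actually reach by following internal links from the homepage. This is a minimal Python sketch of that idea using a breadth-first search. The link map, the sitemap list and the helper names are hypothetical, and in practice the link data would come from a crawl export.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def reachable_pages(links: Dict[str, List[str]], start: str) -> Set[str]:
    """Return every page reachable from start by following internal links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def orphaned_pages(links: Dict[str, List[str]], sitemap_urls: Iterable[str], start: str = "/") -> List[str]:
    """Return sitemap URLs that no chain of internal links leads to."""
    return sorted(set(sitemap_urls) - reachable_pages(links, start))


if __name__ == "__main__":
    # Hypothetical internal links: each page maps to the pages it links to.
    internal_links = {
        "/": ["/services/", "/blog/"],
        "/blog/": ["/blog/post-1/"],
        "/old-guide/": ["/"],  # links out, but nothing links to it
    }
    sitemap = ["/", "/services/", "/blog/", "/blog/post-1/", "/old-guide/"]
    print(orphaned_pages(internal_links, sitemap))
```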

5. Check for thin or duplicate content

It can be easy to forget about pages with thin or duplicate content, but when your site is crawled and these poor-quality pages are discovered, search engines may conclude your content is not as valuable to searchers, so less time will be spent exploring your site and indexing pages.

Thin content could refer to copy which is minimal, or it could be poorly written, with grammar mistakes and spelling errors. It could also be content which does not add significant value for readers.

To identify the pages which need improved content, the first step is to determine which ones are not being indexed. You can then review the keywords the content has been optimised for and assess whether the pages are providing high quality answers for searchers.

Duplicate content is an issue that can be flagged in Google Search Console. You may receive an alert that Google is crawling more URLs than expected, which happens when the search engine is not sure which version of the content should be indexed.

If you have not received an alert in Search Console, you can check crawl results for issues such as missing tags, duplicate errors and long URLs which could be making crawlers work harder than they have to.
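A first pass over crawl results can be automated. The Python sketch below groups pages whose body copy is identical once case and whitespace are normalised, and lists pages that fall under a word count. It only catches exact duplicates, and the 300 word default is an arbitrary assumption rather than a threshold published by any search engine. The helper names and sample pages are hypothetical.

```python
import hashlib
import re
from collections import defaultdict
from typing import Dict, List


def normalise(text: str) -> str:
    """Lower-case the text and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text).strip().lower()


def duplicate_groups(pages: Dict[str, str]) -> List[List[str]]:
    """Group URLs whose normalised body copy is identical."""
    groups = defaultdict(list)
    for url, text in pages.items():
        digest = hashlib.sha256(normalise(text).encode("utf-8")).hexdigest()
        groups[digest].append(url)
    return sorted(sorted(urls) for urls in groups.values() if len(urls) > 1)


def thin_pages(pages: Dict[str, str], min_words: int = 300) -> List[str]:
    """Return URLs with fewer words than min_words (an assumed threshold)."""
    return sorted(url for url, text in pages.items() if len(text.split()) < min_words)


if __name__ == "__main__":
    # Hypothetical page copy keyed by URL, for example from a crawl export.
    crawl = {
        "/shoes/": "Red  shoes for sale",
        "/shoes/?sort=price": "red shoes for sale ",
        "/hats/": "Blue hats",
    }
    print(duplicate_groups(crawl))
    print(thin_pages(crawl, min_words=3))
```

Word count is only a starting point for review, since short pages can still answer a query well and long pages can still add little value.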

6. Address redirect issues

There are several redirect issues which can result in crawlers giving up on indexing your site – sabotaging your SEO efforts in the process. Here are some of the most common:

Redirect loops

If you redirect page 1 to page 2, page 2 to page 3, and page 3 back to page 1, search engines will stop following the loop and the page will not be indexed.

Redirect chains

When page URLs are changed multiple times, each change adds another hop, so sites see an increase in load times, an increase in server strain, and dilution of link equity. Keep track of your redirects and make sure that no chains are in place.

302 redirects

A 302 tells search engines that the page has only moved temporarily, so the original URL will remain indexed and the new one will not receive the original’s link equity. If your redirect should be permanent, implement a 301.
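All three issues can be spotted from a list of your redirects. The following Python sketch walks a redirect map from a starting URL and reports loops, chains and temporary redirects. The map format (source URL to status code and target), the helper names and the issue messages are assumptions made for this example. In practice the data might come from your server configuration or a crawl export.

```python
from typing import Dict, List, Tuple

# Hypothetical format: source URL -> (HTTP status code, target URL).
RedirectMap = Dict[str, Tuple[int, str]]


def trace_redirects(redirects: RedirectMap, start: str) -> Tuple[List[str], bool]:
    """Follow redirects from start. Return the URLs visited and whether a loop was hit."""
    path = [start]
    seen = {start}
    current = start
    while current in redirects:
        _, current = redirects[current]
        path.append(current)
        if current in seen:
            return path, True  # we are back at a URL already visited
        seen.add(current)
    return path, False


def redirect_issues(redirects: RedirectMap, start: str) -> List[str]:
    """Describe loop, chain and 302 problems found when following start."""
    path, is_loop = trace_redirects(redirects, start)
    hops = len(path) - 1
    issues = []
    if is_loop:
        issues.append("redirect loop")
    elif hops > 1:
        issues.append(f"redirect chain ({hops} hops)")
    # Every URL on the path except the last one is the source of a redirect.
    for url in path[:-1]:
        if redirects[url][0] == 302:
            issues.append(f"302 (temporary) redirect at {url}")
    return issues


if __name__ == "__main__":
    loop = {
        "/page-1": (301, "/page-2"),
        "/page-2": (301, "/page-3"),
        "/page-3": (301, "/page-1"),
    }
    print(redirect_issues(loop, "/page-1"))
```

A single 301 hop produces no issues in this sketch, which matches the advice above: permanent moves should go straight from the old URL to the final one.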

Need our help?

We can assist you with troubleshooting site crawling issues you may be encountering and set you up with strategies to ensure your pages will consistently be indexed by search engines.

To find out more about your site’s current position in the search market for your industry, why not book in for a Free Acquisitions Workshop? We will help you uncover the hidden weaknesses in your current SEO strategy and get you back on track to achieving your goals.

AUTHOR

Sophie Brodie

Senior Content Executive

Sophie Brodie is the Senior Content Executive at Skittle Digital. Sophie has worked in digital marketing since 2019. She has produced content for hardback books and magazines for design et al, followed by managing the content and SEO for multiple brands at Sykes Holiday Cottages. She enjoys creating a variety of content across different mediums and working with new clients to achieve their content goals. In her spare time, you will find Sophie listening to podcasts, at a music gig, or absorbed in a book.
