While building a mobile-friendly site with great content is a vital first step in optimising your business website, there are some important technical considerations that also demand special attention. Technical SEO remains one of the most complex aspects of SEO, largely because it is one of the few tasks that only skilled web developers can carry out.
Content creation and link building can be carried out by trained writers and marketing executives, but technical SEO requires somebody who knows their way around a range of CMSes, understands web languages, and is confident configuring web servers. Luckily, our web developer has a sound understanding of all these areas, and we are able to enhance any SEO campaign with a technical SEO audit.
What Is Crawl Budget?
To index your web pages, Google first has to “crawl” them. It does this with search bots, also known as web spiders (hence the term “crawl”), which follow links on the web. Every time Googlebot, Google’s search spider software, comes to your website, it starts following the internal links on your pages to analyse and process all your content, with the aim of updating the search index.
Google performs several types of crawl, but to keep things simple, we only need to discuss its deep crawl here. When Google performs a deep crawl on a website, it aims to analyse every page in search of new information. If you update your page titles or site content, add new pages, change images and photos, or remove pages, Google will detect this and update the index. With a shallow crawl, on the other hand, Google will only look at your newest pages, and perhaps your home page, to see what changes have been made.
For instance, Google may visit your blog every day, or, if you run a very popular site, every time a new article is published to your RSS feed or sitemap. These shallow crawls are fast and take up very few resources. Deep crawls, on the other hand, are very resource-hungry, and although we do not know exactly how Googlebot decides how many pages to crawl, we have good evidence that it sets a limited crawl budget for each deep crawl.
Running Out Of Crawl Budget
So, when Google visits a website to perform a deep crawl, it may allocate itself a set time, or a set number of internal links to follow, before it abandons the crawl. Why would it do this? Google can never know for sure that it has captured all of your website – in theory, you could have billions of pages, each linking to the next. If Google set itself the task of thoroughly crawling every site, it could, in theory, waste its limited resources on poorly ranked websites. So, it sets crawl budgets.
Vast sites such as Wikipedia will be allocated much larger crawl budgets than business websites and personal blogs. You may think that a small website will not exhaust its crawl budget, but unfortunately, badly developed small sites can!
How To Optimise Your Crawl Budget
Modern CMSes generate many files which Google can access via your page source code. Although readers never see these files directly, they are used to build your site. CSS and JavaScript files, as well as individual image files and attachments, all complicate your site and eat into your crawl budget.
Updating your server configuration with robots.txt directives and page meta tags can help block Googlebot from some pages. However, Google does advise webmasters to be careful about what they block: if Google cannot properly view your CSS files, it cannot determine whether your site is mobile-friendly, and this can actually worsen your rankings.
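As a simple illustration (the paths below are hypothetical and would need adjusting to your own site), a robots.txt file in your site root can tell Googlebot to skip low-value URLs while leaving your CSS and JavaScript crawlable, and a robots meta tag can keep an individual page out of the index:

    # robots.txt – hypothetical example; adjust the paths to your own site
    User-agent: *
    Disallow: /internal-search/
    Disallow: /tag/
    # Avoid blocking your theme's CSS/JS folders wholesale, or Google cannot
    # render your pages properly when checking whether they are mobile-friendly

    <!-- Page-level alternative, placed in the <head> of a single page -->
    <meta name="robots" content="noindex, follow">

Note the difference: a robots.txt Disallow stops Googlebot crawling the URL at all (saving crawl budget), whereas a noindex meta tag can only take effect once the page has been crawled.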
Fixing 404 Pages (Page Not Found)
If your website has many 404 pages, caused by broken links on your site, it will more than likely be affecting your crawl budget, as search engine crawlers are effectively spending time visiting redundant, broken, or dead pages. To find pages returning a 404 error, go to Google Search Console > Pages, where you should see them listed under ‘Not Found (404)’. Export these pages, manually review each one to check it still returns a 404, then make an action plan of destination pages you would like to redirect them to. We recommend using 301s (permanent redirects), as these pass some of the history and authority built up on the old page through to the destination page.
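If you manage your own server, redirects of this kind are usually added in the server configuration. As a minimal sketch (assuming an Apache server and hypothetical URLs – your own paths will differ):

    # .htaccess – permanently redirect a dead page to its closest live equivalent
    Redirect 301 /old-trainers-range/ /trainers/

    # The equivalent on an Nginx server would be:
    # location = /old-trainers-range/ { return 301 /trainers/; }

If your site runs on a CMS such as WordPress, a redirect plugin can achieve the same result without editing server files.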
Control Dynamically Generated URLs
One of the biggest problems is dynamic URLs. Dynamic URLs are pages that we don’t want indexed in Google because they are duplicates of other pages. For example, if we had this URL: example.com/trainers, a dynamic version of this page might be something like example.com/trainers?sort=priceAsc, which is a sort filter ordering products by price from low to high. This page is a duplicate and will cause issues if not managed in the right way.
If your site contains hundreds of dynamically generated pages that are all identical, Google will soon use up its crawl budget, which means your most important content may never be seen.
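There are a couple of common ways to manage this (a sketch based on the hypothetical example.com/trainers URLs above): a canonical tag on the filtered page pointing back to the main category, or a robots.txt rule that stops Googlebot crawling the sorted variants at all:

    <!-- Option 1: in the <head> of example.com/trainers?sort=priceAsc -->
    <link rel="canonical" href="https://example.com/trainers" />

    # Option 2: robots.txt – keep Googlebot away from sorted duplicates entirely
    User-agent: *
    Disallow: /*?sort=

The canonical tag consolidates ranking signals onto the main page but still requires the duplicate to be crawled, whereas the robots.txt rule saves crawl budget outright, so the right choice depends on how your filtered pages are generated.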
Keep Your XML Sitemap Up-To-Date
If your site content has changed significantly but you have not updated your sitemap, Google will keep returning to the pages listed there, wasting more of your budget. Ensure that your sitemap is up to date, and look into ways of generating sitemaps automatically, with your most important pages and blog posts at the top. This makes sure that Google is constantly seeing your new pages and new content. Yoast SEO, for example, is a WordPress plugin that automatically regenerates your XML sitemap and adds any pages that are set to ‘index’.
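For reference, a minimal XML sitemap with a single entry looks like this (the URL and date are placeholders):

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>https://example.com/trainers</loc>
        <lastmod>2024-01-15</lastmod>
      </url>
    </urlset>

Once it is live, submit the sitemap URL in Google Search Console so Googlebot knows where to find it.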
Build More External Links
One of the factors that Google uses to allocate crawl budget is external links. The more links that you have pointing to your website, the more important your website looks to Google and the greater the crawl budget it will allocate. Link building is not only important for building trust and improving your keyword positions, but it also encourages Google to return to your site more often and delve deeper when it does.
Make it easy for Google to find your content, and don’t let Google get lost in a virtual maze. Crawl optimisation is a bit like putting your best products at the front of your shop, ensuring there is no out-of-date produce, and making sure that each shelf in your store offers something unique. If you want to keep Googlebot happy, speak to our technical SEO consultants today.