
The Fundamentals of Crawling for SEO – Whiteboard Friday



The author’s views are entirely his or her own (excluding the unlikely event of hypnosis) and may not always reflect the views of Moz.

In this week’s episode of Whiteboard Friday, host Jes Scholz digs into the foundations of search engine crawling. She’ll show you why having no indexing issues doesn’t necessarily mean having no issues at all, and how, when it comes to crawling, quality is more important than quantity.

infographic outlining the fundamentals of SEO crawling

Click on the whiteboard image above to open a high resolution version in a new tab!

Video Transcription

Good day, Moz fans, and welcome to another edition of Whiteboard Friday. My name is Jes Scholz, and today we’re going to be talking about all things crawling. What’s important to understand is that crawling is essential for every single website, because if your content is not being crawled, then you have no chance to get any real visibility within Google Search.

So when you really think about it, crawling is fundamental, and it’s all based on Googlebot’s somewhat fickle attentions. A lot of the time people say it’s really easy to know whether you have a crawling issue: you log in to Google Search Console, you go to the Exclusions report, and you check whether you have the status "Discovered - currently not indexed."

If you do, you have a crawling problem, and if you don’t, you don’t. To some extent, this is true, but it’s not quite that simple, because what that’s telling you is whether you have a crawling issue with your new content. But it’s not only about having your new content crawled. You also want to ensure that your content is crawled when it is significantly updated, and this is not something that you’re ever going to see within Google Search Console.

But say that you have refreshed an article or you’ve done a significant technical SEO update, you are only going to see the benefits of those optimizations after Google has crawled and processed the page. Or on the flip side, if you’ve done a big technical optimization and then it’s not been crawled and you’ve actually harmed your site, you’re not going to see the harm until Google crawls your site.

So, essentially, you can’t fail fast if Googlebot is crawling slow. So now we need to talk about measuring crawling in a really meaningful manner because, again, when you’re logging in to Google Search Console, you now go into the Crawl Stats Report. You see the total number of crawls.

I take big issue with anybody that says you need to maximize the amount of crawling, because the total number of crawls is absolutely nothing but a vanity metric. If I have 10 times the amount of crawling, that does not necessarily mean that I have 10 times more indexing of content that I care about.

All it correlates with is more weight on your server, and that costs you more money. So it’s not about the amount of crawling. It’s about the quality of crawling. This is how we need to start measuring crawling, because what we need to do is look at the time between when a piece of content is created or updated and when Googlebot goes and crawls that piece of content.

The time difference between the creation or the update and that first Googlebot crawl is what I call the crawl efficacy. Measuring crawl efficacy should be relatively simple: you go to your database and export the created-at or updated-at time, then you go into your log files and get the next Googlebot crawl, and you calculate the time differential.
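As an illustration, here is a minimal sketch of that calculation in Python. It assumes you have exported the latest create/update time per URL from your database and filtered your access logs down to Googlebot hits; the file and column names are placeholders.

```python
# A minimal sketch of the crawl efficacy calculation: the time from a URL's latest
# create/update to the first Googlebot crawl after it. Assumes two illustrative CSV exports:
#   content_updates.csv -> url, updated_at  (latest create/update time per URL, ISO 8601)
#   googlebot_hits.csv  -> url, crawled_at  (access-log hits already filtered to Googlebot)
import csv
from collections import defaultdict
from datetime import datetime

def read_rows(path):
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

updated_at = {}                                   # url -> latest update time
for row in read_rows("content_updates.csv"):
    ts = datetime.fromisoformat(row["updated_at"])
    updated_at[row["url"]] = max(ts, updated_at.get(row["url"], ts))

crawls = defaultdict(list)                        # url -> all Googlebot crawl times
for row in read_rows("googlebot_hits.csv"):
    crawls[row["url"]].append(datetime.fromisoformat(row["crawled_at"]))

for url, updated in updated_at.items():
    later = [t for t in crawls[url] if t >= updated]
    if later:
        print(f"{url}\tcrawl efficacy: {min(later) - updated}")
    else:
        print(f"{url}\tnot crawled since last update")
```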

But let’s be real. Getting access to log files and databases is not really the easiest thing for a lot of us to do. So you can use a proxy: you can look at the last modified date time from your XML sitemaps for the URLs that you care about from an SEO perspective, which are the only ones that should be in your XML sitemaps, and you can look at the last crawl time from the URL Inspection API.

What I really like about the URL Inspection API is that, for the URLs you’re actively querying, you can also get the indexing status when it changes. So with that information, you can actually start calculating an indexing efficacy score as well.
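For illustration, here is a rough sketch of that proxy approach using the Search Console URL Inspection API via google-api-python-client. The property, sitemap URL, and credentials file are placeholders, and the response field names should be verified against the current API documentation.

```python
# A rough sketch of the proxy approach: compare <lastmod> from the XML sitemap with the
# last crawl time reported by the URL Inspection API (Search Console API v1).
# Placeholders/assumptions: property URL, sitemap URL, service-account file, and the exact
# response field names -- check these against the current API documentation.
from datetime import datetime, timezone
from xml.etree import ElementTree
import urllib.request

from google.oauth2 import service_account          # pip install google-auth
from googleapiclient.discovery import build        # pip install google-api-python-client

SITE = "https://www.example.com/"                   # Search Console property (placeholder)
SITEMAP = "https://www.example.com/sitemap.xml"     # placeholder
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

creds = service_account.Credentials.from_service_account_file(
    "service-account.json",                         # placeholder credentials file
    scopes=["https://www.googleapis.com/auth/webmasters.readonly"],
)
service = build("searchconsole", "v1", credentials=creds)

tree = ElementTree.parse(urllib.request.urlopen(SITEMAP))
for url_el in tree.findall("sm:url", NS):
    loc = url_el.findtext("sm:loc", namespaces=NS)
    lastmod = url_el.findtext("sm:lastmod", namespaces=NS)
    if not (loc and lastmod):
        continue
    result = service.urlInspection().index().inspect(
        body={"siteUrl": SITE, "inspectionUrl": loc}
    ).execute()
    status = result["inspectionResult"]["indexStatusResult"]
    last_crawl = status.get("lastCrawlTime")         # e.g. "2023-01-05T08:14:00Z"
    modified = datetime.fromisoformat(lastmod)
    if modified.tzinfo is None:
        modified = modified.replace(tzinfo=timezone.utc)   # assume UTC if lastmod has no offset
    if last_crawl:
        crawled = datetime.fromisoformat(last_crawl.replace("Z", "+00:00"))
        print(loc, "crawl efficacy:", crawled - modified, "|", status.get("coverageState"))
    else:
        print(loc, "not crawled yet")
```

Note that the service account needs to be added as a user on the Search Console property, and the URL Inspection API is quota-limited per property per day, so only query the URLs you actually care about.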

So looking at when you’ve done that republishing or when you’ve done the first publication, how long does it take until Google then indexes that page? Because, really, crawling without corresponding indexing is not really valuable. So when we start looking at this and we’ve calculated real times, you might see it’s within minutes, it might be hours, it might be days, it might be weeks from when you create or update a URL to when Googlebot is crawling it.

If this is a long time period, what can we actually do about it? Well, search engines and their partners have been talking a lot in the last few years about how they’re helping us as SEOs to crawl the web more efficiently. After all, this is in their best interests. From a search engine point of view, when they crawl us more effectively, they get our valuable content faster and they’re able to show that to their audiences, the searchers.

It’s also something where they can tell a nice story, because crawling puts a lot of weight on us and our environment. It causes a lot of greenhouse gases. So by making crawling more efficient, they’re also actually helping the planet. This is another motivation why you should care about this as well. So they’ve spent a lot of effort in releasing APIs.

We’ve got two APIs. We’ve got the Google Indexing API and IndexNow. The Google Indexing API, Google said multiple times, “You can actually only use this if you have job posting or broadcast structured data on your website.” Many, many people have tested this, and many, many people have proved that to be false.

You can use the Google Indexing API to crawl any type of content. But this is where this idea of crawl budget and maximizing the amount of crawling proves itself to be problematic because although you can get these URLs crawled with the Google Indexing API, if they do not have that structured data on the pages, it has no impact on indexing.

So all of that crawling weight that you’re putting on the server and all of that time you invested to integrate with the Google Indexing API is wasted. That is SEO effort you could have put somewhere else. So long story short, Google Indexing API, job postings, live videos, very good.

Everything else, not worth your time. Good. Let’s move on to IndexNow. The biggest challenge with IndexNow is that Google doesn’t use this API. Obviously, they’ve got their own. So that doesn’t mean disregard it though.

Bing uses it, Yandex uses it, and a whole lot of SEO tools and CRMs and CDNs also utilize it. So, generally, if you’re in one of these platforms and you see, oh, there’s an indexing API, chances are it’s going to be powered by and submitting into IndexNow. The good thing about all of these integrations is it can be as simple as just toggling on a switch and you’re integrated.
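If you would rather submit directly than rely on a platform’s toggle, the protocol itself is only a small JSON POST. Here is a minimal sketch, assuming you have generated an IndexNow key and host it on your domain as the spec requires; the host, key, and URL list are placeholders.

```python
# A minimal sketch of a direct IndexNow submission. Assumes you've generated a key and
# host it at https://www.example.com/<key>.txt as the IndexNow spec requires; the host,
# key, and URL list below are placeholders.
import json
import urllib.request

payload = {
    "host": "www.example.com",
    "key": "your-indexnow-key",
    "keyLocation": "https://www.example.com/your-indexnow-key.txt",
    "urlList": [
        "https://www.example.com/updated-article",
        "https://www.example.com/new-article",
    ],
}

req = urllib.request.Request(
    "https://api.indexnow.org/indexnow",            # shared endpoint; forwards to participating engines
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json; charset=utf-8"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.status)                              # 200 / 202 means the submission was accepted
```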

This might seem very tempting, very exciting, nice, easy SEO win, but caution, for three reasons. The first reason is your target audience. If you just toggle on that switch, you’re going to be telling a search engine like Yandex, big Russian search engine, about all of your URLs.

Now, if your site is based in Russia, excellent thing to do. If your site is based somewhere else, maybe not a very good thing to do. You’re going to be paying for all of that Yandex bot crawling on your server and not really reaching your target audience. Our job as SEOs is not to maximize the amount of crawling and weight on the server.

Our job is to reach, engage, and convert our target audiences. So if your target audiences aren’t using Bing, they aren’t using Yandex, really consider if this is something that’s a good fit for your business. The second reason is implementation, particularly if you’re using a tool. You’re relying on that tool to have done a correct implementation with the indexing API.

So, for example, one of the CDNs that has done this integration does not send events when something has been created or updated or deleted. Rather, they send events every single time a URL is requested. What this means is that they’re pinging the IndexNow API with a whole lot of URLs which are specifically blocked by robots.txt.

Or maybe they’re pinging to the indexing API a whole bunch of URLs that are not SEO relevant, that you don’t want search engines to know about, and they can’t find through crawling links on your website, but all of a sudden, because you’ve just toggled it on, they now know these URLs exist, they’re going to go and index them, and that can start impacting things like your Domain Authority.

That’s going to be putting that unnecessary weight on your server. The last reason is whether it actually improves efficacy, and this is something you must test for your own website if you feel that this is a good fit for your target audience. But from my own testing on my websites, what I learned is that when I toggled this on and measured the impact with KPIs that matter, crawl efficacy and indexing efficacy, it didn’t actually help me to crawl URLs which would not have been crawled and indexed naturally.

So while it does trigger crawling, that crawling would have happened at the same rate whether IndexNow triggered it or not. So all of that effort that goes into integrating that API or testing if it’s actually working the way that you want it to work with those tools, again, was a wasted opportunity cost. The last area where search engines will actually support us with crawling is in Google Search Console with manual submission.

This is actually one tool that is truly useful. It will trigger a crawl generally within around an hour, and that crawl does positively impact indexing in most cases, not all, but most. But of course, there is a challenge, and the challenge when it comes to manual submission is you’re limited to 10 URLs within 24 hours.

Now, don’t disregard it just because of that reason. If you’ve got 10 very highly valuable URLs and you’re struggling to get those crawled, it’s definitely worthwhile going in and doing that submission. You can also write a simple script where you just click one button and it’ll go and submit those 10 URLs in Search Console every single day for you.

But it does have its limitations. So, really, search engines are trying their best, but they’re not going to solve this issue for us. So we really have to help ourselves. What are three things that you can do which will truly have a meaningful impact on your crawl efficacy and your indexing efficacy?

The first area where you should be focusing your attention is on XML sitemaps, making sure they’re optimized. When I talk about optimized XML sitemaps, I’m talking about sitemaps which have a last modified date time, which updates as close as possible to the create or update time in the database. What a lot of your development teams will do naturally, because it makes sense for them, is to run this with a cron job, and they’ll run that cron once a day.

So maybe you republish your article at 8:00 a.m. and they run the cron job at 11:00 p.m., and so you’ve got all of that time in between where Google or other search engine bots don’t actually know you’ve updated that content because you haven’t told them with the XML sitemap. So getting that actual event and the reported event in the XML sitemaps close together is really, really important.
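As an illustration, here is a minimal sketch of generating sitemap entries with the lastmod value taken directly from the content’s updated-at time in the database, rather than from a once-a-day cron run. The datastore, table, and column names are placeholders.

```python
# A minimal sketch of writing <lastmod> straight from the content's updated_at value in the
# database, rather than from a once-a-day cron timestamp. Datastore, table, and column names
# are placeholders.
import sqlite3
from xml.sax.saxutils import escape

conn = sqlite3.connect("cms.db")                   # placeholder datastore
rows = conn.execute("SELECT url, updated_at FROM articles WHERE indexable = 1")

entries = []
for url, updated_at in rows:
    entries.append(
        "  <url>\n"
        f"    <loc>{escape(url)}</loc>\n"
        f"    <lastmod>{updated_at}</lastmod>\n"    # W3C datetime, e.g. 2024-01-05T08:00:00+00:00
        "  </url>"
    )

sitemap = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
    + "\n".join(entries)
    + "\n</urlset>\n"
)

with open("sitemap.xml", "w", encoding="utf-8") as f:
    f.write(sitemap)
```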

The second thing you can do is your internal links. Here I’m talking about all of your SEO-relevant internal links. Review your sitewide links. Have breadcrumbs on your mobile site, not just on desktop. Make sure your SEO-relevant filters are crawlable. Make sure you’ve got related-content links to build up those silos.

To check this, go onto your phone, turn your JavaScript off, and then make sure that you can actually navigate those links without it, because if you can’t, Googlebot can’t on the first wave of indexing, and if Googlebot can’t on the first wave of indexing, that will negatively impact your indexing efficacy scores.
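An alternative way to run the same check is against the raw HTML itself, which is what the first wave of indexing works from. Here is a quick sketch that fetches a page without executing any JavaScript and lists the links present; the URL is a placeholder.

```python
# A quick sketch of the "no JavaScript" check: fetch the raw HTML (what Googlebot sees
# before rendering) and list the links present in it. If breadcrumbs, filters, or
# related-content links are missing here, they rely on JavaScript.
from html.parser import HTMLParser
import urllib.request

class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

url = "https://www.example.com/category/widgets"   # placeholder URL
req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0 (link check)"})
html = urllib.request.urlopen(req).read().decode("utf-8", errors="replace")

parser = LinkCollector()
parser.feed(html)
for link in parser.links:
    print(link)
```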

Then the last thing you want to do is reduce the number of parameters, particularly tracking parameters. Now, I very much understand that you need something like UTM tag parameters so you can see where your email traffic is coming from, you can see where your social traffic is coming from, you can see where your push notification traffic is coming from, but there is no reason that those tracking URLs need to be crawlable by Googlebot.

They’re actually going to harm you if Googlebot does crawl them, especially if you don’t have the right indexing directives on them. So the first thing you can do is just make them not crawlable. Instead of using a question mark to start your string of UTM parameters, use a hash. It still tracks perfectly in Google Analytics, but it’s not crawlable for Google or any other search engine.
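As a small illustration of the difference, here are the same tracking parameters behind a question mark versus a hash; fragments are never sent to the server, so search engines don’t treat the second form as a separate URL. The domain and parameters are placeholders.

```python
# The same tracking parameters behind "?" versus "#". The query-string version is a
# crawlable duplicate URL; the fragment version resolves to the canonical URL because
# fragments are not sent to the server.
params = "utm_source=newsletter&utm_medium=email&utm_campaign=spring"

crawlable = f"https://www.example.com/article?{params}"   # creates a crawlable duplicate
fragment  = f"https://www.example.com/article#{params}"   # not crawled as a separate URL

print(crawlable)
print(fragment)
```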

If you want to geek out and keep learning more about crawling, please hit me up on Twitter. My handle is @jes_scholz. And I wish you a lovely rest of your day.

Video transcription by Speechpad.com


