How to Fix Crawled Currently Not Indexed Issues
An often-overlooked issue in SEO, but one that should be taken seriously: a large volume of NON-INDEXED content can point to wider content-quality or site-architecture issues.
Posted by Daniel Foley Carter
Make Crawled Currently Not Indexed & Discovered Currently Not Indexed Your Enemy!
Google’s crawled your website – it’s been digging through every page, and every HREF it finds, it crawls. Google’s indexing your content, your indexed page count grows higher, but wait – what’s this?
It’s Crawled Currently Not Indexed
For many, a ubiquitous issue – yet not one that seems to sit at the top of many SEO priority lists, when in reality it should. You see, your INDEX is an overall representation of your website – so if thousands of your URLs are being crawled and left out of the index, it’s because Google isn’t seeing value in them.
Now, there are various nuances and situations that can lead to this issue, but the general consensus is that if Google ain’t indexing it, it’s (sometimes) not worth indexing.
That said, Google also has a habit of NOT indexing indexable content that genuinely offers end-user value – and this is where the problems begin.
Crawled Currently Not Indexed: The Biggest Issues
- Volume dependent – it can spark wider distrust in the domain, which in turn impacts content that is indexed. A disproportionate amount of non-indexed content tends to speak volumes about your overall website.
- Active URLs can be responsible for other URLs not being indexed, e.g. excessive similarity, where a perceived lack of value (too much common content, for example) leads to a URL being left out of the index.
- Trust in overall content value can be impeded. This is based on numerous SEO tests where the only change made was addressing crawled/discovered currently not indexed issues; the measurement indicators were overall page query counts, crawl frequency and cache frequency.
Suffered a Manual Action?
If you’ve suffered from a manual action from Google and require help to have your index / website and rankings restored – here at SEO Audits IO we specialise in auditing and rank / index recovery, manual action removal and more.
Why is My Content Crawled But Not Indexed?
Googlebot, on its first-pass evaluation of content, determines whether that content needs to be indexed. Google’s index is already far larger than it needs to be – if content isn’t up to scratch and is unlikely to be served, why bother indexing it?
Generally, your content isn’t being indexed for one or more of the reasons covered in the sections below.
Thin Low Value Content
If your content is deemed thin or low value, it’s far less likely to be indexed. The context of what your content IS and what it offers will ultimately determine “value” – for example, if your page about a lemon drizzle cake recipe offers only a list of ingredients and very short, non-descriptive steps while competitors offer something of FAR MORE value, the probability of your content being indexed goes down.
The depth, value, coverage and “perceived” experience will all dictate whether your content is indexed or not. Also note this is not a FIXED process – other domain factors will ultimately impact perceived value.
Poor Internal Linking / Sitemap Discovery
Content that’s not easily reachable via a crawl is likely to be perceived the same way for end users who are using your site. Ultimately, content that has a very low profile in a website is naturally less likely to be seen as something of value.
If Google is only aware of content because it discovered it via the sitemap, with no perceived referring pages, it’s deemed orphaned – a user navigating the website wouldn’t find it – which further devalues the content. Couple this with content that’s thin or low value and you have a trigger for being crawled but not indexed.
This is why it’s CRUCIAL to make sure your website architecture is supportive of good internal linking policies.
TIP! Using Google Search Console’s URL Inspection tool, you can see whether any internal referring pages are recognised.
Rendering & Accessibility Issues
If content isn’t available when Googlebot crawls – e.g. dynamic JS content injection, complex SPAs, or issues with SSR (server-side rendering), React issues etc. – then Googlebot may not see the content that end users do. Render output is key: if the content isn’t parseable because it isn’t served to Googlebot, the page will appear empty.
This too can lead to pages being crawled not indexed.
Resource accessibility can also impede page performance – blocked JS/CSS can disrupt how a page is perceived to render, which can lead to mobile usability issues and malformed caching, and can bring a page closer to being crawled but not indexed.
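You can sanity-check the rendering issue yourself: fetch the raw HTML your server returns and see whether the visible text is actually in it, or whether the response is just a JS shell. Below is a minimal stdlib Python sketch of that idea; the 50-word threshold is an arbitrary illustration for this example, not anything Google documents.

```python
from html.parser import HTMLParser

class TextAudit(HTMLParser):
    """Collects visible word count and script-tag count from raw HTML."""
    def __init__(self):
        super().__init__()
        self.hidden = False   # inside <script>/<style>, text isn't visible
        self.words = 0
        self.scripts = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.hidden = True
            if tag == "script":
                self.scripts += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.hidden = False

    def handle_data(self, data):
        if not self.hidden:
            self.words += len(data.split())

def looks_like_js_shell(raw_html, min_words=50):
    """Flag responses whose server-side HTML carries little visible text."""
    p = TextAudit()
    p.feed(raw_html)
    return p.words < min_words

shell = "<html><body><div id='root'></div><script src='app.js'></script></body></html>"
article = "<html><body><p>" + "word " * 200 + "</p></body></html>"
print(looks_like_js_shell(shell))    # True  - a client-side app shell
print(looks_like_js_shell(article))  # False - real server-rendered text
```

If the raw response flags as a shell but the page looks fine in a browser, the content is being injected client-side and you should verify what Googlebot actually renders via the URL Inspection tool.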
Video Transcription:
00:01 Hi, my name is Daniel Foley Carter from SEO-audits.io and today I’m going to show you a really nifty trick to get a better understanding of pages that are crawled currently not indexed and discovered currently not indexed.
00:17 So, for those of you that don’t know, having content that is crawled and not indexed is generally an indicator that that content lacks value.
00:27 Typically, if Google is crawling different pages on your site, be it service pages, product pages, articles, or malformed URLs or parameter-driven URLs, sometimes what Google can do is interpret the content on first crawl as not being of value or unlikely to be served.
00:56 So what I’m going to do today is just show you a quick tip on how you can export that data, better filter it, and understand what you might need to action and what you can ignore.
01:10 Now the other thing that’s really important to understand is that the data that Google holds on your site isn’t always up to date.
01:17 So we know that Googlebot is very slow. Generally, if there are issues within pages that are not indexed, you’ll often find that when you export the sample data, it will contain stuff that is no longer valid.
01:31 So what I’m going to do now, I’m going to show you the process. I did put this out on a social media post.
01:37 A lot of people were saying that it looked fairly complex. So I’m going to demonstrate how to do it now in the video so that you can easily repeat this for yourself.
01:49 So the first thing that we’re going to do is go under Indexing, then Pages, and what we’re going to look at today is crawled currently not indexed and discovered currently not indexed.
02:02 But for the purpose of the video, we’re just going to do crawled currently not indexed, because it’s the same process for discovered.
02:10 So we’re going to go in, and we can see that the trend is slightly on an upward trajectory. So again, you know, immediately I can already see that these URLs are feed URLs.
02:23 So it’s looking like a majority of these are going to be feed-based. We do see that there are some that aren’t feed-based.
02:30 So how well this will work on this particular site, I don’t know – but you can repeat this on yours irrespective of the data.
02:37 So the first thing that we’re going to do is export to a Google Sheet. So this is for Bulk Code, which is one of my test domains.
02:50 So we’ve exported our data into here. Now what we’re going to need to do is we’re going to need to get a crawl of our site.
02:57 Now, various different crawl tools will have various different outputs after a crawl. Typically, any tool that will output internal link counts, word counts, HTTP status, anything like that, is
03:15 generally going to be good enough. So what we’re going to do is now crawl our domain. We click on bulkcode.co.uk and we’re just going to crawl that.
03:35 Okay. So what we’re going to do is let that crawl. And we’re going to go to our table. And we have our list of URLs here.
03:53 Now the first thing that we’re going to want to do is just freeze our top row. And we’re not really too bothered about when this was last crawled.
04:05 We’re going to delete that. Now, before we go any further: when we use Google Apps Script, you need to make sure that the Chrome profile that you’re signed into is
04:19 the same as the account where you’re actually signed in on your Google account. Otherwise you may have problems running the script.
04:29 Now, for the purpose of saving time, what I’m going to do is just show you a sheet that’s already got the script populated.
04:42 Okay. So effectively, imagine that this is still your sheet. What you’re going to do is go into Extensions > Apps Script.
04:55 And wherever this video is posted, I will have done a post with an Apps Script called HTTP status code, which you’ll see here.
05:06 So what you’re going to do is come in here, call this HTTP status, and just paste this in.
05:13 Don’t worry about any of this stuff down here. You’re going to paste this, okay, and then you’re going to save it so that it’s there ready to execute.
05:24 Then what we’re going to do, pretending that this is all happening in one sheet, is: against your URLs, you’re going to use the HTTP status code formula and then select
05:39 the adjacent URL. And then what you’re going to do is just drag this all the way down and let that run.
05:56 Okay, so all we’re looking to do here is just get our HTTP status code. So we need to let that run.
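The Apps Script itself isn’t reproduced in this transcript, but the idea – resolve each exported URL to a live HTTP status code – can be sketched outside Sheets too. Here’s a hedged stdlib Python equivalent; the injectable `opener` parameter is my own addition purely so the logic can be exercised without network access, not part of any Google tooling.

```python
import urllib.request
import urllib.error

def http_status(url, opener=urllib.request.urlopen):
    """Return the HTTP status code for a URL, or None if unreachable.
    `opener` defaults to urllib's real opener but can be stubbed in tests."""
    try:
        with opener(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code   # 4xx/5xx responses still carry a usable status code
    except urllib.error.URLError:
        return None     # DNS failure, refused connection, timeout, etc.

def annotate(urls, opener=urllib.request.urlopen):
    """Pair each GSC-exported URL with its current status code."""
    return [(u, http_status(u, opener)) for u in urls]
```

Usage would be `annotate(list_of_exported_urls)` – exactly the filtering the video performs in-sheet, since GSC sample data routinely contains URLs that now return 301, 404 or 401.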
06:09 And it can take a minute, so we’ll just check on our crawl. So our crawl has finished. While we’re waiting for HTTP status to run, we’re just going to export our crawl.
06:30 I do apologise – before you export, make sure you select Internal > HTML and then export, just so that you don’t end up with loads of non-HTML URLs.
06:40 So: Internal > HTML, and then export it. Then what we’re going to do is open it. Okay. So we’ve opened our crawl in CSV and we’re simply going to press
07:17 Ctrl+A to select and copy it all. And then in another tab, we’re going to call this “full site crawl”, and we’re going to paste in all of the crawl.
07:32 And we’re going to go into here. Okay. Now, when you do your export from Screaming Frog, it’s very likely that the column order will be different to mine.
07:45 And that’s because in Screaming Frog you can actually drag columns across to wherever you want. So effectively we’ve got our crawl here, and what we’re going to use is VLOOKUP so that we can match our URLs against any of the pre-existing crawl data.
08:18 So if we now go back to our tab – we’ll just call this “URLs” for the sake of it. Right, so our HTTP status codes have now finished running.
08:33 So I’m just going to select the column, Ctrl+C. Now again, you would just run the script in here rather than doing what I’m doing, where I’m having to run it in a separate sheet.
08:46 And that’s only because Google Apps Script doesn’t seem to work on my account very well, so I’m using a different sheet on a different account.
08:53 But effectively you would run this in the same sheet that you’re working on so you wouldn’t have to copy and paste this in.
08:59 Okay. So we now know the HTTP status of the URLs that we’re looking at. And this is important because generally you’ll find that Google can list URLs that are HTTP 404, 401.
09:19 You’ll see lots of different status codes generally pop up. So the next thing that we want to understand is: are these pages actually linked internally?
09:31 So what we’re going to do is just create a column called “internal links”. Now, what needs to happen?
09:39 I’m going to explain the VLOOKUP. So I’m just going to open a VLOOKUP. Okay. So what this is going to do is use VLOOKUP.
09:55 It’s going to look for cell A2, which is the URL. It’s going to look in the sheet “full site crawl” – which is this sheet, which is why we’ve named it that.
10:06 It’s going to look across the sheet range from A1 to ZZ. This range can be any value, as long as it covers more rows than the volume of URLs in your crawl.
10:16 So that’s important to know. If you’ve got 20,000 URLs, then you’ll want to make sure that that’s set to 20,000.
10:23 If you’ve got 50,000 URLs, 50,000, and so on. And the number here is the column number where that value is found.
10:36 Okay? So if you look here, it says column number 37. Now, we’re looking for internal links, so we only need to set this number to match whatever the column number is of wherever our internal links are.
10:55 Now, in the social post that I put this in, what I did was tell you to drag the columns across – but technically you can just look up your column number, which is very easy.
11:06 So I haven’t done the drag-across on here yet, but we’re going to look for Unique Inlinks, which is this column.
11:13 Okay? Now to find the column number, all you simply do is just drag all the way across the top. And you’ll see in the bottom corner that it says it’s the 38th column across.
11:25 So you can either drag the column across, or you can just update the reference.
11:31 So in this instance, I’m just going to update the reference to column 38. And if there is a match, it will pull the volume of internal links.
11:41 Okay, so we’re going to double-click that to fill down, and centre it. I can see here that one of the URLs has nine internal links.
11:51 Then what we’re going to do is look at how much content is on these URLs. Is there actually any text content?
11:59 So it’s going to be the same thing – VLOOKUP. What column number is it? So this one is column number two.
12:09 So I’m going to go here and select column number two. And again, I’m going to drag this all the way down.
12:20 Now generally, if there are no internal links, it’s very likely that there’s no content – but we can see here that where there are internal links, there is content.
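Both lookups above (internal link count, then word count) are just keyed joins: find the URL’s row in the crawl export, pull one column. For anyone more comfortable scripting this than dragging formulas, here is a stdlib Python sketch of the same join; the column names (`Address`, `Word Count`, `Unique Inlinks`) are Screaming-Frog-style examples and may differ in your export.

```python
import csv
import io

def vlookup(urls, crawl_csv_text, key_col, value_col):
    """Mimic the sheet VLOOKUP: map each URL to one column from the crawl
    export, returning 'N/A' when the URL wasn't found in the crawl."""
    table = {row[key_col]: row[value_col]
             for row in csv.DictReader(io.StringIO(crawl_csv_text))}
    return {u: table.get(u, "N/A") for u in urls}

# Toy crawl export with made-up rows for illustration.
crawl = """Address,Word Count,Unique Inlinks
https://example.com/a,950,9
https://example.com/feed/,0,0
"""

matched = vlookup(["https://example.com/a", "https://example.com/ghost"],
                  crawl, "Address", "Unique Inlinks")
print(matched)  # {'https://example.com/a': '9', 'https://example.com/ghost': 'N/A'}
```

An “N/A” result is meaningful in itself: the non-indexed URL never appeared in your own crawl, which is exactly the orphan signal the video goes on to discuss.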
12:28 And I did say that this was likely not a great site to use, because a lot of the URLs are feed URLs, which shouldn’t be indexed anyway.
12:38 So the next thing that we’re going to do is look at GSC clicks and impressions.
12:47 Now why might we do that if something is not indexed? Well, it may have been indexed and then dropped from the index.
12:53 So it’s always good to know whether or not it was – GSC only holds 16 months of data. So we’re going to go to Performance.
13:01 We’re going to change this to the last 16 months. Then what we’re going to do is export the GSC data.
13:16 Then we’re going to go down to Pages, select all of that, copy it, go back to our sheet, and we’re just going to call it “GSC click data”.
13:32 Okay, then what we’re going to do is paste. Now, the idea here is exactly the same. We’re going to go back to our URLs.
13:44 And what we’re going to do is use an adjusted VLOOKUP. In the adjusted VLOOKUP, you’ll see here we’ve just changed the sheet name for where we’re going to do the lookup.
13:55 And again, remember the range, depending on how many URLs – if you’re exporting from Search Console you’ll only get a thousand rows, but you can use something like Search Analytics for Sheets to bypass the thousand-row limit.
14:08 So if we’re looking at clicks and impressions, we want the column index numbers. So the index number for clicks – we’ve already set that one – and then we’re going to set this one to number three, and then we’re going to double-click on that to fill down.
14:30 Now what we’re going to do is select all the columns and set that. And if you’re pedantic like I am, you’re going to want to format and then wrap.
14:44 And as with anything, I always like styling things. Then what we’re going to do is press Ctrl+A, click Data, and click Create a filter.
14:58 Now, what I typically like to do is keep an original sheet that’s not been touched. So I will then duplicate the sheet and call it “URLs
15:10 HTTP 200”. Okay, so in this HTTP 200 version, I’m going to click on the filter column and just have 200 selected.
15:22 Or, if there are loads of these exceptions that come up, you can just filter by condition – text contains – put in 200, and hit OK.
15:31 You’ll see that there are now no 301s. So this is the first step: taking that URL export – you might have 100 URLs, you might have 1,000 URLs – but at least when you’re doing this, you’re filtering down to stuff that you know is active.
15:47 Now you can then apply filters sequentially. So anything that is active but doesn’t have internal links – you can then look at the URL.
16:01 Now we know that feed URLs are not something that we need to have indexed. I personally do not have any feed URLs indexed.
16:11 For the purpose of this site being a test site, I haven’t bothered to exclude them via robots.txt. But because feed URLs just are not of value anyway, we’re going to exclude them.
16:22 So we’re going to do text does not contain. And we’re going to get rid of those. And you’ll see here we’re left with just a small volume of URLs.
16:36 So if something is active, okay – it returns HTTP 200 – and there are no internal links, no word count and no historic data,
16:49 then we have to ask ourselves: well, if there are no internal links, what is the page? Is it an anomaly? Is it a page generated
16:59 as part of a WordPress plugin? Is it a media URL? Is it orphaned because it’s not a structural part of the site?
17:09 So we can see here that this thing says featured, featured_item. So it’s likely that this is something to do with WooCommerce – a piece of code that refers to a URL that maybe does a pop-up, and Google’s managed to find it and crawl it at some point at least.
17:29 And you can see here we’ve got a wp-content URL for a plugin. You can see we’ve got paginated results in here.
17:37 So typically this is a great way to find these. So if we see here, we have a URL that was indexed.
17:45 We have numerous URLs in here that were indexed at some point and then dropped. Okay. So the idea is: if something doesn’t have any internal links and doesn’t have any internal data, then we want to make sure it isn’t something Google comes back to.
18:03 And now even though Google is telling us it’s not indexed, ideally if we find stuff like this, we don’t want Google recrawling it or polling it.
18:13 So we would set in motion some form of indexing control. Now, I can see here that this is a
18:20 pagination item. Okay, so it’s very probable that some pagination is indexed and some pagination is not indexed. I would have full pagination indexing control to eradicate issues like this.
18:32 So typically I would noindex, follow a pagination series, which would bring this to the point where Google would have to obey the directive if it were to re-poll it.
18:45 But if there are pages in here that are active, and they have internal links and they have data, and they’re no longer
18:53 indexed, it’s very likely that Google just does not see any value in that page anymore. And that lack of value could be anything from there being another product with a lot of common content, to Google just generally not seeing any value in that page.
19:08 So typically, if it was something like that, I would go back, adjust the content, add a few extra external links to the page, and then resubmit via Request Indexing after a live URL inspection.
19:22 This is a great way to drill down and find your URLs that are active, that may have been indexed and then removed from the index, versus stuff that might be anomalies.
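The whole decision path walked through above – drop non-200 sample rows, ignore feed/parameter URLs, flag true orphans for indexing control, and rework once-indexed pages that lost value – can be written down as one small triage function. This is a sketch of that logic only; the buckets and cut-offs mirror the video’s manual filtering, not anything Google publishes.

```python
def triage(url, status, inlinks, word_count, impressions):
    """Rough triage of one 'crawled currently not indexed' URL,
    mirroring the manual spreadsheet filtering described above."""
    if status != 200:
        return "ignore"        # stale sample data: URL is no longer active
    if "/feed" in url or "?" in url:
        return "noindex"       # feed/parameter URL: not worth indexing anyway
    if inlinks == 0 and word_count == 0 and impressions == 0:
        return "investigate"   # likely orphan or plugin anomaly: control indexing
    if inlinks > 0 and impressions > 0:
        return "improve"       # once had value: rework content, add links, resubmit
    return "review"            # everything else gets a manual look

print(triage("https://example.com/feed/", 200, 0, 0, 0))            # noindex
print(triage("https://example.com/old-product", 200, 9, 950, 120))  # improve
print(triage("https://example.com/gone", 301, 0, 0, 0))             # ignore
```

Run over the merged sheet (status, inlinks, word count, GSC impressions per URL), this splits the export into the same piles the video builds by filtering: stuff to ignore, stuff to block, and stuff worth fixing and resubmitting.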
19:33 Anyway, I hope this video helps. Have a great day. My name is Daniel Foley Carter – you can find me on LinkedIn, and I run seo-audits.io.
19:41 That’s where I run lots of SEO webinars and lots of other tips and tricks that you can follow if you need.
19:48 Have a nice day.