Solution for GSC Not Showing All URLs of an XML Sitemap in the Coverage Section

u/juzzle

Google Search Console – I submitted XML sitemaps covering 1000+ pages weeks ago, but GSC only shows 23 URLs in the "Coverage" area – why?

I have a WordPress site with a few key sitemaps (generated by Yoast) – these were submitted to Google Search Console weeks ago (https://i.imgur.com/sKcpwwj.png). Weeks have passed, and although those XML files describe around 1400 valid pages, I am concerned that the "Coverage" section of Google Search Console only shows 23 "valid URLs" (https://i.imgur.com/tsrEqLx.png) – most of those 23 URLs are actually from an outdated version of the site (which we switched away from a month ago, with 301 redirects in place).

What doesn't make sense to me is that most of the pages described in those XML sitemaps are discoverable with a Google search and seem to be in the Google index (https://i.imgur.com/5CCIHvX.png). However, the number of click-throughs on all these pages is much lower than expected – and yes, the content is well structured (Title, Excerpt/Description, H1, H2 & Body).

Why does Google Search Console seem to only be aware of pages which are months old?
Is Google ignoring my sitemaps? If not, why aren't they shown in "Coverage"?

questionacc
Are those the only sitemaps you have? Does the "resource" sitemap actually contain user-focused pages that you want indexed?

You should submit all the sitemaps you have. Even if you have a sitemap index, you should submit each child sitemap individually anyway (IMO) – see the sketch at the end of this comment.

Also FYI, doing a site: search is not a super reliable way of knowing what's in the "index" – it's only a way of knowing which URLs Google is aware of.
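
For context, a Yoast-style setup normally exposes one sitemap index that points at several child sitemaps, and the advice above is to submit each child sitemap as well as the index. A minimal sketch of what such an index looks like (the example.com domain and the filenames are placeholders, not the poster's actual URLs):

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/post-sitemap.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://example.com/page-sitemap.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://example.com/resource-sitemap.xml</loc>
  </sitemap>
</sitemapindex>

In GSC you would then submit the index and each child file (post-sitemap.xml, page-sitemap.xml, resource-sitemap.xml) as separate entries under Sitemaps.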

juzzle ✍️
Thanks. Yes, the Resource pages are all articles – it's excellent well-structured content.

Also FYI, doing a site: search is not a super reliable way of knowing what's in the "index" – it's only a way of knowing which URLs Google is aware of.

Yes, you're right, good point – however, I conducted audits by emulating users searching for specific strings, and I am convinced that Google is indexing the content.

I think the Coverage issue might have been related to the fact that my Google Search Console (GSC) property was set up as a URL-prefix type, as opposed to a Domain type. Now that I have set up a Domain property, I will review the Coverage report in a few days and hope to see something better.

emirhan
Could it be that URLs in the sitemap are https://example.com and your Google Search Console (GSC) account is https://www.example.com?

juzzle ✍️
It could be, but surely Google would look past that academic difference in <year>

mktgbill
Nope, especially if you're in the GSC account for a single version of the site and not in a domain-level account.
juzzle ✍️
Thanks. Can you clarify the difference between a "single version" and a "domain-level account", please?
emirhan
It definitely will not. If you own the domain, verify it via Domain Name System (DNS) records and create a domain-level GSC account.
juzzle ✍️
Thanks. To be clear, are you saying that Google would not index https://abc.com/sitemap.xml if the GSC base URL was https://www.abc.com? Or are you saying that the Coverage in GSC might be wrong with the same mismatch?
emirhan
The Coverage report of https://www.abc.com will not include data about https://abc.com.
juzzle ✍️
Right you are – good tip. I've just created a new Domain-type property for the site (rather than URL-prefix). Now we wait.
BTW, I've definitely verified the domain via Domain Name System (DNS).
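
For reference, the domain verification mentioned here is done with a single DNS TXT record on the root domain. A sketch of what that record looks like (the domain and the token value are placeholders; the real token is issued in the Search Console verification dialog):

example.com.   3600   IN   TXT   "google-site-verification=abc123exampletokenxyz"

Once that record is in place, the Domain property covers every protocol and host variant (http/https, www/non-www), which is what resolves the mismatch discussed above.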

lawnguylanddude
Try creating a manual sitemap file with only 100 URLs in it – choose the 100 most important URLs. Reference the sitemap in your robots.txt file and manually submit the sitemap file in GSC. Monitor the URLs to see which ones are being indexed; once a page appears in Google, remove it from the sitemap. When the manual file drops to about 50 URLs, add the next 50 most important URLs. Resubmit the sitemap manually every 3-7 days. (A minimal sketch of such a file follows this comment.)

This process doesn't scale for large sites, but it does expedite the crawling and indexing of new pages or pages that have changed recently.

I usually have two versions of these manual XML files: one contains the top 100 most important URLs from the site, and the other contains the 100 most important recently changed pages.

I have experimented with the number of URLs and found that 100 or fewer got the quickest results.

I have also found that running a page through both the Google speed test tool and the structured data test tool is a good way to get a new or recently changed page indexed.
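
A minimal sketch of the kind of hand-curated priority sitemap described above, plus the robots.txt line that points to it (the example.com domain, the filename, the URLs and the dates are all placeholders):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/most-important-article/</loc>
    <lastmod>2021-06-15</lastmod>
  </url>
  <url>
    <loc>https://example.com/second-most-important-article/</loc>
    <lastmod>2021-06-10</lastmod>
  </url>
  <!-- ...repeat for up to ~100 priority URLs, pruning entries as they get indexed... -->
</urlset>

And the corresponding line in robots.txt:

Sitemap: https://example.com/priority-sitemap.xml

The file is also submitted manually under Sitemaps in GSC and resubmitted after each revision, per the process above.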

juzzle ✍️
Brilliant information, thank you so much. After switching to a Domain-type property, Google is now indicating 65 or so Valid URLs, so it appears to be working through the site properly. That said, these fine-tuning steps definitely look worth investigating.

lawnguylanddude
Another way to maintain regular crawling is to create a content page with links to as many of your high-priority pages as possible. Then make sure this page is in your priority XML file and in your common footer. This forces a frequent, "natural" recrawling of your key pages.
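
A rough sketch of that idea, assuming a hub page at a hypothetical URL /key-resources/ that is linked from the shared footer and that itself links out to the priority articles:

<!-- in the site-wide footer template -->
<footer>
  <nav>
    <a href="/key-resources/">Key resources</a>
    <!-- ...other footer links... -->
  </nav>
</footer>

The /key-resources/ page would also be listed in the priority sitemap file sketched earlier, so it is crawled frequently and passes that crawl activity on to the pages it links to.
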
juzzle ✍️
With around 1400 significant articles, it's difficult to choose the priority pages; however, I take your point, as the idea of marking articles (pages) as "key" articles definitely has merit, especially given the weeks/months of crawling ahead. For now, our home page lists the latest Posts (News) and Resources (Articles) as they are added – to your point, partly – however, we are also adding articles from 10, 20, 30 and 40 years ago (with published dates accordingly), and these obviously won't register on the front/home page.
lawnguylanddude
What are the 20 pages from your site you want to rank #1 for?

What pages bring you the most leads?

What pages bring you the most revenue?

What pages bring you the most Adsense revenue?

If you're going to tell me all 1400 pages are equally important, you're never going to be able to prioritize what pages you should work on next.
juzzle ✍️
The site represents a not-for-profit organisation. We're not about revenue and we can't afford AdSense. I simply want to ensure, for the benefit of members, that relevant content is found, and that potential new members see us as a purveyor of good-quality information. I've set my priorities by which types of sitemaps to submit to (and monitor in) GSC.
lawnguylanddude
Honestly, as an SEO you really should already grasp the concept of conversions and what they are on your website. Every website has something it wants people to do: buy a product, fill out a form, click an ad, subscribe to a newsletter, share a page on social media, make a donation, or some other activity. You want to know which pages are generating conversions and which pages aren't. You should also always be A/B testing your conversion messaging, colors, and other elements.

linkilo
Make sure to interlink your posts with each other. The more you do, the better Google can crawl and index your other posts.

23 URLs were discovered, and it will take time for Google to crawl all of them. Discoverability and crawlability are super important.

Even if you submit your sitemap or an individual URL, Google won't get to it until it feels like it.

So the more you can do on your end, the better.

If Google crawls one of your pages and there are no other links it can follow, it will move on.

Link building can also help: when Google crawls an external link pointing to your page, the page gets discovered and indexed.

Were the 1400 pages created dynamically? Might they seem irrelevant?

You can build internal links at massive scale, and more easily, with Linkilo.co when we launch next month :)

juzzle ✍️
Thanks for the advice – much appreciated.

Were the 1400 pages created dynamically? Might they seem irrelevant?

No, they are all great content – research papers related to the subject matter of the site.
