Block crawling via Robots.txt, Add NoIndex Meta Tags on each page, Password Protect the pages | What do they mean?

Kunjal
How do you remove a staging site that's already been indexed from the Search Engine Results Page (SERP)?
– Block crawling via Robots.txt
– Add NoIndex Meta Tags on each page
– Password Protect the pages & use server-side authentication
What's the right step?

Truslow
Ultimately, the best way to get it out of the index is to make that staging URL return 404s or redirect it to the corresponding pages on the live site.
NoIndex won't "really" remove it from the index – though it won't rank for anything unless someone is specifically looking for that page.
Blocking in robots.txt can keep the page in the index, but it will never be updated.
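For anyone wondering what the redirect option looks like in practice, here is a minimal sketch assuming an Apache server and placeholder domains (staging.example.com redirecting to www.example.com); other servers have equivalent rules:

    # .htaccess on the staging site (Apache, mod_alias)
    # 301-redirects every staging URL to the same path on the live site
    Redirect 301 / https://www.example.com/

As Google recrawls the staging URLs and follows the redirects, they drop out of the index in favour of the live pages.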
Garrett
I've dealt with this issue multiple times, with devs forgetting to noindex the staging/production site. Hopefully you have your website in Google Search Console; if not, I think you can still add it after the fact if you have access.
This worked for me, at least with a few sites I worked on. You just need to make sure you can noindex the site too, so it doesn't keep happening. Let me know if this helps.
Use the Google search console removal tool. https://support.google.com/webmasters/answer/9689846?hl=en

Woods
Option two – noindex all the pages and wait for Google to crawl them and recognise the noindex. If all you do is block crawling, Google will never crawl the pages to find the noindex directives. You can also request removal in Google Search Console, but then I'd also make sure the noindex is on all pages.
The robots.txt and password protection can help prevent the issue, or prevent it happening again, but they won't clean the pages out of the index.
Also, depending on your set-up, it might be easier to apply an X-Robots-Tag noindex in the HTTP header rather than adding a meta robots noindex on every page.
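For reference, here is roughly what the two noindex variants look like. The page-level version goes into each page's <head>:

    <!-- meta robots noindex on an individual page -->
    <meta name="robots" content="noindex">

The header version can be set once for the whole staging host – for example on Apache with mod_headers enabled (a sketch, assuming every response on the staging vhost should carry the directive):

    # Apache config / .htaccess for the staging site
    # Sends an "X-Robots-Tag: noindex" header with every response,
    # which also covers non-HTML files such as PDFs
    Header set X-Robots-Tag "noindex"

Either way, Google still needs to be able to crawl the URLs in order to see the directive.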
Mian
First, block crawling via robots.txt so the pages don't get indexed again. Then deindex them from Search Console.

Miroslav Mišo Medurić
The Search Console method lasts up to six months; robots.txt is NOT a way to make the removal permanent.
In combination with Search Console removal, it has to be either a 404, password protection or possibly a noindex tag – preferably one of the first two.
Mian » Miroslav Mišo Medurić
The best approach to deindex a URL is to instruct search engines by blocking it through robots.txt, and blocking through Search Console is only valid for up to six months. But that raises a question: why would a search engine want to index a URL that has already been blocked by the webmaster? I may be wrong in my perspective on search engines – correct me if I am.
Miroslav Mišo Medurić » Mian
There is no philosophy in this, or a need for perspectives. Googlebot acts upon certain well-documented rules.
As a side note, robots.txt has nothing to do with Google's index. It is an instruction for bots, and crawling and indexing are not the same thing. For example, Google can obtain information about pages in other ways besides crawling.
Your profile says "SEO expert" (?) Surely you have seen indexed pages with a meta description that says "no information is available for this page".
So, blocked, but indexed after the fact. How 'bout that? 🙂
Mian » Miroslav Mišo Medurić
Dear,
Got all of your points. Yes, I surely am an SEO expert, and I already knew Google can index pages even if you disallow them in robots.txt, through other sources (backlinks). But here we are talking about staging – so who is going to link to a staging site?
I also noticed you are offering two or three different methods that you seem unsure about, but I have tried this method on my clients' sites and it works. That's why I recommended it. You can try it, and if you don't get results then please let me know – though I think after trying this method you won't need to. Anyway, it's up to you.
Thanks
Miroslav Mišo Medurić » Mian
Really?
As an expert, you should consult documentation rather than argue on social media, and definitely learn more from it than from forming opinions.
Experimenting is about discovering the unknown, not reinventing the wheel (one that doesn't turn).

Mian » Miroslav Mišo Medurić
Thank you so much for the information you provided.
I think my purpose in landing here has already been fulfilled, because I want to improve my knowledge and would love to learn things from you.
Sending you friend request.
Thank you!
Miroslav Mišo Medurić
Glad it is resolved. 🙂
By the way, to answer the "who will backlink staging" question, people talk about their staging sites online. A single link in a public discussion is all it takes. It could be an owner contacting support, for example.
Mian » Miroslav Mišo Medurić
Thanks for the point. I will also do my research. Nice to have a conversation with you on this.

Miroslav Mišo Medurić
As pointed out above, noindex first.
Keep checking whether the pages have been removed by using the site: search operator.
Once the site is gone, apply a robots.txt disallow or, better yet, password-protect the site.
There is one thing that even the most experienced SEOs can trip on:
If the site is on WordPress, and you never had your own custom robots.txt on it, don't just hit the "Discourage search engines" button. Double-check the robots.txt afterwards.
"Discourage search engines" was not designed to deindex sites, but to prevent new websites from being indexed. Hence it adds – or at least used to add – a robots.txt disallow directive along with the noindex tag. That would, of course, prevent the site from ever being deindexed.
A note on the Search Console method: it lasts up to six months, after which the site is back, unless extra measures are taken, such as a 404, password protection or a noindex – but NOT a robots.txt disallow.
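To make that caveat concrete, here is a rough sketch of the conflicting setup, with staging.example.com as a placeholder domain. A blanket robots.txt block looks like this:

    # robots.txt served at https://staging.example.com/robots.txt
    # Tells all crawlers not to fetch any URL on the site
    User-agent: *
    Disallow: /

With that in place, Googlebot is never allowed to fetch the pages, so it can never see a page-level directive such as:

    <!-- in the <head> of each staging page -->
    <meta name="robots" content="noindex">

and any already-indexed URLs simply stay in the index. While cleaning up, you can check what Google still has with a query like site:staging.example.com.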

Kunjal » Miroslav Mišo Medurić
So password protection seems like a viable option now; if that doesn't remove the site, I'll add noindex on each page as a last resort, since that's a directive, and I'll make sure it isn't getting links from anywhere.
Miroslav Mišo Medurić » Kunjal
It doesn't work like that.
The noindex tag is a page-level directive, while password protection happens at the server level.
So if you protect the site, Google will be unable to read the pages, including the noindex tags.
If you opt to password-protect the site immediately, you need to combine it with Google Search Console (GSC) removal in order to make the removal permanent. Noindex is irrelevant in that case.
Otherwise, you need to place noindex tags and leave the door open for Google to see them.
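For anyone unsure what password protection at the server level means in practice, a common sketch is HTTP Basic Auth on Apache (the realm name and file path below are placeholders):

    # .htaccess on the staging site
    AuthType Basic
    AuthName "Staging - authorised users only"
    AuthUserFile /path/to/.htpasswd
    Require valid-user

    # create the credentials file once, from the command line:
    #   htpasswd -c /path/to/.htpasswd yourusername

With this in place, Googlebot gets a 401 for every URL, which is exactly why it has to be paired with a GSC removal request (or applied only after the pages have already dropped out) rather than with noindex tags that Google can no longer read.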
Kunjal
Okay, so there are two options:
1. Just password-protect the site and then use GSC Removals to remove the staging subdomain from the Search Engine Results Page (SERP).
Or
2. Just noindex all pages of the site and see if it gets removed from the SERP.

Scott
It is actually possible to get every URL of the staging subdomain out of Google's search results. As others have said, you'll need to have it verified in Google Search Console, you'll need the site to actually be gone, and you'll need a robots file so you can noindex everything.
Then either wait many months for all the pages to go away, or use this extension, which used to be free but isn't anymore.
https://chrome.google.com/webstore/detail/webmaster-tools-bulk-url/pmnpibilljafelnghknefahibdnfeece
It's a monthly subscription, so you only pay for it for as long as you need to keep using it – which, if done correctly, would only be once.
Gossage
Don't use robots.txt to deindex already-indexed pages. A meta robots noindex will do, combined with URL removal in Google Search Console (GSC).

Kunjal » Gossage
I'm conflicted between noindex and password-protecting the pages. John Mueller also recommends password-protecting the pages; on the other hand, noindex is also a directive, but some people here have also recommended password protection and server-side authentication. So I'm considering going with the password option; if that doesn't work, then I'll definitely noindex all the pages.
Gossage » Kunjal
Password protection will prevent indexing, but it won't deindex already-indexed pages. Use meta robots noindex to deindex and monitor coverage in GSC. Once no pages are indexed, then add the password.
Once the noindex tag is in place, you can submit the pages for indexing. This will force Google to crawl the pages and see the noindex tag, deindexing them quicker. Unfortunately, you can only do 10–20 pages a day.
Kunjal
Okay, so I'll have to add noindex on each page first, and then once they are deindexed, I'll password-protect them.
