So apparently if Google can't get to your robots.txt file, that's the equivalent of blocking them. That's news to me.
Ok. I set mine up now. lol
Latent Semantic Indexing (LSI) robots.txt. Hope that tomorrow I can still see my sites in the Search Engine Results Page (SERP). lol
I've heard them say that before and it has always baffled me. Robots.txt is there to tell the spiders what NOT to do. The only reason for an "ALLOW" is to override a part of a directive you disallowed earlier. Otherwise, every part of the robots.txt file was designed to tell the spiders what NOT to do. Without a disallow directive to say otherwise, the default is "Allow".
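To make the "default is Allow" point concrete, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The rules, the `example.com` URLs, and the `build_parser` helper are made up for illustration. One caveat: Python's parser applies the first matching rule, so the `Allow` line is listed before the `Disallow` it overrides; Google documents a most-specific-match rule instead, so ordering matters less there.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; the paths are invented.
ROBOTS_TXT = """\
User-agent: *
Allow: /private/press-kit.html
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = build_parser(ROBOTS_TXT)

# No rule mentions /blog/, so the default applies: allowed.
blog_allowed = rules.can_fetch("*", "https://example.com/blog/post")

# Covered by "Disallow: /private/": not allowed.
secret_allowed = rules.can_fetch("*", "https://example.com/private/secret.html")

# The Allow line carves one page back out of the disallowed folder.
press_kit_allowed = rules.can_fetch("*", "https://example.com/private/press-kit.html")

# An empty rule set contains no Disallow at all, so everything is allowed.
empty_allowed = build_parser("").can_fetch("*", "https://example.com/anything")

print(blog_allowed, secret_allowed, press_kit_allowed, empty_allowed)
```

This only shows how one parser reads the rules; it says nothing about how any particular search engine behaves when the file cannot be fetched.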
Google has changed that over the years – they love to change and control standards that aren't theirs to control. Putting your site map in there and yes… treating its mere existence as permission to enter (and not the other way around) are examples of that.
They've done it with proper W3C web standards before on many occasions, too. The most recent (and biggest) one that comes to mind right off the bat is telling us that "Prev/Next" in archive pages is deprecated. Sure… it's fine to say that you ignore that – but it isn't cool to suggest that it's no longer part of following proper web standards.
True, but I doubt Google wants to make sites harder to crawl. It doesn't benefit them here. They do it for legal reasons. Explicit is better than implicit.
Truslow » Charles
I agree that explicit is better than implicit, but… it doesn't give them ownership over the robots.txt standard. They can define their own meta tags and ask us to use those to send the signal or they could have us create a "google.txt" file for our site if we want them to crawl us.
There are plenty of options out there without hijacking someone else's established standards.
Allen » Truslow
100% agree. I read an article a while back where John Mueller (of course) was saying "rel=next and prev aren't necessary anymore" and Bing was like "uhhhh we still use that" and I'm pretty sure ADA compliance insists on it too. Google's plan for the future and the QOL of the internet are unfortunately mutually exclusive more often than not.
Oh and to go on a bit of a rant, remember how they said not having Secure Sockets Layer (SSL) was going to be a serious rankings penalty? Or when they said that not having image alt text (which ADA requires) would as well? Or even when site speed would be a heavy ranking factor? Haha, that was funny. Oh, Google.
Yes – Bing and ADA and web standards in general use it. And, of course, when Big G said, "it's not necessary" all the SEO tools removed it as a feature and broke every site that uses those tools.
For image ALT text… it's an ADA thing yes, but actually the PROPER thing is to ONLY put alt text in images that are important to the context of the content. If it's just a decorative picture that breaks up the content for the eye or to add a splash of color but doesn't really matter to the subject at hand – you should NOT put alt text in there. If you do then your screen reader ends up going along and then says something like "… are important. <Photo of: keyword keyword keyword keyword spam> It is also important to remember that…"
When in doubt, I have always – every time – followed proper standards when they are in conflict with Google's recommendations. And in doing so, I have never had any trouble getting traffic and conversions to a client. (Or at least not any trouble that could be directly or even indirectly attributed to my following standards over G-Directives).
It's a ridiculous comment given how many website owners don't even know what a robots.txt file even is.
robots.txt is not a firewall; it is just a rule set for good bots like Googlebot. And unless you tell these good bots not to crawl, which we do on staging sites, they do crawl and index pages. To block good bots, bad bots and users from certain IPs or countries, we use either .htaccess or a firewall.
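As a rough illustration of the staging-site case, the sketch below parses a blanket "keep out" rule set with Python's standard-library `urllib.robotparser`. The `staging.example.com` host and the `may_crawl` helper are hypothetical. The answer only matters to bots that bother to ask; nothing here stops a client that ignores the file, which is why IP or country blocking has to happen at the server or firewall.

```python
from urllib.robotparser import RobotFileParser

# A typical "keep everyone out" rule set for a staging site.
STAGING_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

staging_rules = RobotFileParser()
staging_rules.parse(STAGING_ROBOTS_TXT.splitlines())


def may_crawl(user_agent: str, url: str) -> bool:
    """What a well-behaved bot would conclude before requesting the URL."""
    return staging_rules.can_fetch(user_agent, url)


# Hypothetical staging host; every compliant bot gets the same answer.
googlebot_ok = may_crawl("Googlebot", "https://staging.example.com/")
other_bot_ok = may_crawl("SomeOtherBot", "https://staging.example.com/pricing")

print(googlebot_ok, other_bot_ok)
```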
I know being a Google rep comes with an expectation to be vague, but in this case John seems to have forgotten basic terminology. "Unreachable" encompasses 404 Not Found (which is what a site without a robots.txt file returns), but my guess is that what he meant to say is 403 Forbidden (which would kind of make sense as a signal for Googlebot) and any 5xx status codes (that's just basic "decency" – if the server is struggling, don't go spamming it with requests when there might also be instructions that they would not know about if such an HTTP response was returned).
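For what it's worth, the robots.txt RFC (RFC 9309) draws this distinction: a 4xx response makes the file "unavailable" and the crawler may crawl anything, while a server or network error makes it "unreachable" and the crawler must assume a complete disallow. Below is a small sketch of that mapping. The function name and the returned labels are my own, and treating 429 like a server error follows my reading of Google's documentation rather than the RFC, so take that branch as an assumption.

```python
def policy_for_robots_status(status: int) -> str:
    """Map the HTTP status of a robots.txt fetch to a crawl policy.

    Labels are illustrative, not taken from any library.
    """
    if 200 <= status < 300:
        return "parse-rules"      # file fetched: obey what it says
    if 300 <= status < 400:
        return "follow-redirect"  # look for the file at the new location
    if status == 429:
        # Assumption: handled like a server error, per Google's docs.
        return "disallow-all"
    if 400 <= status < 500:
        return "allow-all"        # "unavailable": same as having no file
    if 500 <= status < 600:
        return "disallow-all"     # "unreachable": back off entirely
    return "undefined"


for code in (200, 404, 403, 429, 503):
    print(code, policy_for_robots_status(code))
```

Under this reading, a missing file (404) does not block anyone; it is the 5xx case that amounts to blocking.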
Block crawling via Robots.txt, Add NoIndex Meta Tags on each page, Password Protect the pages | What do they mean?
To Block Bots, e.g. Ahrefs, Majestic, SEMrush, etc., Except Google and Bing Bots