How does Google know whether the results it shows for medical, scientific, technical (and other similarly precise and important) question-based queries are factually correct?
It is very similar to the techniques used in building the Knowledge Graph, to the extent it is likely repurposing much of the same data.
So, basically, the more pages on the web that provide the same answer to the same question, the more Google will see that answer as factual? I work a lot in the health space and read a lot of top-ranking content that is not just factually incorrect but would cause harm if followed.
Ammon Johns 🎓
Not volume, but trust, Mark. If Google have picked up the fact on other trusted sites through the knowledge graph then it can get a factual 'boost'. If not, it won't get the boost, and so won't have any 'penalty' or disadvantage unless rival pages DO have the boost. A simple but effective system.
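The boost-only idea described above can be sketched in a few lines. This is purely illustrative and not Google's actual scoring: the `TRUSTED_FACTS` table, the `factual_boost` and `score` functions, and the `boost_per_match` value are all hypothetical names and numbers made up for the example.

```python
# Hypothetical store of facts already verified on trusted sites:
# (subject, attribute) -> value. The entries are illustrative only.
TRUSTED_FACTS = {
    ("adult resting heart rate", "normal range"): "60-100 bpm",
    ("water", "boiling point at sea level"): "100 C",
}


def factual_boost(page_facts, boost_per_match=0.1):
    """Return a non-negative boost: one increment per claim matching a trusted fact.

    Claims that contradict the trusted set, or that are simply unknown,
    contribute nothing. There is no penalty term at all.
    """
    matches = sum(
        1
        for key, value in page_facts.items()
        if TRUSTED_FACTS.get(key) == value
    )
    return boost_per_match * matches


def score(base_relevance, page_facts):
    """Combine an assumed base relevance score with the factual boost."""
    return base_relevance + factual_boost(page_facts)


verified_page = {("water", "boiling point at sea level"): "100 C"}
unverified_page = {("coffee", "medical use"): "cures everything"}

print(score(1.0, verified_page))    # base score plus one boost increment
print(score(1.0, unverified_page))  # base score only, no penalty applied
```

Note that the unverified page only loses out relative to a rival that earned the boost, which is the point being made above.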
Mark ✍️ » Ammon Johns
Makes perfect sense. I'd like to delve into this a little deeper for more context. What factors make these informational sites a 'trusted' source by Google? And if this is the case, why are there so many top ranking (trusted) pieces which contain factually incorrect information?
Ammon Johns 🎓 » Mark
you can bet they started from a very select, hand-picked seed set. For health stuff that's likely to be official sites like the WHO, the NHS, the CDC, and similar. For Finance sites pretty much the same deal – government sites and official watchdog and regulatory bodies, also likely drawing from Stock reports, etc.
Once there is an initial trusted seed set, it can be easily expanded algorithmically by measuring the convergence (degree of matching) of other sources to those trusted ones. Any source whose facts match the trusted set 100% is probably just as accurate, and can therefore be flagged for review by a human, who decides whether that accuracy can reasonably be trusted to remain true.
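As a rough sketch of the seed-set expansion described above (an assumption-laden toy, not a known Google implementation): measure how often a candidate source agrees with the trusted set on the claims they share, and queue fully convergent sources for human review. The function names, the `min_shared` guard, and the sample data are hypothetical.

```python
def convergence(candidate_facts, trusted_facts):
    """Fraction of the candidate's overlapping claims that agree with the trusted set."""
    shared = [key for key in candidate_facts if key in trusted_facts]
    if not shared:
        return 0.0
    agreeing = sum(1 for key in shared if candidate_facts[key] == trusted_facts[key])
    return agreeing / len(shared)


def flag_for_review(candidates, trusted_facts, threshold=1.0, min_shared=2):
    """Return names of sources convergent enough to hand to a human reviewer.

    min_shared is an assumed guard so that a source is not flagged on the
    strength of a single matching claim.
    """
    flagged = []
    for name, facts in candidates.items():
        shared_count = sum(1 for key in facts if key in trusted_facts)
        if shared_count >= min_shared and convergence(facts, trusted_facts) >= threshold:
            flagged.append(name)
    return sorted(flagged)


# Illustrative data only.
trusted = {"fact_a": "x", "fact_b": "y", "fact_c": "z"}
candidates = {
    "site_one": {"fact_a": "x", "fact_b": "y"},      # agrees on everything shared
    "site_two": {"fact_a": "x", "fact_b": "wrong"},  # agrees on half
    "site_three": {"fact_c": "z"},                   # too little overlap to judge
}
print(flag_for_review(candidates, trusted))
```

The output of the algorithmic step is only a review queue; the human decision on whether to admit a source to the trusted set stays outside the code.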
I'm working with a client that serves the Medical sector, among others.
From what I see, decision makers in these industries have very little understanding of, interest in, or capacity to recognize SEO expertise.
This causes reputable institutions to have a substandard web presence. The level of protectionism, office politics etc is so entrenched in larger organizations that it takes stubbornness to push through an idea that would be a game changer for their web presence. And even then, you have to keep a raised guard to protect that initiative.
This, I think, is part of why factual websites get outranked by others.
You can attribute this to some institutional(ized) arrogance, as if Google is dumb for not recognizing the quality of what their authors post. You can call it sluggish adoption of digital marketing ideas.
And, at the opposite end, you get small, nimble, growth-obsessed media companies who know their stuff and churn out substandard content that's better suited to the general public.
Ammon Johns 🎓 » Mark
the second part of that, regarding factually incorrect material, is a trickier issue, and surprisingly a separate one. The initial use of the knowledge base was to identify and extract factoids simply for answer boxes. Expanding the fact checking to YMYL stuff was later, but not a massive leap. And, as stated above in my earlier comments, only needed to provide a boost when known trust was involved – it never needed to say what was false, or even what was true, only what had already been VERIFIED to be true.
Fact Checking for incorrect info is far, far newer, and only really an issue brought about after pressure created by the US Government about 'fake news'. That needs an entirely different approach, and is more about recognizing and identifying things known to be NOT true, rather than recognizing commonly agreed and trusted factoids.
Mark ✍️ » Ammon Johns
Thanks, Ammon. I can totally see why the two are very different entities. You have explained it perfectly!
Google recently announced what for lack of a better description I will call a "fact-checking engine". They're using it in Google News. I don't believe it's being used with medical queries in general Web search but I could be mistaken. You can play with their public-facing fact-checking tool here:
Fact Check Tools
Here is their blog post from September 2020.
Our latest investments in information quality in Search and News
Ammon Johns 🎓
For anyone interested in the topic Martinez brought up there, as it affects things like breaking news, burstiness (a thing becoming suddenly more important or topical), etc., the term the Information Retrieval folks often use to describe time-related issues is "Temporal Web Dynamics" (https://www.google.com/search?q=temporal+web+dynamics), which discusses the potential signals, uses, and applications, and is always a good deep-dive.
Mark ✍️ » Martinez
I do remember something being mentioned about this but haven't actually explored it yet. I'll go and have a play around and see what I can unearth.
Google could (probably does) use something akin to BioSentVec (https://arxiv.org/abs/1810.09302) which creates embeddings based on a corpus of highly authoritative medical publications.
From there Google could measure the distance between a document's embeddings and those from the highly authoritative corpus. A short distance means they are similar, a long distance means they are not.
Scoring documents this way would make it easy to identify 'natural news' (e.g. "coffee enemas cure cancer") and demote it.
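To make the distance idea concrete, here is a minimal sketch using cosine distance on toy vectors. It does not load BioSentVec or any real model; the three-dimensional vectors and the `distance_to_corpus` helper are made-up stand-ins for real sentence embeddings, and whether Google does anything like this is speculation.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two equal-length vectors: 0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def distance_to_corpus(doc_vec, corpus_vecs):
    """Distance from a document to its nearest neighbour in the trusted corpus."""
    return min(cosine_distance(doc_vec, vec) for vec in corpus_vecs)


# Toy stand-ins for embeddings of authoritative medical text (hypothetical values).
authoritative = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]

mainstream_doc = [0.85, 0.15, 0.05]  # points the same way as the corpus
fringe_doc = [0.0, 0.1, 0.95]        # points somewhere else entirely

print(distance_to_corpus(mainstream_doc, authoritative))
print(distance_to_corpus(fringe_doc, authoritative))
```

A short distance only says the text resembles the authoritative corpus in wording and topic; it is a similarity signal, not a check that the claim is true.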
BioSentVec: creating sentence embeddings for biomedical texts
Demote? Anyone wasting perfectly good coffee should be shot! 😃
says is pretty close to my understanding, and I've worked a few healthcare domains.
Google does NOT 'fact-check' or 'verify' medical info. They use "monkey-see, monkey-do" methods that rely on links and citations of relevant, semantically similar topics on peer-reviewed medical publication entities. If the Lancet or the Journal of the AMA is routinely cited, if the author entity has published (frequency), commented on (vetted profile), or sits on a review board (named among subject matter experts with publications or books of their own, plus degrees, licenses, and educational positions at "seed set" institutions), then all the vectors line up and, lo, Google has satisfied their "likelihood of fact by reputation signals".
Google also has tried to layer on their own methods of pre-certification to achieve higher probabilities of trustworthiness, e.g. by halting ads until mental health counseling outfits subscribe to a particular certifying authority. As already mentioned, the quality of self-published verifiable credentials is widely variable, even among otherwise respected offices and institutions.
Reminds me of when Google purchased (and ruined) Zagat in an attempt to arrive at unadulterated editorial-quality restaurant reviews by which to rank eateries.