Google Is Using A.I. to Answer Your Health Questions. Should You Trust It?

Do you have a headache or is it a sinus infection? What does a stress fracture feel like? Should you be worried about the pain in your chest? If you Google those questions now, the answers may be written by artificial intelligence.

This month, Google rolled out a new feature called A.I. Overviews that uses generative A.I., a type of machine-learning technology that is trained on information from across the internet and produces conversational answers to some search questions in a matter of seconds.

In the weeks since the tool launched, users have encountered a wide array of inaccuracies and odd answers on a range of subjects. But when it comes to how it answers health questions, experts said the stakes were particularly high. The technology could point people toward healthier habits or needed medical care, but it also has the potential to give inaccurate information. The A.I. can sometimes fabricate facts. And if its answers are shaped by websites that aren’t grounded in science, it might offer advice that goes against medical guidance or poses a risk to a person’s health.

The system has already been shown to produce bad answers seemingly based on flawed sources. When asked “how many rocks should I eat,” for example, A.I. Overviews told some users to eat at least one rock a day for vitamins and minerals. (The advice was scraped from The Onion, a satirical site.)

“You can’t trust everything you read,” said Dr. Karandeep Singh, chief health A.I. officer at UC San Diego Health. In health, he said, the source of your information is essential.

Hema Budaraju, a Google senior director of product management who helps to lead work on A.I. Overviews, said that health searches had “additional guardrails,” but declined to describe those in detail. Searches that are deemed dangerous or explicit, or that indicate that someone is in a vulnerable situation, such as with self-harm, do not trigger A.I. summaries, she said.

Google declined to provide a detailed list of websites that support the information in A.I. Overviews, but said that the tool worked in conjunction with the Google Knowledge Graph, an existing information system that has pulled billions of facts from hundreds of sources.

The new search responses do specify some sources; for health questions, these are often websites like the Mayo Clinic, WebMD, the World Health Organization and the scientific research hub PubMed. But it isn’t an exhaustive list: The tool can also pull from Wikipedia, blog posts, Reddit and e-commerce websites. And it doesn’t tell users which facts came from which sources.

With a standard search result, many users would be able to distinguish immediately between a reputable medical website and a candy company. But a single block of text that combines information from multiple sources might cause confusion.

“And that’s if people are even looking at the source,” said Dr. Seema Yasmin, the director of the Stanford Health Communication Initiative, adding, “I don’t know if people are looking, or if we’ve really taught them adequately to look.” She said her own research on misinformation had made her pessimistic about the average user’s interest in looking beyond a quick answer.

As for the accuracy of the chocolate answer, Dr. Dariush Mozaffarian, a cardiologist and professor of medicine at Tufts University, said that it got some facts mostly right and that it summarized research into chocolate’s health benefits. But it does not distinguish between strong evidence provided by randomized trials and weaker evidence from observational studies, he said, or provide any caveats on the evidence.

It’s true that chocolate contains antioxidants, Dr. Mozaffarian said. But the claim that chocolate consumption could help prevent memory loss? That hasn’t been clearly proved, and “needs a lot of caveats,” he said. Having such claims listed next to one another gives the impression that some are better established than they really are.

The answers can also change as the A.I. itself evolves, even when the science behind a given answer hasn’t changed.

A Google spokesperson said in a statement that the company worked to show disclaimers on responses where they were needed, including notes that the information shouldn’t be treated as medical advice.

It’s not clear how, exactly, A.I. Overviews evaluates the strength of evidence, or whether it takes into account contradictory research findings, like those on whether coffee is good for you. “Science isn’t a bunch of static facts,” Dr. Yasmin said. She and other experts also questioned whether the tool would draw on older scientific findings that have since been disproved or don’t capture the latest understanding of an issue.

“Being able to make a critical decision — to discriminate between quality of sources — that’s what humans do all the time, what clinicians do,” said Dr. Danielle Bitterman, a physician-scientist in artificial intelligence at Dana-Farber Cancer Institute and Brigham and Women’s Hospital. “They are parsing the evidence.”

If we want tools like A.I. Overviews to play that role, she said, “we need to better understand how they would navigate across different sources and how they apply a critical lens to arrive at a summary.”

Those unknowns are concerning, experts said, given that the new system elevates the A.I. Overview response over individual links to reputable medical websites such as those for the Mayo Clinic and the Cleveland Clinic. Such sites have historically risen to the top of the results for many health searches.

A Google spokesperson said that A.I. Overviews will match or summarize the information that appears in the top results of searches, but isn’t designed to replace that content. Rather, the spokesperson said, it’s designed to help people get a sense of the information available.

The Mayo Clinic declined to comment on the new responses. A representative from the Cleveland Clinic said that people seeking health information should “directly search known and trusted sources” and reach out to a health care provider if they’re experiencing any symptoms.

A representative from Scripps Health, a California-based health care system cited in some A.I. Overview summaries, said in a statement that “citations in Google’s A.I. generated responses could be helpful in that they establish Scripps Health as a reputable source of health information.”

However, the representative added, “we do have concerns that we cannot vouch for the content produced through A.I. in the same way we can for our own content, which is vetted by our medical professionals.”

For medical questions, it’s not just the accuracy of an answer that matters, but how it’s presented to users, experts said. Take the question “Am I having a heart attack?” The A.I. response had a useful synopsis of symptoms, said Dr. Richard Gumina, the director of cardiovascular medicine at the Ohio State University Wexner Medical Center.

But, he added, he had to read past a long list of symptoms before the text advised him to call 911. Dr. Gumina also searched “Am I having a stroke?” to see whether the tool might produce a more urgent response — which it did, telling users in the first line to call 911. He said he would immediately advise patients experiencing symptoms of a heart attack or a stroke to call for help.

Experts encouraged people looking for health information to approach the new responses with caution. Essentially, they said, users should take note of the fine print under some A.I. Overviews answers: “This is for informational purposes only. For medical advice or diagnosis, consult a professional. Generative A.I. is experimental.”

Dani Blum contributed reporting.

by NYTimes