AI and health: Why experts warn against trusting it for medical advice

A new study published in Nature Medicine warns that artificial intelligence chatbots are not a safe substitute for medical judgment, as they can lead to inaccurate assessments and poor health decisions.

Artificial intelligence is advancing rapidly, and many users are increasingly turning to chatbots for immediate answers, including on health-related issues. However, the new research shows that easy access does not necessarily equate to safety and that using language models as a “digital doctor” may result in poor decision-making.

Modern large language models, such as GPT-4o, Llama 3 and Command R+, have demonstrated extensive medical knowledge. They achieve high scores in scientific examinations, answer complex medical questions with ease and often create the impression that they can assess a situation like a healthcare professional.

However, this capability does not mean they can safely guide someone trying to understand the cause of their symptoms and determine whether immediate medical attention is required.

Testing AI in real-world scenarios

This was precisely what researchers sought to investigate. Rather than evaluating AI’s theoretical medical knowledge, they assessed its effectiveness in an everyday situation: a person develops symptoms, opens a chatbot and asks for advice.

The study involved 1,298 participants in the United Kingdom, who were asked to manage ten different medical scenarios designed by physicians. Participants had to identify possible conditions that could explain the symptoms and, most importantly, decide on the appropriate response, whether that meant staying at home, contacting a doctor, visiting an emergency department or calling an ambulance.

One group used the three language models under examination, while the control group relied on traditional information sources, including internet searches and official health service websites.

Good knowledge, but not safe guidance

The results highlighted an important distinction. When the models had access to all the information about a case, including symptoms, medical history and additional details, they correctly identified potential diagnoses in 94.9 per cent of cases.

The picture changed significantly when they had to recommend the appropriate course of action. Accuracy fell to 56.3 per cent, demonstrating that moving from theoretical knowledge to safe medical guidance remains extremely challenging.

Even more concerning was the finding that participants who used artificial intelligence did not perform better than those who sought information through conventional means. In some cases, they were less successful in identifying possible illnesses, while there was no meaningful improvement in choosing the correct next step.

The problem also lies in how people use AI

Researchers emphasised that the challenge does not lie solely with artificial intelligence but also with the way users interact with it.

Someone without medical knowledge often does not know which symptoms are most significant, may omit critical information or ask questions that steer the conversation in a particular direction, such as asking whether the symptoms are “just anxiety”.

As a result, even when the model possesses the correct information, it may fail to guide the user toward an accurate assessment.

Analysis of conversations revealed numerous examples of this. In many cases, descriptions of symptoms were incomplete, while in others users struggled to identify which elements of the responses were truly diagnostically important.

When AI gives conflicting answers

The study also recorded cases in which the models themselves displayed significant inconsistencies.

At times, they focused on secondary information, provided vague or irrelevant advice or even referred users to emergency numbers in other countries.

In one example, two users described almost identical symptoms: a severe headache, neck stiffness and sensitivity to light. The same language model advised one user to practise self-care while correctly recommending that the other seek immediate emergency medical attention.

Researchers considered this inconsistency particularly critical because, in medicine, it is not enough for an answer to sound plausible. It must also be consistent, reliable and safe.

The illusion of certainty

Another finding concerned the way responses are presented.

Language models deliver their assessments with confidence, clarity and a calm tone, easily creating the impression that they know the situation with certainty. Several participants said they trusted the responses more because they “sounded confident”.

Scientists stressed, however, that confidence in wording is not evidence of medical reliability.

Knowledge is not the same as medical decision-making

The research also highlighted another major limitation: strong performance in knowledge tests does not mean language models can effectively guide real patients.

In simulations where humans were replaced by artificial “patients”, results were noticeably better than in real conversations. Virtual users described symptoms completely and consistently, which differs significantly from how real people seek help when they are worried.

The study’s authors do not reject the role of artificial intelligence in healthcare. On the contrary, they acknowledge that it can be a useful informational tool and help improve access to information.

However, they conclude clearly that none of the models examined is currently suitable to function as a safe source of medical guidance for patients.

Artificial intelligence may help someone become informed or better understand a medical concept. It cannot, however, replace a doctor’s clinical judgment, particularly when symptoms may indicate a serious condition.

The promise of AI in healthcare remains strong. Until it can be proven to operate safely in real-world settings, experts emphasise that it should be treated as an information tool rather than a substitute for medical diagnosis and professional advice.

Source: newmoney.gr

