In this article, PhD candidate Marta Arbizu Gómez presents the study “Large language models deconstruct the clinical intuition behind diagnosing autism”, which explores how large language models can be used to analyze the way autism is diagnosed.
Introduction
The diagnosis of autism spectrum disorder (ASD) has traditionally been a complex task, relying heavily on clinical experience, detailed observation and the interpretation of diverse behaviors. Although there are well-defined diagnostic guidelines such as the DSM-5, clinical practice is often guided by an “intuition” that professionals develop after years of experience. But what if we could “read” that intuition and examine it from a more objective standpoint?
A recent study published in the journal Cell, titled “Large language models deconstruct the clinical intuition behind diagnosing autism”, explores precisely this possibility: using large language models (LLMs) to unravel the patterns clinicians follow when diagnosing autism. The findings are not only surprising, but could also have profound implications for how we understand and carry out ASD diagnoses today.
The context: why is it necessary to reassess the way we diagnose autism?
ASD is a neurodevelopmental disorder characterized by challenges in social communication and restricted, repetitive patterns of behavior and interests. However, these characteristics can present with great variability between individuals, which makes diagnosis a nuanced and sometimes subjective process.
Moreover, while standardized diagnostic tools such as the ADOS or ADI-R provide structure to the process, many diagnoses are based on narrative reports written by clinicians who have observed the patient. In other words, the way the clinician describes the patient can carry great weight in the final diagnosis.
Faced with this reality, the researchers posed a key question: which elements within these written reports are actually guiding diagnostic decisions?

What did the researchers do?
The study authors collected more than 40,000 clinical reports of pediatric patients from the Massachusetts public health system. These reports, written by mental health professionals, contained detailed descriptions of patients’ behavior and functioning.
With this database, the researchers trained several language models, including GPT-4 (developed by OpenAI) and an open-source clinical model called Clinician-LLaMA. The idea was for the models to learn to predict whether a clinical report corresponded to a patient with an ASD diagnosis or not, based only on the text.
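To make the idea more concrete, here is a minimal sketch of how such a text classifier could be fine-tuned with the Hugging Face transformers and datasets libraries. It illustrates the general approach rather than the authors’ actual pipeline: the base model, the file clinical_reports.csv and the column names report_text and label are placeholders.

```python
# Minimal sketch: fine-tune a pretrained transformer to classify clinical
# reports as ASD vs. non-ASD. The base model, file path and column names
# are illustrative placeholders, not the study's actual setup.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "distilbert-base-uncased"  # placeholder; the article mentions GPT-4 and Clinician-LLaMA

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Expected CSV columns: "report_text" (free-text clinical report) and
# "label" (1 = ASD diagnosis, 0 = no ASD diagnosis).
dataset = load_dataset("csv", data_files="clinical_reports.csv")["train"]
dataset = dataset.train_test_split(test_size=0.2, seed=42)

def tokenize(batch):
    return tokenizer(batch["report_text"], truncation=True, max_length=512)

dataset = dataset.map(tokenize, batched=True).rename_column("label", "labels")

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="asd_classifier", num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,  # enables dynamic padding via the default data collator
)
trainer.train()
print(trainer.evaluate())             # loss on the held-out reports
trainer.save_model("asd_classifier")  # reused in the attribution sketch below
```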
The results were striking: the models classified the reports with remarkable accuracy, even when key information such as the patient’s sex or age was hidden. This suggested that the reports contained implicit language patterns that the models could detect, patterns that reflect how clinicians make decisions.
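Hiding demographic cues before classification can be as simple as redacting them from the text. The sketch below shows one rough way to mask mentions of sex and age with regular expressions; the study’s actual masking procedure is not described in the article, and these patterns are deliberately simplified.

```python
import re

# Rough illustration of hiding demographic cues (sex, age) before classification.
# The study's exact masking procedure is not described here; these patterns are
# deliberately simplified examples.
SEX_TERMS = r"\b(male|female|boy|girl|man|woman|he|she|him|his|her|hers)\b"
AGE_TERMS = r"\b\d{1,2}[- ]year[- ]old\b|\baged?\s+\d{1,2}\b"

def mask_demographics(text: str) -> str:
    text = re.sub(SEX_TERMS, "[SEX]", text, flags=re.IGNORECASE)
    text = re.sub(AGE_TERMS, "[AGE]", text, flags=re.IGNORECASE)
    return text

example = "The patient is a 7-year-old boy; he avoids eye contact and lines up his toys."
print(mask_demographics(example))
# -> The patient is a [AGE] [SEX]; [SEX] avoids eye contact and lines up [SEX] toys.
```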
What did they find?
Beyond prediction accuracy, the most interesting aspect was what the models revealed about the diagnostic process itself. By analyzing which text fragments carried the most weight in the models’ decisions, the researchers identified that certain types of behaviors and descriptions were more determinative than others.
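One intuitive way to estimate which fragments carry the most weight is occlusion: remove one sentence at a time and see how much the predicted probability of an ASD diagnosis drops. The sketch below illustrates this idea with the hypothetical fine-tuned classifier saved as asd_classifier in the earlier sketch; it is not the interpretability method used in the paper, just a simple approximation of the same question.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Occlusion-style sketch: drop one sentence at a time and measure how much the
# predicted ASD probability falls. "asd_classifier" is the hypothetical model
# saved in the fine-tuning sketch above, not the paper's actual setup.
tokenizer = AutoTokenizer.from_pretrained("asd_classifier")
model = AutoModelForSequenceClassification.from_pretrained("asd_classifier")
model.eval()

def asd_probability(text: str) -> float:
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()  # index 1 = ASD class

def sentence_importance(report: str) -> list:
    sentences = [s.strip() for s in report.split(".") if s.strip()]
    baseline = asd_probability(report)
    scores = []
    for i, sentence in enumerate(sentences):
        reduced = ". ".join(s for j, s in enumerate(sentences) if j != i)
        # Positive score: removing this sentence lowers the ASD probability,
        # i.e., the sentence was pushing the prediction toward ASD.
        scores.append((sentence, baseline - asd_probability(reduced)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

report = ("He lines up his toys for hours and is distressed by loud noises. "
          "He struggles to make friends at school.")
for sentence, score in sentence_importance(report):
    print(f"{score:+.3f}  {sentence}")
```

Sentences whose removal most reduces the predicted probability are the ones the model relies on most, which is conceptually the kind of question the researchers asked of the clinical reports at scale.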

As we can see in the chart, repetitive, stereotyped behaviors, restricted interests and aspects related to sensory perception were the factors most associated with a positive ASD diagnosis. In contrast, difficulties in social interaction, which are one of the traditional pillars of diagnosis according to the DSM-5, turned out to carry less weight in the models.
This does not mean that social difficulties are not relevant, but rather that, in practice, clinicians seem to pay more attention —perhaps unconsciously— to other behavioral patterns when deciding whether a patient meets diagnostic criteria.
The main results of the study are summarized in the following table:
| Analyzed aspect | Result / Observation |
|---|---|
| Models used | GPT-4 and Clinician-LLaMA (language models trained on the clinical reports). |
| Data source | Over 40,000 pediatric clinical reports from the Massachusetts public health system. |
| Model task | Predict whether the patient had an autism diagnosis based solely on the report text. |
| Model accuracy | High, even when variables such as sex or age were hidden. |
| Most influential factors in diagnosis | Repetitive behaviors, restricted interests and sensory/perceptual traits. |
| Least influential factors | Difficulties in social interaction. |
| Key implication | In clinical practice, observable behaviors carry more weight than expected. |
| Potential impact on diagnostic criteria | Suggests the need to reevaluate the weight of certain criteria in the DSM-5. |
| Application of AI in mental health | As a tool for diagnostic support and analysis of clinical reasoning. |
As can be seen, the language models not only managed to predict an ASD diagnosis with high accuracy, but they also revealed that certain behavioral patterns —particularly repetitive behaviors and restricted interests— are more influential in clinical practice than traditional diagnostic criteria might suggest. This opens the door to reflection on how these criteria are applied in real-world contexts.
Implications: should we rethink the diagnostic criteria for autism?
These findings open an important discussion: do current diagnostic criteria really reflect the way professionals evaluate patients?
If clinicians systematically give more importance to observable behaviors such as stereotypies or restricted interests, it may be necessary to reevaluate the weight given to each diagnostic category in official guidelines.
In addition, this approach could have implications for the training of new professionals, who could benefit from understanding how criteria are applied in real practice, beyond theory.
Can artificial intelligence help in the clinical diagnosis of ASD?
One of the great promises of artificial intelligence in the health field is its ability to detect complex patterns in large volumes of data. In this case, language models not only act as classification tools, but also as instruments that allow us to make the invisible visible: the implicit logic behind clinical decisions.
Far from replacing professionals, these models can act as allies, offering a second opinion based on thousands of previous cases, and helping to detect biases or inconsistencies in diagnostic processes.
Where could NeuronUP contribute in studies like this?
NeuronUP could contribute significantly to studies like this by facilitating replication in more diverse and non–English-speaking populations, thanks to its international presence. Its platform, with hundreds of cognitive activities, would make it possible to complement the analysis of clinical reports with structured data on cognitive performance. In addition, this approach could be applied to other clinical conditions, such as ADHD or mild cognitive impairment, improving early detection and diagnostic accuracy.
Conclusion of the study
This study marks a milestone at the intersection of artificial intelligence and mental health. By using language models to analyze clinical reports, the researchers have not only demonstrated that autism diagnosis can be predicted with remarkable accuracy, but also revealed how the “clinical intuition” that guides these decisions is constructed.
In the near future, tools like these could be integrated into health systems to provide diagnostic support, improve professional training, and perhaps even redefine the criteria with which we understand autism. What is clear is that artificial intelligence is not only transforming technology, but also our way of understanding the human mind.
References
- Feng S, Sondhi R, Tu X, Buckley J, Sands A, Comiter A, Zhang H, Gao R, Sragovich S, Mello JD, Fedorenko E, Saxe R, Sontheimer EJ, Sapiro G, O’Reilly UM, McCoy TH, Beam AL. Large language models deconstruct the clinical intuition behind diagnosing autism. Cell. 2024 Mar 21. doi: 10.1016/j.cell.2024.03.004.
If you enjoyed this blog post about how ChatGPT-style language models can help in the diagnosis of autism, you will surely be interested in these NeuronUP articles:
This article has been translated. Link to the original article in Spanish: ¿Pueden los modelos de lenguaje tipo ChatGPT ayudar a diagnosticar el autismo?

Neurorehabilitation and stimulation of executive functions in the workplace