Why are clinical notes useful for detecting cognitive decline?

Clinical notes capture unstructured observations, symptoms, assessments and patient concerns often missing from structured EHR fields. Automated analysis enables early signal detection from routine visits, supporting earlier referral and intervention.

What data and datasets were used in the study?

Data from a private health company filtered for patients over 50 with mild cognitive impairment were used. Two labeled datasets were created: 4,950 keyword-selected sections for development and 2,000 random sections for generalization testing.

Which machine learning models were compared and what architecture was proposed?

The study compared logistic regression, random forest, support vector machine, XGBoost, and a hierarchical attention deep learning model that uses a context-adaptive convolutional neural network for section-level classification and interpretability via attention weights.

How did the models perform on AUROC and AUPRC metrics?

The deep learning model achieved the highest scores (AUROC ~0.971–0.997, AUPRC ~0.929–0.933). XGBoost also showed strong performance. AUROC exceeded 0.9 for all models while AUPRC highlighted robustness on imbalanced data.

Can this approach support primary care early detection?

Section-level classification of clinical notes can equip primary care clinicians with automated screening tools to flag early cognitive decline, enabling timely referrals despite limited specialist availability.

How could NeuronUP contribute to similar studies?

NeuronUP can support large-scale data processing, expert-driven keyword selection, model design, and validation, combining clinical and data-analysis expertise to develop and compare robust detection models from EHR clinical notes.

Early detection of cognitive decline using a deep learning model

In this article, Antonio Javier Sutil Jiménez discusses the data presented in the study “Deep learning model for the earlier detection of cognitive decline from clinical notes in electronic health records”.

Why is this study of a learning model based on clinical notes important?

This study addresses the early detection of cognitive decline in adults, which is essential to carry out successful therapeutic interventions, slow down decline, prevent the development of disease, or facilitate participant recruitment for clinical trials.

Alzheimer’s disease

Alzheimer’s disease is a type of dementia that represents a major global problem. It has been diagnosed in nearly 6 million people in the United States, and its prevalence increases with age, so population aging is expected to increase its incidence over the coming years.

However, beyond Alzheimer’s disease, mild cognitive impairment is a highly relevant problem that in many cases is associated with the subsequent development of dementia.

Subjective cognitive decline

Similarly, the category of subjective cognitive decline has recently been created. This term refers to an individual’s perception of a decline in their cognitive abilities compared to their previous state.

Although this label is not a disease in itself, people with this condition have been identified as possibly being in an early stage of cognitive decline.

Detection of cognitive decline

Although great efforts are being made to improve treatments for these patients, the detection of cognitive decline remains a challenge, and improving detection tools is necessary for subsequent treatments to be effective.

Primary care tools

Since the number of specialists available to care for at-risk populations is limited, a possible solution could be to provide tools to primary care physicians. These physicians are not dementia specialists, but they have direct contact with this population, so equipping them with diagnostic tools is a viable solution.

Electronic medical records

The use of electronic health records is proposed as a suitable alternative for developing such tools, as they collect patients’ visit histories within a healthcare system.

However, it is important to highlight the difficulty of identifying signs of cognitive decline not associated with age, which are often documented in cognitive assessments and in patients’ concerns recorded by healthcare professionals. Although studies have been conducted using patients’ clinical information, the use of clinical notes from medical records for this purpose has rarely been explored in depth.

Clinical notes as an informative resource

This study proposes using clinical notes as an informative resource that could capture information not considered in most studies. Manually analyzing clinical notes would be very costly, so the study’s objective was to develop an automatic detection model based on deep learning.

The study’s approach is therefore novel in its use of clinical notes.

Clinical notes are a central part of health records in the clinical setting. However, their use in scientific research has been limited, which makes their application to the early detection of cognitive decline potentially highly valuable.

What was done?

Database

For this study, data were taken from a private health company, filtering patients by age (they had to be over 50 years old) and by the diagnosis of mild cognitive impairment. Specifically, clinical notes from the 4 years prior to diagnosis were analyzed.

The definition of cognitive decline was based on the mention of symptoms, diagnosis, cognitive assessments, and treatments. When notes indicated improvement, transient episodes, or reversible conditions, they were considered negative for cognitive decline.

Processing of clinical notes and database development

First, because the clinical notes were long, a natural language processing tool was used to split them into sections. This division made it possible to identify whether or not each section indicated cognitive decline.
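As a rough illustration of section splitting, the sketch below cuts a note at header lines. The header pattern, the `split_into_sections` helper, and the sample note are hypothetical; the article does not describe how the study's tool works, and real notes would need a purpose-built sectionizer.

```python
import re

# Assumption: a section header is a short title on its own line ending with a colon,
# e.g. "Assessment:". This is an illustrative rule, not the study's method.
HEADER = re.compile(r"^(?P<title>[A-Z][A-Za-z /&-]{2,40}):[ \t]*$", re.MULTILINE)

def split_into_sections(note):
    """Return (title, body) pairs; text before the first header is titled 'PREAMBLE'."""
    matches = list(HEADER.finditer(note))
    if not matches:
        stripped = note.strip()
        return [("PREAMBLE", stripped)] if stripped else []
    sections = []
    preamble = note[: matches[0].start()].strip()
    if preamble:
        sections.append(("PREAMBLE", preamble))
    for i, match in enumerate(matches):
        # A section body runs until the next header or the end of the note.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(note)
        sections.append((match.group("title"), note[match.end():end].strip()))
    return sections

note = """Chief Complaint:
Daughter reports worsening memory.

Assessment:
MoCA 22/30. Possible mild cognitive impairment.
"""
sections = split_into_sections(note)
for title, body in sections:
    print(f"[{title}] {body}")
```

Each resulting section can then be labeled on its own as positive or negative for cognitive decline.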

Next, keywords selected by experts were used to identify sections that contained signs of cognitive decline. Three annotators labeled the sections, and conflicts were resolved through discussions with subject-matter experts, achieving a good level of agreement among annotators.

In addition, a labeled dataset of 4,950 sections was created to train and test various machine learning algorithms. In total, two datasets were created for model development and validation.

Datasets

The first dataset, used for model development, included only sections with selected keywords. This dataset contained 4,950 annotated sections, ready for developing the machine learning models.

The second dataset consisted of 2,000 randomly selected sections from all notes, excluding those used in the first dataset. This second set was used to test the model’s ability to generalize to note sections without applying a keyword-based filter.
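The relationship between the two datasets can be sketched as follows. The keyword list, the `build_datasets` helper, and the toy sections are all hypothetical, since the study's expert-selected keywords and sampling code are not reproduced in this article. The sketch only shows the idea: a keyword-filtered development set, plus a random set drawn from everything that was not used for development.

```python
import random

# Hypothetical keyword list, for illustration only.
KEYWORDS = ("memory", "forgetful", "dementia", "moca", "cognitive")

def has_keyword(text):
    lowered = text.lower()
    return any(keyword in lowered for keyword in KEYWORDS)

def build_datasets(sections, n_dev, n_random, seed=0):
    """Return index lists (dev_idx, test_idx) that share no section."""
    rng = random.Random(seed)
    # Dataset 1: sample only from sections that contain a keyword.
    keyword_idx = [i for i, text in enumerate(sections) if has_keyword(text)]
    dev_idx = rng.sample(keyword_idx, min(n_dev, len(keyword_idx)))
    # Dataset 2: random sample from all remaining sections, with no keyword filter.
    used = set(dev_idx)
    remaining = [i for i in range(len(sections)) if i not in used]
    test_idx = rng.sample(remaining, min(n_random, len(remaining)))
    return dev_idx, test_idx

sections = [
    "Patient reports memory loss over six months.",
    "Blood pressure well controlled.",
    "MoCA score 21/30.",
    "Knee pain after a fall.",
    "Family notes she is more forgetful.",
    "Flu vaccine administered.",
]
dev_idx, test_idx = build_datasets(sections, n_dev=2, n_random=3)
```

Because the second set is sampled without the filter, it may still contain sections that happen to mention a keyword, which is what makes it a test of generalization rather than of keyword matching.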

Model development and validation

To develop the model, the authors used a hierarchical attention architecture based on deep learning that had been developed in previous work, alongside four baseline machine learning algorithms: logistic regression, random forest, support vector machine, and XGBoost.

The previously developed model incorporated a context-adaptive convolutional neural network, which made it possible to handle word variations and to interpret predictions through attention layers. For more information on the model, readers are advised to consult the original article and its supplementary tables.

Interpretation of the model’s prediction

To interpret the model’s prediction, the words with the highest weight in the attention layers used in the prediction were identified. The words with a relevant weight, that is, at least 2 standard deviations above the mean, were considered high-attention and compared with the original keywords selected.
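The 2-standard-deviation rule can be illustrated with a few lines of code. The attention weights below are invented for the example, the `high_attention_words` helper is hypothetical, and the choice of the population standard deviation is an assumption, since the article does not say which estimator was used.

```python
from statistics import mean, pstdev

def high_attention_words(weights, n_sd=2.0):
    """Return words whose attention weight is at least n_sd standard deviations above the mean."""
    values = list(weights.values())
    # Assumption: population standard deviation over the words of one section.
    threshold = mean(values) + n_sd * pstdev(values)
    return sorted(word for word, weight in weights.items() if weight >= threshold)

# Invented attention weights for one section (they sum to 1.0).
weights = {
    "the": 0.02, "patient": 0.03, "reports": 0.04, "increasing": 0.05,
    "forgetfulness": 0.45, "and": 0.02, "trouble": 0.06, "managing": 0.05,
    "her": 0.02, "medications": 0.26,
}
high = high_attention_words(weights)

# Compare the high-attention words with a hypothetical original keyword list.
original_keywords = {"forgetfulness", "memory", "dementia"}
overlap = set(high) & original_keywords
print(high, overlap)
```

Here the mean weight is 0.10 and the threshold is roughly 0.37, so only "forgetfulness" qualifies, while "medications" at 0.26 does not.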

For the baseline models, by contrast, sections were represented by term frequency, and the algorithms were trained and tested using cross-validation. The results of the model developed by the research group were then compared with those of the four baseline models.
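A minimal sketch of the two ingredients of the baseline setup, term-frequency vectors and cross-validation folds, using only the standard library. The tokenizer, the helper names, and the number of folds are assumptions made for illustration; the article does not specify them, and a real pipeline would pass these vectors to a classifier.

```python
import re
from collections import Counter

def term_frequencies(sections):
    """Map each section to a vector of raw term counts over a shared vocabulary."""
    # Assumption: lowercase alphanumeric tokens; the study's tokenizer is not described here.
    tokenized = [re.findall(r"[a-z0-9]+", text.lower()) for text in sections]
    vocab = sorted({token for tokens in tokenized for token in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors

def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds covering range(n)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size

sections = ["memory loss noted", "no memory complaints", "knee pain"]
vocab, vectors = term_frequencies(sections)
folds = list(k_fold_indices(5, 2))
```

In each fold, a model would be fitted on the `train` indices and scored on the `test` indices, so every section is used for testing exactly once.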

Comparison of metrics

The two measures used to compare the models were AUROC (area under the receiver operating characteristic curve) and AUPRC (area under the precision-recall curve).

AUROC is a common evaluation metric for these models, as it summarizes the trade-off between sensitivity and specificity across decision thresholds. AUPRC is another important metric that provides complementary information for imbalanced data, when the percentage of positive cases is low.
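To make the two metrics concrete, here is a small from-scratch sketch. It computes AUROC as the probability that a randomly chosen positive is scored above a randomly chosen negative, and approximates AUPRC by average precision, a common step-wise estimate of the area under the precision-recall curve. The labels and scores are invented, and the functions assume there are no tied scores when ranking.

```python
def auroc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each positive example."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits

# Invented example: 2 positive sections among 8.
labels = [1, 0, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05]
roc = auroc(labels, scores)               # 11 of 12 pairs ordered correctly
prc = average_precision(labels, scores)   # (1/1 + 2/3) / 2
print(round(roc, 3), round(prc, 3))
```

A single negative ranked above one positive costs little in AUROC but noticeably more in average precision, which is why the second metric is informative when positives are rare.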


What are the main conclusions of this study of a learning model based on clinical notes?

The main conclusion of this study is that it is possible to predict cognitive decline using a model based on clinical notes. The patients identified could be in the early stages of cognitive decline, so the approach would allow early signals to be identified in electronic health records.

The model developed for this purpose was the best predictor for detecting patients who will develop cognitive decline, without relying on structured data. Although the deep learning model was the best, the XGBoost model also showed good predictions and is proposed as a simpler alternative if the necessary technology is not available.

AUROC and AUPRC metrics

To check these results, the scores obtained on the AUROC and AUPRC metrics can be observed for datasets 1 and 2 (see tables 1 and 2, respectively). It is especially notable that the deep learning-based model is the best predictor on both metrics.

In the case of AUROC, all values are above 0.9, with the deep learning model always performing best. Regarding AUPRC, this is even more evident, as this model is the only one that remains above the value of 0.9.

The differences between these metrics reinforce the consistency of the results, since, while AUROC shows the relationship between true positive rate and false positive rate, AUPRC reflects the relationship between precision and recall.

In imbalanced samples, the AUROC metric can be less conservative with false positives, because the false positive rate is computed over a large number of negative cases. The complementary information from AUPRC therefore helps confirm the good performance of this model.
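A short worked example with invented counts shows why the two views diverge under imbalance. Both scenarios below have the same true positive rate and the same false positive rate, so they sit on the same point of the ROC curve, yet precision collapses when negatives are 100 times more common. The `rates` helper and all numbers are illustrative assumptions, not figures from the study.

```python
def rates(tp, fn, fp, tn):
    """Return (recall, false positive rate, precision) from confusion-matrix counts."""
    recall = tp / (tp + fn)
    fpr = fp / (fp + tn)
    precision = tp / (tp + fp)
    return recall, fpr, precision

# 100 positives and 100 negatives.
balanced = rates(tp=90, fn=10, fp=5, tn=95)
# 100 positives and 10,000 negatives, with the same 5% false positive rate.
imbalanced = rates(tp=90, fn=10, fp=500, tn=9500)

for name, (recall, fpr, precision) in (("balanced", balanced), ("imbalanced", imbalanced)):
    print(f"{name}: recall={recall:.2f} FPR={fpr:.2f} precision={precision:.3f}")
```

With 500 false positives against 90 true positives, precision falls from about 0.947 to about 0.153 even though the ROC point has not moved.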

Model	AUROC	AUPRC
Logistic Regression	0.936	0.880
Random Forest	0.950	0.889
Support Vector Machine	0.939	0.883
XGBoost	0.953	0.882
Deep Learning	0.971	0.933

Table 1. Comparison of the models for the dataset with 4,950 sections.

Model	AUROC	AUPRC
Logistic Regression	0.969	0.762
Random Forest	0.985	0.830
Support Vector Machine	0.954	0.723
XGBoost	0.988	0.898
Deep Learning	0.997	0.929

Table 2. Comparison of the models for the dataset with 2,000 sections.

Model performance

Another point highlighted by this study is that note length could affect model performance; however, provided that sufficient content is retained, section-based classification is shown to be feasible.

Furthermore, this type of model could be applied to other pathologies, although it is important to consider that identifying ambiguous or complex information can be difficult.

Where could NeuronUP contribute in a study like this?

NeuronUP could contribute in various ways to a study like this, as it has extensive experience working with large amounts of data.

As seen in this study, handling large volumes of data is one of the main challenges when working with clinical notes. Therefore, the NeuronUP team, which includes specialists in both the clinical field and data analysis, could make valuable contributions to information processing, either by using keywords or without them.

This study also stands out for comparing five different models, which lends robustness to the results obtained for its model. The experience of the NeuronUP team could also be useful in designing a specific model for this purpose, or in creating robust models to compare with the developed model.

Li Zhou. Professor of Medicine at Harvard Medical School for more than ten years, and principal investigator at Brigham and Women’s Hospital. She holds a PhD in Biomedical Informatics from Columbia University, and her research has focused on natural language processing, knowledge management, and support for clinical decision-making. In addition, she has been the principal investigator on numerous research projects funded by AHRQ, NIH, and CRICO/RMF.

Bibliography

Wang L, Laurentiev J, Yang J, et al. Development and Validation of a Deep Learning Model for Earlier Detection of Cognitive Decline From Clinical Notes in Electronic Health Records. JAMA Netw Open. 2021;4(11):e2135174. doi:10.1001/jamanetworkopen.2021.35174