In this article, Antonio Javier Sutil Jiménez discusses the study “Deep learning model for earlier detection of cognitive decline from clinical notes in electronic health records”.
Why is this study of a learning model from clinical notes important?
This study addresses the early detection of cognitive decline in adults, which is essential to enable successful therapeutic interventions, slow decline, prevent disease development, or facilitate the enrollment of participants in clinical trials.
Alzheimer’s disease
Alzheimer’s disease is a type of dementia that represents a major global problem. This disease has been diagnosed in nearly 6 million people in the United States, and its prevalence increases with age, so population aging is expected to increase its incidence over the coming years.
However, beyond Alzheimer’s disease, mild cognitive impairment is a highly relevant problem that in many cases is associated with subsequent development of dementia.
Subjective cognitive decline
Similarly, the category of subjective cognitive decline has recently been introduced. This term refers to an individual’s perception that their cognitive abilities have declined compared with their previous state.
Although this label is not a disease in itself, it has been identified that people with this condition could be in an early stage of cognitive decline.
Detection of cognitive decline
Although great efforts are being made to improve treatments for these patients, the detection of cognitive decline remains a challenge, and improving detection tools is necessary for subsequent treatments to be effective.
Tools in primary care
Given that the number of specialized professionals to care for the at-risk population is limited, one possible solution could be to provide tools to primary care physicians. These physicians are not dementia specialists, but they have direct contact with this population, so equipping them with diagnostic tools is a viable solution.
Electronic medical records
The use of electronic medical records is proposed as a suitable alternative for developing such tools, since they collect patients’ visit histories within a healthcare system.
However, it is important to highlight the difficulty of identifying signs of cognitive decline not associated with age, which are often documented in cognitive assessments and in patients’ concerns recorded by healthcare professionals. Although studies have been conducted with patients’ clinical information, clinical notes in medical records have rarely been explored in depth for this purpose.
Clinical notes as an information resource
This study proposes the use of clinical notes as an information resource that could capture information not considered in most studies. Manually analyzing clinical notes would be very costly, so the study’s objective was to develop an automatic detection model based on deep learning.
The novelty of this study’s approach therefore lies in its use of clinical notes.
Clinical notes are very important for medical records in the clinical setting. However, their use in the scientific field has been limited, making their application for early detection of cognitive decline potentially highly interesting.
What was done?
Database
For this study, data were taken from a private healthcare company, filtering patients by age (they had to be older than 50 years) and by a diagnosis of mild cognitive impairment. Specifically, the clinical notes from the 4 years prior to diagnosis were analyzed.
The definition of cognitive decline was based on mention of symptoms, diagnosis, cognitive assessments, and treatments. When notes indicated progression, transient episodes, or reversible conditions, they were considered negative for cognitive decline.
Processing of clinical notes and database development
First, due to the length of clinical notes, a natural language processor was used to split them into sections. This division made it possible to identify whether each section indicated cognitive decline or not.
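As an illustration of the splitting step, here is a minimal sketch that cuts a note into sections at header lines. It is not the natural language processor used in the study: the header pattern (a capitalized title followed by a colon on its own line), the function name `split_into_sections`, and the sample note are all assumptions made for this example.

```python
import re

# Assumed header format: a capitalized title ending in a colon, alone on a line.
# Real clinical notes vary widely between institutions.
SECTION_HEADER = re.compile(r"^(?P<title>[A-Z][A-Za-z /]{2,40}):[ \t]*$", re.MULTILINE)


def split_into_sections(note):
    """Return a list of (title, body) pairs for one clinical note."""
    matches = list(SECTION_HEADER.finditer(note))
    if not matches:
        stripped = note.strip()
        return [("PREAMBLE", stripped)] if stripped else []

    sections = []
    # Text before the first header is kept under a placeholder title.
    preamble = note[: matches[0].start()].strip()
    if preamble:
        sections.append(("PREAMBLE", preamble))

    for i, match in enumerate(matches):
        # A section body runs until the next header or the end of the note.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(note)
        sections.append((match.group("title"), note[match.end():end].strip()))
    return sections


# Invented sample note, for illustration only.
note = """History of Present Illness:
Patient reports increasing forgetfulness over the past year.

Assessment:
MoCA 22/30. Possible mild cognitive impairment.
"""
sections = split_into_sections(note)
for title, body in sections:
    print(title, "->", body)
```

Each resulting section can then be labeled on its own as positive or negative for cognitive decline.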
Next, keyword terms selected by experts were identified; these experts were trained to detect sections that contained signs of cognitive decline. Three annotators labeled the sections, and conflicts were resolved through discussions with subject-matter experts, achieving a good level of agreement among annotators.
In addition, a labeled dataset of 4,950 sections was created to train and test several machine learning algorithms. Finally, two databases were created that would be used for model development and validation.
Datasets
The first dataset, used for model development, included only sections with selected keyword terms. This dataset contained 4,950 annotated sections, ready for developing the machine learning models.
The second dataset consisted of 2,000 randomly selected sections from all notes, excluding those used in the first dataset. This second set was used to check the model’s ability to generalize to note sections without applying a keyword-based filter.
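The construction of the two datasets can be sketched as follows. This is a hypothetical reconstruction, not the authors’ code: the keyword list, the function names, and the way the keyword-filtered sections are sampled are assumptions, since the article does not describe how the 4,950 sections were chosen.

```python
import random

# Placeholder keywords; the study's expert-selected terms are not listed here.
KEYWORDS = ("memory", "forgetful", "cognitive", "dementia")


def has_keyword(text, keywords=KEYWORDS):
    """True if the section text mentions at least one keyword."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)


def build_datasets(sections, n_keyword, n_random, seed=0):
    """Split (section_id, text) pairs into two disjoint datasets.

    dataset_1: sections that contain a keyword (sampled up to n_keyword).
    dataset_2: random sections from everything not used in dataset_1,
               with no keyword filter applied.
    """
    rng = random.Random(seed)
    keyword_sections = [s for s in sections if has_keyword(s[1])]
    dataset_1 = rng.sample(keyword_sections, min(n_keyword, len(keyword_sections)))

    used_ids = {section_id for section_id, _ in dataset_1}
    remaining = [s for s in sections if s[0] not in used_ids]
    dataset_2 = rng.sample(remaining, min(n_random, len(remaining)))
    return dataset_1, dataset_2


# Toy corpus: every third section mentions a keyword.
toy_sections = [
    (i, "memory loss noted" if i % 3 == 0 else "blood pressure stable")
    for i in range(30)
]
dataset_1, dataset_2 = build_datasets(toy_sections, n_keyword=5, n_random=8)
print(len(dataset_1), len(dataset_2))
```

Because the second dataset is drawn without the keyword filter, it tests whether a model trained on keyword-bearing sections still works on arbitrary note sections.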
Model development and validation
To develop the model, they used a hierarchical attention structure based on deep learning that had been developed in a previous work, in addition to four baseline machine learning algorithms: logistic regression, random forest, support vector machine, and XGBoost.
The previously developed model incorporated a context-aware convolutional neural network, which allowed handling word variations and interpreting the prediction through attention layers. For more information about the model, the article and its supplementary tables are recommended.
Interpretation of the model’s prediction
To interpret the model’s prediction, the words with the greatest weight in the attention layers used for the prediction were identified. The words with a relevant weight, that is, at least 2 standard deviations above the mean, were considered high-attention and were compared with the original selected keywords.
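The thresholding rule described above can be sketched in a few lines. The words, attention weights, and keyword set below are invented for illustration, and the use of the population standard deviation is an assumption, since the article does not say which estimator was used.

```python
from statistics import mean, pstdev


def high_attention_words(words, weights, n_std=2.0):
    """Return words whose attention weight is at least n_std standard
    deviations above the mean weight of the section."""
    threshold = mean(weights) + n_std * pstdev(weights)
    return [word for word, weight in zip(words, weights) if weight >= threshold]


# Invented attention weights for one short section.
words = ["patient", "reports", "worsening", "memory", "loss",
         "and", "some", "trouble", "with", "names"]
weights = [0.02, 0.02, 0.05, 0.60, 0.10, 0.02, 0.02, 0.05, 0.02, 0.10]

high = high_attention_words(words, weights)

# Compare against a (hypothetical) original keyword list.
original_keywords = {"memory", "dementia"}
already_known = [w for w in high if w in original_keywords]
newly_surfaced = [w for w in high if w not in original_keywords]
print(high, already_known, newly_surfaced)
```

Words that clear the threshold but are absent from the original keyword list are the interesting ones, since they point to signals the experts did not anticipate.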
For the baseline models, sections were represented by term frequency, and the algorithms were trained and tested using cross-validation. The results of the research group’s model were then compared with those of the four baseline models.
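To make the baseline setup concrete, here is a minimal, library-free sketch of a term-frequency representation and a k-fold split. The whitespace tokenizer, the number of folds, and the function names are assumptions; the article does not specify them, and in practice a library such as scikit-learn would normally handle both steps.

```python
from collections import Counter


def term_frequencies(text):
    """Count how often each lowercase, whitespace-separated token occurs."""
    return Counter(text.lower().split())


def k_fold_indices(n_items, k):
    """Yield (train_indices, test_indices) for k contiguous folds.

    Every item appears in exactly one test fold. Shuffling and
    stratification by label are omitted to keep the sketch short.
    """
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_items))
        yield train_idx, test_idx
        start += size


tf = term_frequencies("Memory loss and memory complaints")
folds = list(k_fold_indices(10, 3))
for train_idx, test_idx in folds:
    print(len(train_idx), "train /", len(test_idx), "test")
```

Each baseline algorithm would be trained on the term-frequency vectors of the training indices and scored on the held-out fold, with the scores averaged across folds.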
Comparison of metrics
The two measures used for metric comparison were AUROC (area under the receiver operating characteristic curve) and AUPRC (area under the precision-recall curve).
AUROC is a common evaluation metric for these models, as it summarizes the trade-off between sensitivity and specificity across all classification thresholds. AUPRC provides complementary information for imbalanced data, when the percentage of positive cases is low.
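The following sketch shows how the two metrics can be computed from labels and predicted scores, using only the standard library. The data are invented, and AUPRC is approximated here by average precision, which is one common estimator and not necessarily the one used in the study.

```python
def auroc(labels, scores):
    """Probability that a random positive is scored above a random negative
    (ties count as half). Equivalent to the area under the ROC curve."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive example,
    used here as an estimate of the area under the precision-recall curve."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Invented, imbalanced example: 2 positives among 8 sections.
labels = [1, 0, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.2, 0.1, 0.05]

roc = auroc(labels, scores)
ap = average_precision(labels, scores)
print(f"AUROC={roc:.3f}  AUPRC~{ap:.3f}")
```

In this toy example a single negative ranked above one positive costs more in average precision (5/6) than in AUROC (11/12), which illustrates why AUPRC is the stricter of the two when positives are rare.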

What are the main conclusions of this study of a learning model from clinical notes?
The main conclusion of this study is that it is possible to make diagnostic predictions of cognitive decline using a model based on clinical notes. These patients could be in the earliest stages of cognitive decline, which would allow identifying early signals in electronic health records.
The model developed for this purpose was the best predictor for detecting patients who will develop cognitive decline, without relying on structured data. Although the deep learning model was the best, the XGBoost model also showed good predictions, and it is proposed as a simpler alternative if the necessary technology is not available.
AUROC and AUPRC metrics
To verify these results, the scores obtained in the AUROC and AUPRC metrics on datasets 1 and 2 can be observed (see tables 1 and 2, respectively). It is particularly notable that the deep learning-based model is the best predictor on both metrics.
In the case of AUROC, all values are above 0.9, with the deep learning model always predicting best. Regarding AUPRC, this is even more evident, since this model is the only one that remains above the value 0.9.
That the model ranks first on two metrics that measure different things reinforces the consistency of the results: AUROC shows the relationship between the true positive rate and the false positive rate, while AUPRC reflects the relationship between precision and recall.
In imbalanced samples, AUROC can look optimistic, because the false positive rate is computed over a large number of negative cases and a sizeable number of false positives barely moves it. AUPRC is based on precision and is therefore more sensitive to false positives, so it helps confirm the good performance of this model.
Table 1. AUROC and AUPRC on dataset 1 (sections containing keyword terms)

| Model | AUROC | AUPRC |
| --- | --- | --- |
| Logistic regression | 0.936 | 0.880 |
| Random Forest | 0.950 | 0.889 |
| Support Vector Machine | 0.939 | 0.883 |
| XGBoost | 0.953 | 0.882 |
| Deep Learning | 0.971 | 0.933 |

Table 2. AUROC and AUPRC on dataset 2 (randomly selected sections, no keyword filter)

| Model | AUROC | AUPRC |
| --- | --- | --- |
| Logistic regression | 0.969 | 0.762 |
| Random Forest | 0.985 | 0.830 |
| Support Vector Machine | 0.954 | 0.723 |
| XGBoost | 0.988 | 0.898 |
| Deep Learning | 0.997 | 0.929 |
Model performance
Another point highlighted by this study is that note length could affect model performance. The results nevertheless suggest that section-based classification is feasible as long as each section retains sufficient content.
In addition, this type of model could be applied to other pathologies, although it is important to consider that identifying ambiguous or complex information can be difficult.
How could NeuronUP contribute to a study like this?
NeuronUP could contribute in various ways to a study like this, as it has extensive experience working with large amounts of data.
As seen in this study, handling large volumes of data is one of the main challenges when working with clinical notes. Therefore, the NeuronUP team, which includes specialists in both the clinical field and data analysis, could make valuable contributions to information processing, whether through the use of keyword terms or without them.
This study also stands out for comparing five different models, which lends robustness to the results obtained for the authors’ model. The NeuronUP team’s experience could also be useful in designing a specific model for this purpose, or in creating robust models to compare against the one developed.
Li Zhou has been a Professor of Medicine at Harvard Medical School for more than ten years and is a principal investigator at Brigham and Women’s Hospital. She holds a PhD in Biomedical Informatics from Columbia University, and her research has focused on natural language processing, knowledge management, and clinical decision support. In addition, she has been the principal investigator on numerous research projects funded by AHRQ, NIH, and CRICO/RMF.
Bibliography
- Wang L, Laurentiev J, Yang J, et al. Development and Validation of a Deep Learning Model for Earlier Detection of Cognitive Decline From Clinical Notes in Electronic Health Records. JAMA Netw Open. 2021;4(11):e2135174. doi:10.1001/jamanetworkopen.2021.35174
“This article has been translated. Link to the original article in Spanish:”
Modelo de aprendizaje profundo para la detección temprana del deterioro cognitivo a partir de notas clínicas en historias clínicas electrónicas






