In this article, Antonio Javier Sutil Jiménez discusses the data presented in the study “Deep Learning Model for Early Detection of Cognitive Decline from Clinical Notes in Electronic Health Records”.
Why is this Study on Learning Models from Clinical Notes Important?
This study addresses the early detection of cognitive decline in adults, which is crucial for enabling successful therapeutic interventions, slowing down decline, preventing disease development, or facilitating participant enrollment in clinical trials.
Alzheimer’s Disease
Alzheimer’s disease is a type of dementia that represents a significant global issue. This disease has been diagnosed in nearly 6 million people in the United States, and its prevalence increases with age, meaning that the aging population is expected to raise its incidence over the coming years.
Beyond Alzheimer’s disease, however, mild cognitive impairment is also a highly relevant issue, as in many cases it is associated with the later development of dementia.
Subjective Cognitive Decline
Recently, a new category, subjective cognitive decline, has also been introduced. This term refers to the individual’s perception of a decline in their cognitive abilities compared to their previous state.
While this label is not a disease in itself, it has been found that people with this condition may be in an early stage of cognitive decline.
Detection of Cognitive Decline
Despite significant efforts to improve treatments for these patients, detecting cognitive decline remains a challenge, and improving detection tools is necessary to ensure subsequent treatments are effective.
Primary Care Tools
Since the number of specialized professionals available to treat at-risk populations is limited, one possible solution could be to provide primary care physicians with appropriate tools. These doctors are not dementia specialists, but they have direct contact with this population, so equipping them with diagnostic tools is seen as a viable solution.
Electronic Medical Records
The use of electronic medical records is proposed as a suitable alternative for developing these tools, as these records collect patients’ visit histories within a healthcare system.
However, it is important to note the difficulty in identifying signs of cognitive decline not associated with age, which are often documented in cognitive assessments and in patient concerns recorded by healthcare professionals. While studies have been conducted using patients’ clinical information, in-depth use of clinical notes in medical records for this purpose has been rare.
Clinical Notes as an Informational Resource
This study proposes the use of clinical notes as an informational resource that could capture data not considered in most studies. Manual analysis of clinical notes would be very costly, so the study’s goal was to develop an automated detection model based on deep learning.
The approach of this study, built on clinical notes, is therefore original and innovative.
Clinical notes are a central component of health records in clinical practice. However, their scientific use has been limited, making their application to the early detection of cognitive decline potentially very valuable.
What Has Been Done?
Database
For this study, data from a private healthcare company was used, filtering patients by age (over 50 years) and diagnosis of mild cognitive impairment. Specifically, clinical notes from the four years prior to diagnosis were analyzed.
The definition of cognitive impairment was based on the mention of symptoms, diagnosis, cognitive evaluations, and treatments. When notes indicated transient, reversible episodes, they were considered negative for cognitive impairment.
Processing Clinical Notes and Database Development
First, due to the length of the clinical notes, a natural language processor was used to divide them into sections. This division allowed for identifying whether each section indicated cognitive impairment or not.
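As a rough illustration of this segmentation step, the sketch below splits a note into header-delimited sections. It is a minimal example under assumed conventions, not the study’s actual NLP tool; the header pattern and the `split_into_sections` function are hypothetical.

```python
import re

# Minimal sketch (not the study's actual NLP pipeline): split a long clinical
# note into sections by detecting upper-case headers ending in a colon,
# e.g. "ASSESSMENT:" or "HISTORY OF PRESENT ILLNESS:".
SECTION_HEADER = re.compile(r"^(?P<header>[A-Z][A-Z /]+):\s*$", re.MULTILINE)

def split_into_sections(note_text: str) -> list[dict]:
    """Return a list of {'header': ..., 'text': ...} dicts for one note."""
    matches = list(SECTION_HEADER.finditer(note_text))
    sections = []
    for i, match in enumerate(matches):
        start = match.end()
        end = matches[i + 1].start() if i + 1 < len(matches) else len(note_text)
        sections.append({"header": match.group("header").strip(),
                         "text": note_text[start:end].strip()})
    return sections
```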
Next, sections containing keywords selected by experts were identified, and annotators were trained to recognize sections showing signs of cognitive impairment. Three annotators labeled the sections, conflicts were resolved through discussion with subject matter experts, and a good level of inter-annotator agreement was achieved.
Additionally, a labeled dataset with 4,950 sections was created to train and test various machine learning algorithms. Finally, two databases were created for the development and validation of the model.
Datasets
The first dataset, used for model development, included only sections with selected keywords. This dataset contained 4,950 annotated sections, ready for machine learning model development.
The second dataset consisted of 2,000 randomly selected sections from all notes, excluding those used in the first set. This second dataset was used to verify the model’s generalization ability on note sections without applying a keyword-based filter.
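The following sketch shows how the two datasets could be assembled from already-segmented sections. The keyword list, function names, and sampling details are illustrative assumptions; the study relied on a larger, expert-curated keyword set.

```python
import random

# Illustrative keyword set; the study used a longer expert-curated list.
KEYWORDS = {"memory loss", "forgetful", "dementia", "cognitive decline",
            "confusion", "alzheimer", "mmse"}

def contains_keyword(section_text: str) -> bool:
    text = section_text.lower()
    return any(keyword in text for keyword in KEYWORDS)

def build_datasets(all_sections, n_random=2000, seed=0):
    """Dataset 1: every section matching at least one keyword (then annotated).
    Dataset 2: a random sample of the remaining sections, with no keyword filter."""
    dataset1 = [s for s in all_sections if contains_keyword(s["text"])]
    remainder = [s for s in all_sections if not contains_keyword(s["text"])]
    random.seed(seed)
    dataset2 = random.sample(remainder, min(n_random, len(remainder)))
    return dataset1, dataset2
```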
Model Development and Validation
To develop the model, a deep learning architecture with a hierarchical attention structure, developed by the group in earlier work, was used, along with four baseline machine learning algorithms: logistic regression, random forest, support vector machine, and XGBoost.
The previously developed model incorporated a contextually adapted convolutional neural network, allowing it to handle word variations and interpret predictions through attention layers. For more information on the model, it is recommended to consult the article and its supplementary tables.
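To make the idea of an attention-based section classifier concrete, here is a deliberately simplified PyTorch sketch. It is not the authors’ architecture, which combines a contextually adapted convolutional neural network with hierarchical attention; it only shows how per-word attention weights can feed both the prediction and its later interpretation. All layer sizes and names are assumptions.

```python
import torch
import torch.nn as nn

class WordAttentionClassifier(nn.Module):
    """Simplified attention-based section classifier (sketch only; the study's
    model additionally uses a contextually adapted CNN and a hierarchical
    attention structure). The returned attention weights are what later
    allow the predictions to be interpreted."""

    def __init__(self, vocab_size: int, embed_dim: int = 128, hidden: int = 64):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.encoder = nn.GRU(embed_dim, hidden, batch_first=True, bidirectional=True)
        self.attention = nn.Linear(2 * hidden, 1)     # one score per word
        self.classifier = nn.Linear(2 * hidden, 1)    # binary: cognitive decline or not

    def forward(self, token_ids: torch.Tensor):
        embedded = self.embedding(token_ids)                    # (batch, seq, embed_dim)
        states, _ = self.encoder(embedded)                      # (batch, seq, 2*hidden)
        weights = torch.softmax(self.attention(states).squeeze(-1), dim=1)  # (batch, seq)
        context = (weights.unsqueeze(-1) * states).sum(dim=1)   # attention-weighted summary
        logit = self.classifier(context).squeeze(-1)            # (batch,)
        return logit, weights
```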
Model Prediction Interpretation
To interpret the model’s predictions, the words with the highest weight in the attention layers used in the prediction were identified. Words with significant weight, that is, at least 2 standard deviations above the mean, were considered high-attention and compared to the original selected keywords.
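A minimal sketch of that interpretation rule, assuming the model returns one attention weight per token (the function name and variables are hypothetical):

```python
import numpy as np

def high_attention_words(tokens, attention_weights, n_std=2.0):
    """Return the words whose attention weight is at least `n_std` standard
    deviations above the section's mean weight (the 2-SD rule described above)."""
    weights = np.asarray(attention_weights, dtype=float)
    threshold = weights.mean() + n_std * weights.std()
    return [token for token, w in zip(tokens, weights) if w >= threshold]

# The flagged words can then be compared with the expert-selected keywords:
# overlap = set(high_attention_words(tokens, weights)) & KEYWORDS
```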
On the other hand, for the baseline models, the sections were represented by term frequency, and the algorithms were trained and tested using cross-validation. The results of the research group’s model were then compared with the four aforementioned baseline models.
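A baseline setup along these lines can be approximated with scikit-learn and xgboost, as sketched below. The `evaluate_baselines` helper, the hyperparameters, and the 5-fold split are illustrative assumptions, not the study’s exact configuration.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from xgboost import XGBClassifier

def evaluate_baselines(texts, labels, cv=5):
    """Cross-validate the four baselines on term-frequency vectors of the sections.
    `texts` is a list of section strings, `labels` a list of 0/1 annotations."""
    baselines = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(n_estimators=300),
        "support_vector_machine": SVC(probability=True),
        "xgboost": XGBClassifier(eval_metric="logloss"),
    }
    results = {}
    for name, model in baselines.items():
        pipeline = make_pipeline(CountVectorizer(), model)  # term-frequency features
        scores = cross_val_score(pipeline, texts, labels, cv=cv, scoring="roc_auc")
        results[name] = scores.mean()
    return results
```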
Metric Comparison
The two metrics used for comparison were AUROC (area under the receiver operating characteristic curve) and AUPRC (area under the precision-recall curve).
AUROC is a commonly used metric for these models, as it allows evaluating different trade-off points between sensitivity and specificity. AUPRC is another important metric that provides complementary information for imbalanced data, where the percentage of positive cases is low.
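Both metrics are easy to compute from predicted probabilities; the snippet below uses scikit-learn, with `average_precision_score` as a common estimator of AUPRC (an implementation assumption, not taken from the study).

```python
from sklearn.metrics import average_precision_score, roc_auc_score

def score_model(y_true, y_prob):
    """AUROC summarizes the sensitivity/specificity trade-off across thresholds;
    average precision approximates AUPRC, which is more informative when
    positive cases are rare."""
    return {"auroc": roc_auc_score(y_true, y_prob),
            "auprc": average_precision_score(y_true, y_prob)}

# Toy example with made-up labels and predicted probabilities
print(score_model([0, 0, 1, 1, 0, 1], [0.10, 0.40, 0.80, 0.65, 0.20, 0.90]))
```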
What Are the Main Findings of This Clinical Note-Based Machine Learning Model Study?
The main finding of this study is that it is possible to predict a diagnosis of cognitive decline using a model based on clinical notes, identifying early signs in electronic health records of patients who may still be in the early stages of decline.
The model developed for this purpose was the best predictor for detecting patients who will develop cognitive decline, without relying on structured data. Although the deep learning model was the best, the XGBoost model also showed good predictions and is proposed as a simpler alternative in case the necessary technology is not available.
AUROC and AUPRC Metrics
To verify these results, the AUROC and AUPRC scores obtained on datasets 1 and 2 are shown in tables 1 and 2, respectively. It is especially noteworthy that the deep learning model is the best predictor on both metrics.
In the case of AUROC, all values are above 0.9, with the deep learning model consistently predicting best. As for AUPRC, this is even more evident, as this model is the only one that remains above the 0.9 value.
The differences between these metrics reinforce the consistency of the results, as AUROC shows the relationship between the true positive and false positive rates, while AUPRC reflects the relationship between precision and recall.
In imbalanced samples, AUROC can understate the impact of false positives, so the complementary information from AUPRC helps confirm the model’s strong performance.
Table 1. Model performance on dataset 1 (keyword-filtered sections).

| Model | AUROC | AUPRC |
| --- | --- | --- |
| Logistic Regression | 0.936 | 0.880 |
| Random Forest | 0.950 | 0.889 |
| Support Vector Machine | 0.939 | 0.883 |
| XGBoost | 0.953 | 0.882 |
| Deep Learning | 0.971 | 0.933 |
Table 2. Model performance on dataset 2 (randomly selected sections).

| Model | AUROC | AUPRC |
| --- | --- | --- |
| Logistic Regression | 0.969 | 0.762 |
| Random Forest | 0.985 | 0.830 |
| Support Vector Machine | 0.954 | 0.723 |
| XGBoost | 0.988 | 0.898 |
| Deep Learning | 0.997 | 0.929 |
Model Performance
Another point highlighted in this study is that note length could affect model performance; however, as long as sufficient content is maintained, section-based classification can be feasible.
Additionally, this type of model could be applied to other pathologies, though it is important to consider that identifying ambiguous or complex information may prove challenging.
Where Could NeuronUP Contribute to a Study Like This?
NeuronUP could contribute in various ways to a study like this, given its extensive experience working with large datasets.
As observed in this study, handling large volumes of data is one of the main challenges when working with clinical notes. Therefore, the team at NeuronUP, with specialists in both the clinical field and data analysis, could make valuable contributions to data processing, whether using keywords or not.
Moreover, this study stands out for comparing five different models, which strengthens the results obtained for their model. The experience of the NeuronUP team could also be useful in designing a specific model for this purpose or in creating robust models for comparison with the developed model.
Li Zhou, MD, has been a Professor of Medicine at Harvard Medical School for over ten years and is a principal investigator at Brigham and Women’s Hospital. She holds a PhD in Biomedical Informatics from Columbia University, with research focused on natural language processing, knowledge management, and clinical decision support. She has also served as the principal investigator on numerous research projects funded by AHRQ, NIH, and CRICO/RMF.
References
- Wang L, Laurentiev J, Yang J, et al. Development and Validation of a Deep Learning Model for Earlier Detection of Cognitive Decline From Clinical Notes in Electronic Health Records. JAMA Netw Open. 2021;4(11):e2135174. doi:10.1001/jamanetworkopen.2021.35174