Across a growing number of diseases and in healthcare as a whole, machine learning algorithms have begun to make an impact on diagnostic and patient outcome measurements. In working with the Accelerated Cure Project for Multiple Sclerosis, a non-profit organization whose mission is to accelerate research efforts to improve diagnosis and treatment for multiple sclerosis (MS), our goal was to develop a set of machine learning models to predict MS disease evolution and determine crucial transitions in MS progression.
Multiple sclerosis (MS) is a chronic autoimmune disease of the central nervous system with a wide range of symptoms, including weakness, pain, and impaired movement, speech, and vision. With the cause of MS still unknown, and with approximately 2.5 million cases worldwide, clinicians and researchers focus on effective treatments as well as tools that help predict the severity and specific symptoms of patients. For most people, MS begins with a relapsing-remitting (RR) course, in which episodes of worsening function (relapses) are followed by complete or partial recovery periods (remissions). In some patients, disability later progresses independently of relapses, a course termed secondary progressive multiple sclerosis (SPMS) and characterized by progressive damage and disability over time. If left untreated, many patients with a relapsing-remitting course will transition to SPMS. Although some people with MS experience little disability during their lifetime, for many the disease gradually worsens, and up to 60% are no longer fully ambulatory 20 years after onset.
The Accelerated Cure Project for Multiple Sclerosis (ACP) is a patient-founded national non-profit organization dedicated to accelerating advances toward a cure for MS. ACP’s iConquerMS is a ground-breaking initiative that empowers people with MS to help drive research and accelerate efforts towards improving outcomes. ACP’s level of activity generates huge volumes of information that can be analyzed for greater clinical insight and diagnostic advantage. “Predictive information about disease progression is critical to making personalized treatment for MS a reality and enabling people to live their best lives,” says Robert McBurney, Chief Research Officer. A key point in disease evolution is the transition from relapsing-remitting MS (RRMS) to secondary-progressive MS (SPMS), which indicates that a patient has progressed beyond the relapsing-remitting course, with implications for their future quality of life. The ability to predict this transition purely from survey data would be a valuable tool with high clinical relevance, as there are no clinical tests to identify it. SFL Scientific worked to analyze the iConquerMS dataset and to develop a predictive machine learning model for this crucial transition period.
Working with ACP, SFL Scientific examined previous research on predicting multiple sclerosis transition periods. There was evidence that MS stage evolution can be predicted using a statistical model (Fiorini et al., 2016). However, using a linear approach, the authors demonstrated prediction of MS stage only up to two years from when the data were collected. Their use of data from rehabilitation facilities may also limit generalizability to the wider MS population, especially young adults, and it is unclear how the model performs on imbalanced datasets. To demonstrate the power of machine learning in developing diagnostic and predictive health tools, our goal was to analyze recent community-based population data using modeling approaches to predict MS stage evolution up to 5 years from the first measurement.
This project used data from 2311 participants in the iConquerMS cohort who met inclusion criteria, including an MS diagnosis that remained RRMS, remained SPMS, or transitioned from RRMS to SPMS during the timeframe of the study. Information was collected from participants over 9 seasons of data collection, in six-month periods from 2016 to mid-2020. SFL Scientific’s approach was to determine whether multivariate predictors could be found in the dataset for a transition within the six-month period or for subsequent change scores. The design and current dataset posed several challenges, including a small sample size for longitudinal data, differences in when surveys were introduced and when participants took part in data collection, and the lower granularity afforded by interval-based data. To overcome these issues, a machine learning pipeline that incorporates interval timing and combines data across seasons was introduced. This resulted in a model that could flexibly predict MS stage not only for the time at which the input data were collected but also for a future interval.
Enabling ML-Based Diagnostic Tools
Accuracy in healthcare and clinical diagnostics is always the number-one criterion for evaluating performance. Because the data are highly imbalanced, however, a single accuracy figure can be misleading, so it is important to evaluate models across several metrics, including ROC AUC, average precision, average recall, F1 scores, and other statistical measures that indicate how well models distinguish between classes. Generally, the higher the AUC, the better the model is at distinguishing between patients with and without disease or, in this case, between MS stages.
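To make these metrics concrete, here is a minimal Python sketch of how ROC AUC and macro-averaged precision, recall, and F1 can be computed for a binary RRMS (0) versus SPMS (1) problem. It is an illustration only, not the project's evaluation code: the function names `roc_auc` and `macro_scores`, the toy labels, and the 0.5 decision threshold are all assumptions made for this example.

```python
def roc_auc(y_true, scores):
    """ROC AUC as the probability that a random positive example
    receives a higher score than a random negative one (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_scores(y_true, y_pred):
    """Unweighted (macro) average of per-class precision, recall, and F1,
    so the minority class counts as much as the majority class."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical, imbalanced toy data: 8 RRMS (0) and 2 SPMS (1) participants.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.10, 0.20, 0.15, 0.30, 0.05, 0.40, 0.25, 0.60, 0.70, 0.35]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print("accuracy:", accuracy)
print("ROC AUC:", roc_auc(y_true, scores))
print("macro precision, recall, F1:", macro_scores(y_true, y_pred))
```

On this toy data the plain accuracy is 0.8, the same figure a model that always predicts RRMS would reach, while the macro-averaged scores come out lower because the SPMS class is weighted equally. In practice a library such as scikit-learn would normally be used for these calculations.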
To achieve project goals, SFL Scientific developed a machine learning pipeline incorporating two novel strategies for ACP:

(1) Backfilling was implemented to fill in missing feature data with information from previous seasons when available. While this ignores variations that may have occurred between the season being backfilled and the last season for which information was available, it makes use of large amounts of data that would otherwise have gone unused, and it handles missing data resulting from non-overlapping collection periods between data modules.

(2) To build a classification model that could predict MS stage not only for the season in which the features were collected but also for a future season, temporal features were expanded across the available seasons for each participant, pairing the seasons in which features were measured with the seasons in which the target variable was measured and recording the interval between them. This strategy enriches the dataset that the model can learn from and makes maximal use of the data by including participants both with and without longitudinal data.

Modeling was performed using the XGBoost estimator and was optimized for different subgroups. The final model achieved an ROC AUC of 0.89 and macro-averaged precision, recall, and F1 scores of 0.79, 0.78, and 0.79, respectively. Subset validation was also performed to evaluate performance and to understand how each model’s predictions for SPMS changed with varying attributes and patient data.
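The two data strategies described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not SFL Scientific's pipeline: the record layout, the field names (`participant_id`, `season`, `age`, `mobility`, `stage`), and the helper names `backfill` and `expand_interval_pairs` are hypothetical. Because the article's backfilling draws on previous seasons, the sketch carries the last observed value forward, which data libraries usually call a forward fill.

```python
SEASON_MONTHS = 6  # seasons are six-month collection periods


def backfill(records, feature_names):
    """Fill a missing feature with the participant's most recent value
    from an earlier season, when one exists. Returns new records."""
    last_seen = {}  # (participant_id, feature) -> last observed value
    filled = []
    ordered = sorted(records, key=lambda r: (r["participant_id"], r["season"]))
    for rec in ordered:
        new = dict(rec)  # copy so the input records are not modified
        for name in feature_names:
            key = (rec["participant_id"], name)
            if new.get(name) is None:
                new[name] = last_seen.get(key)  # stays None if never observed
            else:
                last_seen[key] = new[name]
        filled.append(new)
    return filled


def expand_interval_pairs(records, feature_names, target="stage"):
    """Pair every feature season with the same or any later target season
    of the same participant, adding the gap in months as a feature."""
    by_participant = {}
    for rec in records:
        by_participant.setdefault(rec["participant_id"], []).append(rec)

    rows = []
    for pid, recs in by_participant.items():
        recs = sorted(recs, key=lambda r: r["season"])
        for feat in recs:
            for tgt in recs:
                if tgt["season"] < feat["season"] or tgt.get(target) is None:
                    continue  # only same-season or future targets
                row = {name: feat.get(name) for name in feature_names}
                row["participant_id"] = pid
                row["interval_months"] = (
                    (tgt["season"] - feat["season"]) * SEASON_MONTHS
                )
                row["target"] = tgt[target]
                rows.append(row)
    return rows


# Hypothetical example: participant A has three seasons (one with missing
# survey answers); participant B has a single season and no longitudinal data.
features = ["age", "mobility"]
records = [
    {"participant_id": "A", "season": 1, "age": 40, "mobility": 2, "stage": "RRMS"},
    {"participant_id": "A", "season": 2, "age": None, "mobility": None, "stage": "RRMS"},
    {"participant_id": "A", "season": 4, "age": 41, "mobility": 4, "stage": "SPMS"},
    {"participant_id": "B", "season": 3, "age": 55, "mobility": 5, "stage": "SPMS"},
]

filled = backfill(records, features)
rows = expand_interval_pairs(filled, features)
for row in rows:
    print(row)
```

In this toy example participant A contributes six feature/target pairs with intervals of 0, 6, 12, and 18 months, while participant B still contributes one same-season row, which mirrors how the expansion keeps participants without longitudinal data. The resulting rows, with `interval_months` as an input feature, could then be passed to a classifier such as XGBoost.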
AI Accelerates Healthcare Research
In this project, several novel strategies were used to overcome data limitations including low time granularity, data imbalance, lack of longitudinal data, and misalignment in data availability between different modules. Identifying the salient input data segments and reaching a clinically meaningful conclusion from raw data in an accurate and timely manner is challenging for both physicians and machines. The results showed good overall performance for predicting MS stage evolution up to 5 years from currently available information. Further, the model's feature importances aligned with clinically important features, with the top indicators being participant age, interval period, and mobility measures. While the model performed less well on some subsets of the data than on others, ACP judged that performance on these subsets was still reasonable, and it is expected to improve as more data become available. The current work suggests that interval-based survey data can viably predict MS stage evolution between RRMS and SPMS.
Along with these findings, we gave a few more recommendations:
- This approach can be extended to other MS types in the future, including but not limited to primary progressive MS (PPMS).
- There are additional datasets that are currently unexplored, including information related to relapses and disease-modifying therapy (DMT). Given that they likely provide additional perspective on MS progression in patients, they will probably further improve predictive performance.
- Researchers have suggested clinical utility in moving beyond the four currently recognized MS stages. To this end, clustering-based approaches can explore new ways of organizing MS patients.
- Developing protocols and key information gathering in clinical studies with machine learning in mind will aid in future data analysis efforts and increase the power of such studies.
A comprehensive paper is in progress and will be submitted for review in Q4 of 2020. Our modeling results indicate that these digital biomarkers derived from survey data could in the future be used as additional diagnostic criteria for MS and have the potential to support experts in the treatment process. Further, new technologies, such as monitoring in free-living conditions, could potentially aid in objectively assessing the symptoms of MS by quantifying symptom presence and intensity over long periods of time. With the addition of automated medical imaging and IoT devices, machine learning and AI-based diagnostic tools can support symptom tracking and improve clinical decision-making by providing high-resolution data collection and analysis outside of a restricted clinical environment.