Machine Learning Publications

Machine learning for comprehensive forecasting of Alzheimer’s Disease progression

Charles K. Fisher, Aaron M. Smith, Jonathan R. Walsh, Coalition Against Major Diseases

Most approaches to machine learning from electronic health data can only predict a single endpoint. The ability to simultaneously simulate dozens of patient characteristics is a crucial step towards personalized medicine for Alzheimer’s Disease. Here, we use an unsupervised machine learning model called a Conditional Restricted Boltzmann Machine (CRBM) to simulate detailed patient trajectories. We use data comprising 18-month trajectories of 44 clinical variables from 1909 patients with Mild Cognitive Impairment or Alzheimer’s Disease to train a model for personalized forecasting of disease progression. We simulate synthetic patient data including the evolution of each sub-component of cognitive exams, laboratory tests, and their associations with baseline clinical characteristics. Synthetic patient data generated by the CRBM accurately reflect the means, standard deviations, and correlations of each variable over time to the extent that synthetic data cannot be distinguished from actual data by a logistic regression. Moreover, our unsupervised model predicts changes in total ADAS-Cog scores with the same accuracy as specifically trained supervised models, additionally capturing the correlation structure in the components of ADAS-Cog, and identifies sub-components associated with word recall as predictive of progression.
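The logistic regression check mentioned in the abstract can be illustrated with a small sketch. This is not the authors' evaluation code: the data here are toy Gaussian samples, and the helper names (`fit_logistic`, `auc`, `discriminator_auc`) are hypothetical. The idea is to label actual records 1 and synthetic records 0, fit a logistic regression on half of the pooled data, and measure the area under the ROC curve (AUC) on the other half. An AUC near 0.5 means the classifier cannot tell the two sources apart.

```python
import math
import random


def fit_logistic(X, y, lr=0.1, epochs=200):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count as half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def discriminator_auc(real, synthetic, seed=0):
    """Train on half of the pooled data, return the AUC on the held-out half."""
    data = [(x, 1) for x in real] + [(x, 0) for x in synthetic]
    random.Random(seed).shuffle(data)
    split = len(data) // 2
    train, test = data[:split], data[split:]
    w, b = fit_logistic([x for x, _ in train], [y for _, y in train])
    scores = [b + sum(wj * xj for wj, xj in zip(w, x)) for x, _ in test]
    return auc(scores, [y for _, y in test])


# Toy stand-ins for actual and simulated records (assumption: 3 Gaussian variables).
rng = random.Random(42)


def sample(n, mean):
    return [[rng.gauss(mean, 1.0) for _ in range(3)] for _ in range(n)]


real = sample(300, 0.0)
good_synthetic = sample(300, 0.0)  # same distribution as the "actual" data
bad_synthetic = sample(300, 2.0)   # shifted distribution, easy to detect

auc_good = discriminator_auc(real, good_synthetic)
auc_bad = discriminator_auc(real, bad_synthetic)
print(f"matched synthetic data: AUC = {auc_good:.2f}")
print(f"shifted synthetic data: AUC = {auc_bad:.2f}")
```

In this toy setting the matched sample should give an AUC close to 0.5 and the shifted sample an AUC close to 1. A real evaluation would use the full set of clinical variables and a held-out set of patients.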

Conference Posters

Synthetic Control Subjects for Alzheimer's Disease Clinical Trials (JSM 2019)

Charles Fisher, Yannick Pouliot, Aaron Smith, Jonathan Walsh

Objective: To develop a method to model disease progression that simulates detailed clinical data records for subjects in the control arms of Alzheimer's disease clinical trials. Methods: We used a robust data processing framework to build a dataset from a database of subjects in the control arms of a diverse set of 28 clinical trials on Alzheimer's disease. From this dataset, we selected 1908 subjects with 18-month trajectories of 44 variables and trained a probabilistic generative model called a Conditional Restricted Boltzmann Machine (CRBM) to simulate disease progression in 3-month intervals across all variables. Results: Based on a statistical analysis comparing data from actual and simulated subjects, the model generates accurate subject-level distributions across variables and through time. Focusing on a common clinical trial endpoint for Alzheimer’s disease (ADAS-Cog), we show the model can accurately predict disease progression and may be used to model the control arm of a clinical trial whose data are distinct from the training and test datasets. Conclusion: The ability to simulate dozens of clinical characteristics simultaneously is a powerful tool to model disease progression. Such models have useful applications for clinical trials, from analyzing control groups to supplementing actual subject data in control arms.

Synthetic Control Subjects for Alzheimer's Disease Clinical Trials (AAIC 2019)

Charles Fisher, Yannick Pouliot, Aaron Smith, Jonathan Walsh

Objective: To develop a method to model disease progression that simulates detailed clinical data records for subjects in the control arms of Alzheimer's disease clinical trials. Methods: We used a robust data processing framework to build a machine learning dataset from a database of subjects in the control arms of a diverse set of 28 clinical trials on Alzheimer's disease. From this dataset, we selected 1908 subjects with 18-month trajectories of 44 variables and trained a model capable of simulating disease progression in 3-month intervals across all variables. Results: Based on a statistical analysis comparing data from actual and simulated subjects, the model generates accurate subject-level distributions across variables and through time. Focusing on a common clinical trial endpoint for Alzheimer's disease (ADAS-Cog), we show the model can predict disease progression as accurately as several supervised models. Our model also predicts the outcome of a clinical trial whose data are distinct from the training and test datasets. Conclusion: The ability to simulate dozens of clinical characteristics simultaneously is a powerful tool to model disease progression. Such models have useful applications for clinical trials, from analyzing control groups to supplementing real subject data in control arms.

Generating Synthetic Control Subjects Using Machine Learning for Clinical Trials in Alzheimer's Disease (DIA 2019)

Charles K. Fisher, Yannick Pouliot, Aaron M. Smith, Jonathan R. Walsh

Objective: To develop a method to model disease progression that simulates detailed patient trajectories, and to apply this model to subjects in the control arms of Alzheimer's disease clinical trials. Methods: We used a robust data processing framework to build a machine learning dataset from a database of subjects in the control arms of a diverse set of 28 clinical trials on Alzheimer's disease. From this dataset, we selected 1908 subjects with 18-month trajectories of 44 variables and trained 5 cross-validated models capable of simulating disease progression in 3-month intervals across all variables. Results: Based on a statistical analysis comparing data from actual and simulated patients, the model generates accurate patient-level distributions across variables and through time. Focusing on a common clinical trial endpoint for Alzheimer's disease (ADAS-Cog), we show the model can predict disease progression as accurately as several supervised models. Our model also predicts the outcome of a clinical trial whose data are distinct from the training and test datasets. Conclusion: The ability to simulate dozens of patient characteristics simultaneously is a powerful tool to model disease progression. Such models have useful applications for clinical trials, from analyzing control groups to supplementing real subject data in control arms.