The goal of this post is to show that for complex diseases like Alzheimer's, it's difficult—nearly impossible—to match someone exactly with a subject from a historical dataset.
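In a standard clinical trial, known as a randomized controlled trial (RCT), recruited subjects are randomly assigned to one of two groups: an experimental group, which receives the treatment being tested, or a control group, which does not (it typically receives a placebo or the standard of care).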
However, there are cases when one can’t conduct an RCT. For example, if we’re testing a drug for late-stage cancer or a rare disease, it’s almost impossible to gather enough control subjects—and sometimes unethical not to give them the treatment.
In these special cases, a better course of action is to use control subjects from an external data source [1], such as historical data: data collected from similar trials conducted in the past. In this post, we call these subjects “historical control subjects.”
So how do we use historical control subjects? We would recruit actual subjects to the experimental group, as in an RCT. But instead of recruiting actual subjects to the control group, we would match each experimental subject with a historical control subject from a dataset (fig. 2).
Having good matches is important because it ensures the control and experimental groups are as similar as possible and lets us safely compare them. To determine matches, we would use specific baseline covariates that impact the endpoints we want to measure in a clinical trial. Covariates can include variables such as age, region, and laboratory test results. Baseline is defined as the value(s) we measure prior to any intervention or change.
Ideally, we would match each experimental subject exactly with a historical control subject, meaning that all of the relevant baseline covariates are the same for the two subjects [2]. But when there are many covariates, as is the case for a complex disease like Alzheimer’s disease (AD), finding a match becomes tricky.
In fact, when we search a historical dataset for someone who matches a given subject, we usually can’t find an exact match by this definition.
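To make the definition concrete, here is a minimal Python sketch of exact matching, assuming each subject is a dictionary of baseline covariates. The function name `exact_matches`, the covariate names, and all values are hypothetical and chosen only for illustration; they do not come from any real dataset.

```python
from typing import Dict, List, Sequence

# A subject's baseline record: covariate name -> value (hypothetical layout).
Subject = Dict[str, object]


def exact_matches(subject: Subject, pool: List[Subject],
                  covariates: Sequence[str]) -> List[Subject]:
    """Return the candidates in `pool` that have exactly the same value as
    `subject` on every covariate in `covariates`."""
    return [cand for cand in pool
            if all(cand[c] == subject[c] for c in covariates)]


# Hypothetical baseline records; names and values are illustrative only.
experimental = {"id": "E1", "age": 70, "sex": "F", "region": "EU", "lab_value": 42}
historical = [
    {"id": "H1", "age": 70, "sex": "F", "region": "EU", "lab_value": 42},
    {"id": "H2", "age": 70, "sex": "F", "region": "US", "lab_value": 42},
    {"id": "H3", "age": 71, "sex": "F", "region": "EU", "lab_value": 42},
]

covariates = ["age", "sex", "region", "lab_value"]
matches = exact_matches(experimental, historical, covariates)
print([m["id"] for m in matches])  # ['H1']
```

Dropping `region` from the covariate list would also admit H2. This illustrates the general point: every covariate added to the list can only keep or shrink the set of exact matches.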
Let’s start with a simple exercise to illustrate this point. Since we’re interested in AD, we’ll use the CODR-AD dataset from the Critical Path Institute [3,4], containing the following information:
- 6,955 subjects drawn from 28 AD clinical trials
- Measurements for many different variables (demographic, clinical, and cognitive)
- Subject progression recorded over 3 to 30 months (for most subjects, 6 to 18 months of progression are recorded)
One important part of the dataset that we’ll keep referring to is ADAS-Cog, the “Alzheimer’s Disease Assessment Scale–Cognitive Subscale,” which is the gold standard for assessing cognitive progression in AD clinical trials. When subjects take the ADAS-Cog test, they answer a series of questions grouped into 11 components, each measuring a different feature of cognitive dysfunction. Summing the component scores gives an overall score; the higher the overall score, the worse the dysfunction [5].
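As a sketch of how the overall score is formed, the Python snippet below sums the 11 component scores. The item names and maximum scores follow the standard 11-item ADAS-Cog (total range 0 to 70); the dictionary keys, the function name `adas_cog_total`, and the example scores are our own hypothetical choices, not column names or values from CODR-AD.

```python
from typing import Dict

# Maximum score of each of the 11 ADAS-Cog items (higher = more impairment).
ADAS_COG_11_MAX = {
    "word_recall": 10,
    "commands": 5,
    "constructional_praxis": 5,
    "naming": 5,
    "ideational_praxis": 5,
    "orientation": 8,
    "word_recognition": 12,
    "remembering_instructions": 5,
    "spoken_language": 5,
    "word_finding": 5,
    "comprehension": 5,
}


def adas_cog_total(components: Dict[str, float]) -> float:
    """Sum the 11 component scores into the overall ADAS-Cog score."""
    if set(components) != set(ADAS_COG_11_MAX):
        raise ValueError("expected exactly the 11 ADAS-Cog components")
    for name, score in components.items():
        # Reject scores outside the item's valid range.
        if not 0 <= score <= ADAS_COG_11_MAX[name]:
            raise ValueError(f"{name}={score} is outside 0..{ADAS_COG_11_MAX[name]}")
    return sum(components.values())


# Hypothetical component scores for one subject.
scores = {
    "word_recall": 6, "commands": 1, "constructional_praxis": 1, "naming": 0,
    "ideational_praxis": 1, "orientation": 2, "word_recognition": 5,
    "remembering_instructions": 0, "spoken_language": 0, "word_finding": 1,
    "comprehension": 0,
}
print(adas_cog_total(scores))         # 17
print(sum(ADAS_COG_11_MAX.values()))  # 70, the worst possible total
```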
To start the exercise, let’s select one random subject from the CODR-AD dataset; fig. 3 shows this subject’s baseline characteristics.
Now let’s find subjects that have the same score as this subject on each ADAS-Cog component (fig. 4). The order in which we filter the components follows this study [6]: components earlier in the order (starting with orientation) measure cognitive dysfunction more precisely.
Here, we see that this subject has zero matches when we filter using ADAS-Cog components.
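The filtering in fig. 4 can be sketched as a loop that applies one exact-match filter per component and records how many candidates survive each step. This is a minimal Python illustration, not the code behind the figure: the function `sequential_filter`, the three-component order (only "orientation first" comes from the text above; the rest is hypothetical and truncated), and the toy pool, which excludes the subject itself, are all assumptions.

```python
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, float]


def sequential_filter(subject: Record, pool: List[Record],
                      order: Sequence[str]) -> List[Tuple[str, int]]:
    """Apply one exact-match filter per component, in `order`, and report how
    many candidates in `pool` survive after each step."""
    remaining = list(pool)
    counts = []
    for component in order:
        # Keep only candidates that also match the subject on this component.
        remaining = [r for r in remaining if r[component] == subject[component]]
        counts.append((component, len(remaining)))
    return counts


# Hypothetical, truncated filter order (only "orientation first" is from the text).
order = ["orientation", "word_recall", "word_recognition"]
subject = {"orientation": 2, "word_recall": 6, "word_recognition": 5}
pool = [
    {"orientation": 2, "word_recall": 6, "word_recognition": 4},
    {"orientation": 2, "word_recall": 5, "word_recognition": 5},
    {"orientation": 2, "word_recall": 6, "word_recognition": 7},
    {"orientation": 3, "word_recall": 6, "word_recognition": 5},
]
print(sequential_filter(subject, pool, order))
# [('orientation', 3), ('word_recall', 2), ('word_recognition', 0)]
```

Each step can only keep or shrink the candidate set, so once the count reaches zero, no later component can bring a match back.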
However, looking across the entire dataset, some subjects do have an exact match on all 11 ADAS-Cog components: about 19% of subjects have at least one such match. Let’s look at an example of such a subject.
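We don’t know exactly how the 19% figure was computed, but one straightforward way to compute such a fraction is to group subjects by their full component profile and count those whose profile occurs more than once. The Python sketch below does this with `collections.Counter`; the function name `fraction_with_exact_match` and the two-component toy records are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def fraction_with_exact_match(records: List[Dict[str, float]],
                              keys: Sequence[str]) -> float:
    """Fraction of records whose values on `keys` are identical to those of
    at least one other record in `records`."""
    if not records:
        return 0.0
    profiles = [tuple(r[k] for k in keys) for r in records]
    counts = Counter(profiles)
    # A profile seen more than once means each record with it has a match.
    matched = sum(1 for p in profiles if counts[p] > 1)
    return matched / len(records)


# Hypothetical records with only two components, for illustration.
records = [
    {"orientation": 1, "word_recall": 2},
    {"orientation": 1, "word_recall": 2},
    {"orientation": 1, "word_recall": 3},
    {"orientation": 0, "word_recall": 2},
    {"orientation": 0, "word_recall": 2},
]
print(fraction_with_exact_match(records, ["orientation"]))                 # 1.0
print(fraction_with_exact_match(records, ["orientation", "word_recall"]))  # 0.8
```

Adding the second component lowers the fraction from 1.0 to 0.8; run on all 11 components of a real dataset, the same computation gives the share of subjects with at least one exact match.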
We’ll repeat the exercise using a new random subject, whose baseline characteristics are shown in fig. 5.
To find the best match(es), let’s again filter using all 11 ADAS-Cog components (fig. 6).
Although we’ve found one match based on all of the ADAS-Cog components, the profile of the matching subject is very different: 73 years old, white, and male. So when we take all relevant covariates into account—including age, race, and sex—we don’t actually get an exact match for our random female subject.
Having gone through this exercise, we can now see that we generally can’t match subjects exactly. The broader takeaway? Finding matches based on a large number of variables is difficult, often impossible.
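A quick count shows why. Assuming integer scores on the standard 11-item ADAS-Cog (maximum item scores of 10, 5, 5, 5, 5, 8, 12, 5, 5, 5, and 5), the Python arithmetic below counts how many distinct component profiles are possible and compares that with the 6,955 subjects in CODR-AD.

```python
from math import prod

# Maximum score of each ADAS-Cog 11 item; an item scored 0..m has m + 1 integer values.
ADAS_COG_11_MAX = [10, 5, 5, 5, 5, 8, 12, 5, 5, 5, 5]

n_profiles = prod(m + 1 for m in ADAS_COG_11_MAX)
n_subjects = 6955  # subjects in the CODR-AD dataset, per the text above

print(n_profiles)                # 2161665792
print(n_profiles // n_subjects)  # 310807 possible profiles per subject
```

With over two billion possible profiles, roughly 310,000 for every subject in the dataset, exact matches occur as often as they do presumably because real scores are concentrated in a small part of this space rather than spread evenly across it. The count is a rough illustration, not a match probability: word recall can take fractional values, which makes the true number of profiles larger, and adding covariates such as age, race, and sex multiplies it further.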
In the next post, we’ll reiterate this takeaway but show a different way to think about the same matching problem.
*Written in collaboration with Jon Walsh
1. External data sources can also include other types of data, such as patient registries and real-world data.
2. A good medical analogy for matching in clinical trials is twin studies. In studies of identical twins, the two individuals have the same genomic information. When matching for clinical trials, two subjects should have the same covariates that are relevant to what we want to measure in a particular clinical trial.
3. Romero, K. et al. The Coalition Against Major Diseases: developing tools for an integrated drug development process for Alzheimer’s and Parkinson’s diseases. Clin. Pharmacol. Ther. 86, 365–367 (2009).
4. Neville, J. et al. Development of a unified clinical trial database for Alzheimer’s disease. Alzheimer’s & Dementia 11, 1212–1221 (2015).
5. Rosen, W. G., Mohs, R. C. & Davis, K. L. A new rating scale for Alzheimer’s disease. Am. J. Psychiatry 141, 1356–1364 (1984).
6. Ueckert, S., Plan, E. L., Ito, K. et al. Improved utilization of ADAS-Cog assessment data through item response theory based pharmacometric modeling. Pharm. Res. 31, 2152–2165 (2014).