Random selection is how you draw the sample of people for your study from a population. Random assignment is how you assign the sample that you draw to different groups or treatments in your study.
It is possible to have both random selection and assignment in a study. Let's say you drew a random sample of 100 clients from a population list of 1000 current clients of your organization. That is random sampling. Now, let's say you randomly assign 50 of these clients to get some new additional treatment and the other 50 to be controls. That's random assignment.
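To make the two steps concrete, here is a minimal Python sketch of that example. The client IDs, the `select_and_assign` helper, and the fixed seed are all assumptions made for illustration; they are not part of any real client list or procedure.

```python
import random

def select_and_assign(population, sample_size, seed=None):
    """Randomly select a sample, then randomly split it into two equal groups."""
    rng = random.Random(seed)
    # Random selection: every member of the population has the same
    # chance of being drawn into the sample.
    sample = rng.sample(population, sample_size)
    # Random assignment: shuffle the sample and split it in half.
    # (rng.sample already returns items in random order; the explicit
    # shuffle just keeps the two steps visibly separate.)
    rng.shuffle(sample)
    half = sample_size // 2
    return sample[:half], sample[half:]

# Hypothetical population list of 1000 current clients.
clients = [f"client_{i:04d}" for i in range(1, 1001)]
treatment, control = select_and_assign(clients, 100, seed=42)
print(len(treatment), len(control))
```

Replacing `rng.sample(population, sample_size)` with `population[:sample_size]` would give nonrandom selection followed by random assignment, which is one of the mixed cases described next.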
It is also possible to have only one of these (random selection or random assignment) but not the other in a study. For instance, if you do not randomly draw the 100 cases from your list of 1000 but instead just take the first 100 on the list, you do not have random selection. But you could still randomly assign this nonrandom sample to treatment versus control. Or, you could randomly select 100 from your list of 1000 and then nonrandomly (haphazardly) assign them to treatment or control.
And, it's possible to have neither random selection nor random assignment. In a typical nonequivalent groups design in education you might nonrandomly choose two 5th grade classes to be in your study. This is nonrandom selection. Then, you could arbitrarily assign one to get the new educational program and the other to be the control. This is nonrandom (or nonequivalent) assignment.
Random selection is related to sampling. Therefore it is most related to the external validity (or generalizability) of your results. After all, we would randomly sample so that our research participants better represent the larger group from which they're drawn. Random assignment is most related to design. In fact, when we randomly assign participants to treatments we have, by definition, an experimental design. Therefore, random assignment is most related to internal validity. After all, we randomly assign in order to help ensure that our treatment groups are similar to each other (i.e., equivalent) prior to the treatment.
Copyright ©2006, William M.K. Trochim, All Rights Reserved
Last Revised: 10/20/2006
My past several posts have detailed confounding variables, a problem you might encounter in research or quality improvement projects.
To recap, confounding variables are correlated predictors. Leaving a confounding variable out of a statistical model can make an included predictor look falsely insignificant or falsely significant. In other words, confounding variables can turn the results of your statistical analysis on their head!
To find lurking confounding variables, you must take the time to understand your data and the important variables that may influence a process. Background research and solid subject-area knowledge can help you navigate data difficulties. You should also measure and include everything that you think is important.
Of course, understanding and measuring everything of importance may not be possible due to time and cost constraints. Indeed, some of the relevant variables may be unknown or even unmeasurable. What to do?
There is a simple solution to this complex problem. You can wave the white flag and admit that you don’t know everything, or at least that you can’t measure everything that affects your response. You randomize!
Randomness plays several important roles in the design of experiments. In this case, we’re talking about random assignment, which is different from random selection.
- Random selection is how you draw the sample for your study. This allows you to make unbiased inferences about the population based on your sample.
- Random assignment is how you assign the sample to the control and treatment groups in your experiment. This allows you to make causal conclusions about the effect of one variable on another variable.
Random assignment might involve flipping a coin, drawing names out of a hat, or using random numbers. All subjects should have the same probability of being assigned to any group. This process helps assure that the groups are similar to each other when treatment begins. Therefore, any post-study differences between groups shouldn’t be due to prior differences.
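As a rough illustration, the coin flip and the names-in-a-hat approaches might look like this in Python. The function names and the 20 numbered subjects are hypothetical. Note one practical difference: independent coin flips give every subject the same 50% chance but can produce unequal group sizes, while shuffling and splitting fixes the group sizes in advance.

```python
import random

def coin_flip_assignment(subjects, rng):
    """Flip an independent fair coin for each subject."""
    groups = {"treatment": [], "control": []}
    for subject in subjects:
        key = "treatment" if rng.random() < 0.5 else "control"
        groups[key].append(subject)
    return groups

def shuffled_assignment(subjects, rng):
    """Draw names out of a hat: shuffle, then split into two halves."""
    order = list(subjects)  # copy so the caller's list is not reordered
    rng.shuffle(order)
    half = len(order) // 2
    return {"treatment": order[:half], "control": order[half:]}

rng = random.Random(0)         # fixed seed only so the sketch is repeatable
subjects = list(range(1, 21))  # 20 hypothetical subject IDs

by_coin = coin_flip_assignment(subjects, rng)
by_hat = shuffled_assignment(subjects, rng)

print({group: len(members) for group, members in by_coin.items()})
print({group: len(members) for group, members in by_hat.items()})
```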
Let’s work through an example and see how it combats confounding variables. Take the biomechanics study where we wanted to see if the jumping exercise (treatment group) produced greater bone density than the group that didn’t jump (control group). Further, let’s assume that greater physical activity is correlated with increased bone density but we didn’t measure it. We’ll compare 2 scenarios.
Scenario 1: We don’t use random assignment and, unbeknownst to us, the more physically active subjects end up in the treatment group. The treatment group starts out more active than the control group. Because activity increases bone density, the higher activity in the treatment group may account for the greater bone density compared to the less active control group. Because it is not in the model, activity is a confounding variable that makes the jumping exercise appear to be significant when it might not be.
Scenario 2: We use random assignment so the treatment and control groups start out with roughly equal levels of physical activity. Activity still affects bone density but it is equally spread across the groups. Indeed, the groups are roughly equal in all ways except for the jumping exercise in the treatment group. If the treatment group has a significantly higher bone density, it’s almost certainly due to the jumping exercise.
For both scenarios, the data and statistical results could be identical. However, the results for the second scenario are more valid thanks to the methodology.
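A small simulation can make the contrast between the two scenarios tangible. The sketch below is purely illustrative and uses made-up numbers, not data from the bone density study. It assumes that unmeasured activity raises bone density, that the jumping exercise has no true effect at all, and that in Scenario 1 the most active half of the subjects ends up in the treatment group.

```python
import random
import statistics

def simulate(assign_randomly, n=2000, seed=1):
    """Return mean bone density of treatment minus control (arbitrary units)."""
    rng = random.Random(seed)
    # Unmeasured physical activity for each subject (hypothetical scale).
    activity = [rng.gauss(0, 1) for _ in range(n)]

    if assign_randomly:
        # Scenario 2: shuffle the subjects and treat the first half.
        order = list(range(n))
        rng.shuffle(order)
    else:
        # Scenario 1: the most active subjects end up in the treatment group.
        order = sorted(range(n), key=lambda i: activity[i], reverse=True)
    treated = set(order[: n // 2])

    true_effect = 0.0  # assumption: jumping does nothing in this simulation
    density = [
        activity[i] + (true_effect if i in treated else 0.0) + rng.gauss(0, 1)
        for i in range(n)
    ]
    treatment = [density[i] for i in range(n) if i in treated]
    control = [density[i] for i in range(n) if i not in treated]
    return statistics.mean(treatment) - statistics.mean(control)

biased_gap = simulate(assign_randomly=False)
random_gap = simulate(assign_randomly=True)
print(f"Scenario 1 (no random assignment): gap = {biased_gap:.2f}")
print(f"Scenario 2 (random assignment):    gap = {random_gap:.2f}")
```

Under these assumptions, Scenario 1 should show a large gap between the groups even though the treatment does nothing, while the gap in Scenario 2 should sit close to zero. The apparent effect in Scenario 1 comes entirely from the confounder.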
Random assignment helps protect you from the perils of confounding variables and competing explanations. However, you can’t always implement random assignment. For the bone density study, we did randomly assign the subjects to the treatment or control group. However, when I used the data from that study to look for patterns amongst the subjects who developed knee pain, I couldn’t randomly assign them to higher and lower calcium intake groups! This highlights one of the pitfalls of ad hoc data analysis.
We’ve detailed the negative aspects of confounding variables here and in my last several posts. However, confounding variables have a potential upside. They don’t sound quite so threatening when you think of them as proxy variables, which we’ll cover in my next post.