Determining the necessary sample size is one of the most common issues encountered by healthcare professionals involved in healthcare research or quality improvement, but it is also one of the least understood. Conversations between healthcare professionals and statisticians often go something like this:

Healthcare Professional: “I want to be sure I have a big enough sample.”

Statistician: “To do what?”

HP: (long pause) “Well…to show that there’s an improvement.”

S: “How much of an improvement?”

HP: (sigh) “I don’t know, 5%, 10%? Something along those lines.”

S: “OK, how certain do you want to be that you’ll be able to detect that change?”

HP: (another long pause) “What? What do you mean?”

And it devolves from there.

Talking about sample size is difficult for a couple of reasons: first, it requires that you know something about how sample sizes are determined (which is a big ask if you haven’t had any formal training in it), and second, it often requires that you have a pretty good idea of what outcome you’re hoping to get before you even begin.

There are lots of resources out there explaining the basics of the statistical theories behind sample size calculations, but they are often overly technical and frequently focus on one type of situation (which may not be applicable to you). Identifying resources that are applicable to your situation requires that you have a basic understanding of what it is you’re looking for.

Let’s back up and talk in general terms about what goes into determining the appropriate sample size. The first thing we need to get clear on is what, specifically, you are trying to show, detect, demonstrate, or test. Are you trying to determine which hospital discharge process is the most effective at relaying relevant information to patients being discharged? Are you trying to demonstrate an improvement in the rate of hospital-acquired infections from the previous month? Are you trying to estimate the percentage of your patients who would indicate that they were “very satisfied” with their last clinic visit? The more specific you can be, the better.

Once that is established, there are three things that determine how big your sample needs to be:

1. How big of an effect, change, or difference you want to be able to detect. This is where clinical (or practical) significance comes into play: what level of improvement (e.g., how much of a reduction in infection rate) is clinically meaningful? How much of a difference in quality measure performance would you be comfortable saying represents meaningful improvement? Five percent? Ten? The smaller the difference you want to be able to detect, the more data you’ll need.

2. The level of variability in your data; the more variable the data, the more of it you’ll need. In the simplest scenario, you’re going to be comparing two things (the average door-to-balloon times in November to those in December; the median survival time for patients on Drug A vs. Drug B; etc.). On the surface, a reduction in the average door-to-balloon time (for example) from 55 to 50 minutes might seem like an improvement. But if the individual times that make up those averages range from 30 to 70 minutes, there is inherently a lot of variability, and it will be tougher to demonstrate that a 5-minute reduction in the average is statistically significant. Conversely, if the individual times all range from 54 to 56 minutes prior to the intervention and from 49 to 51 minutes afterwards, you’re probably not going to need as much data to demonstrate that a change occurred.

3. How sure you want to be that you’re correct. This is what statisticians refer to as the “power” of the test: the probability that it will correctly detect a difference/change/effect when there actually is one. It’s common to set this to 80% to 90%. But this is a reminder that everything we’re doing here is based on probabilities, which means it’s always possible that we’ll be WRONG: sometimes the randomness of the data produces results that look like a change when there isn’t one, or masks a change that really happened. The more sure you want to be about being correct, the more data you will need. (A rough worked example follows this list.)
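To make these three drivers concrete, here is a minimal sketch in Python using the statsmodels package (my choice of tooling for illustration, not something this process requires). It works through the door-to-balloon example: the 5-minute reduction, the standard deviations, the 80% power, and the 0.05 significance level are all hypothetical values, chosen only to show how effect size, variability, and power translate into a sample size.

```python
# Rough illustration: how effect size, variability, and power determine the
# sample size for comparing two group means (e.g., door-to-balloon times).
# All numbers are hypothetical.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

difference = 5   # minutes of improvement we want to detect (55 -> 50)
alpha = 0.05     # conventional significance level
power = 0.80     # chance of detecting the difference if it's really there

for sd in (10, 3):  # more variable vs. less variable data (hypothetical SDs)
    effect_size = difference / sd  # standardized difference (Cohen's d)
    n_per_group = analysis.solve_power(
        effect_size=effect_size, alpha=alpha, power=power,
        ratio=1.0, alternative="two-sided",
    )
    print(f"SD = {sd} min -> about {n_per_group:.0f} patients per group")
```

With these made-up numbers, the widely scattered data (SD of 10 minutes) requires roughly 64 patients per group, while the tightly clustered data (SD of 3 minutes) needs fewer than 10: same improvement, very different sample sizes.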

While reading those, you may have asked yourself several questions, like, “What if I don’t know what amount of a change or improvement to expect?” or “How am I supposed to know how variable my data will be before I collect it?”

Those are very reasonable questions, and they get at the heart of what can make sample size calculations so frustrating. If you don’t have any prior knowledge or experience with what you’re about to attempt, it can be difficult to know the answer to those questions. You might be able to find information from other sources (e.g., published literature, preliminary data, etc.), but other times you’ll just have to use your best judgement regarding what’s a reasonable assumption.
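If you do have even a small amount of preliminary data, a rough variability estimate from it beats a pure guess. Here is a hypothetical sketch (again Python with statsmodels; the pilot values and the 5-minute target are invented for illustration):

```python
# Hypothetical: estimate variability from a small pilot sample, then feed it
# into a sample-size calculation. The pilot values below are invented.
import statistics
from statsmodels.stats.power import TTestIndPower

pilot_times = [52, 61, 47, 58, 66, 49, 55, 60]  # pilot door-to-balloon times (minutes)
sd_estimate = statistics.stdev(pilot_times)     # sample standard deviation

difference_to_detect = 5  # minutes (hypothetical clinically meaningful change)
n_per_group = TTestIndPower().solve_power(
    effect_size=difference_to_detect / sd_estimate, alpha=0.05, power=0.80,
)
print(f"Estimated SD ~ {sd_estimate:.1f} min -> about {n_per_group:.0f} per group")
```

A pilot this small gives an imprecise estimate of the variability, so treat the resulting sample size as a ballpark figure rather than a firm answer.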

So, before you talk to a statistician, here’s a checklist of what you’ll need (at a minimum):

– What you are trying to show, detect, determine, or test, and what data you plan to use to do that (get specific: will it come from chart abstraction? If so, is it a check-box or a text field? Or maybe it will come from a patient survey; what will be asked, specifically?)

– How big of an effect, change, or difference you want to be able to detect (have a couple of options or a range: here’s what would be ideal, but here’s what we could live with)

– Any information you have regarding how variable the data might be (preliminary data, other studies or interventions that used similar data, etc.)

– The level of power you are OK with (probably 80% to 90%)

Now, here’s what a statistician will think about when trying to determine the sample size:

  • What’s the random variable, and what type of variable is it (continuous, discrete, mean, proportion, correlation, etc.)?
  • What type of analysis (or analyses) would we perform to show, detect, determine, or test it (t-test, chi-square, regression, ANOVA, logistic regression, control chart, etc.)? (The sketch after this list shows how this choice feeds into the calculation.)
  • How will the data be collected (random sample, convenience sample, cluster sample, other)?
  • Are there potential complicating factors (potential for missing data, inherent bias, unmeasured confounders, etc.)?
  • What assumptions need to be made, and to which ones is the sample size more sensitive?
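To show how the type of variable and the planned analysis feed into the calculation, here is one more hypothetical sketch. It assumes the outcome is a proportion (say, an infection rate) compared between two periods with a two-proportion test; the 10% and 5% rates are made-up numbers, not benchmarks.

```python
# Hypothetical: sample size to detect a drop in a rate (e.g., infections)
# from 10% to 5%, analyzed as a comparison of two proportions.
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

baseline_rate = 0.10  # assumed current rate (hypothetical)
improved_rate = 0.05  # rate we hope to achieve (hypothetical)

effect_size = proportion_effectsize(baseline_rate, improved_rate)  # Cohen's h
n_per_group = NormalIndPower().solve_power(
    effect_size=effect_size, alpha=0.05, power=0.80, alternative="two-sided",
)
print(f"About {n_per_group:.0f} patients per group")
```

Cutting a rate in half sounds dramatic, but with these assumed numbers it still takes a bit more than 200 patients in each group to detect the change reliably, which is why pinning down the type of variable and the planned analysis matters so much.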

A good statistician will work WITH you; this will be an iterative process in which they will ask you more questions, some of which you may not know the answer to offhand and will have to check on and get back to them about. Therefore, this is not a last-minute process! Allow plenty of time for the back-and-forth that is required.

Except in the most straightforward cases, the statistician will probably run multiple scenarios for you, because multiple assumptions are being made. Making more conservative assumptions (more variability than expected, a higher rate of missing data, a smaller difference/effect) will result in larger sample sizes. Ask the statistician which assumptions are driving the sample size; often, if you can compromise on things like effect size or power, you can reduce the necessary sample size.
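To give a sense of what those multiple scenarios might look like, here is a final hypothetical sketch (Python/statsmodels, invented numbers) that runs the same calculation over a small grid of assumptions. Notice how the more conservative combinations (smaller difference, more variability, higher power) demand far more data.

```python
# Hypothetical scenario grid: how the required sample size shifts as the
# assumptions change. All differences, SDs, and power levels are illustrative.
from itertools import product
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

for diff, sd, power in product((10, 5), (8, 12), (0.80, 0.90)):
    n = analysis.solve_power(effect_size=diff / sd, alpha=0.05, power=power)
    print(f"detect a {diff}-unit change, SD {sd}, power {power:.0%}: "
          f"about {n:.0f} per group")
```

In this grid, the most optimistic scenario needs only about a dozen subjects per group, while the most conservative needs well over a hundred, so knowing which assumptions you can relax is worth real savings in data collection.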