Design and analysis of stepped wedge cluster randomized trials

https://doi.org/10.1016/j.cct.2006.05.007

Abstract

Cluster randomized trials (CRTs) are often used to evaluate therapies or interventions in situations where individual randomization is not possible or not desirable for logistic, financial or ethical reasons. While a significant and rapidly growing body of literature exists on CRTs utilizing a “parallel” design (i.e. I clusters randomized to each treatment), only a few examples of CRTs using crossover designs have been described. In this article we discuss the design and analysis of a particular type of crossover CRT – the stepped wedge – and provide an example of its use.

Introduction

Cluster (or community, or group) randomized trials (CRTs) are distinguished by the fact that individuals are randomized in groups rather than individually. CRTs have been used to evaluate antismoking interventions [1], [2], methods of preventing human immunodeficiency virus (HIV) and other sexually transmitted diseases (STDs) [3], [4], and in a number of other contexts [5], [6]. Cluster designs may be chosen because the intervention can only be administered on a community-wide scale (e.g. [7]), to minimize contamination [8], or for other logistic, financial or ethical reasons. From a statistical viewpoint, the key characteristic of CRTs is that the responses of individual units within a cluster are correlated, and this feature must be incorporated into power calculations and the trial analysis.
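One common way to see why within-cluster correlation matters for power is the variance inflation factor (design effect) for equal-sized clusters, 1 + (m - 1)ρ, where m is the cluster size and ρ the intracluster correlation. This formula is a standard result for parallel CRTs and is not taken from this article; the function names and the numbers below are hypothetical. A minimal sketch:

```python
def design_effect(cluster_size: int, icc: float) -> float:
    """Variance inflation for equal-sized clusters: 1 + (m - 1) * rho."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie in [0, 1]")
    return 1.0 + (cluster_size - 1) * icc


def effective_sample_size(n_clusters: int, cluster_size: int, icc: float) -> float:
    """Number of independent observations carrying the same information."""
    return n_clusters * cluster_size / design_effect(cluster_size, icc)


# Hypothetical trial: 10 clusters of 50 individuals, intracluster correlation 0.05.
print(design_effect(50, 0.05))
print(effective_sample_size(10, 50, 0.05))
```

Even a small correlation shrinks the effective sample size considerably when clusters are large, which is why a power calculation that treats individuals as independent would be optimistic.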

CRTs often employ a parallel design: for a two-arm study with 2I independent clusters, I clusters are randomly assigned to each intervention at a single time point. If the cluster sizes are all equal, a two-sample t-test may be used to compare cluster-level mean responses between the intervention groups. If there are more than 2 treatment arms, a one-way analysis of variance may be used. Sometimes the communities are matched and randomization is done within the matched sets. In that case, a paired analysis (e.g. paired t-test) is used. When cluster sizes vary, individual level analyses using generalized estimating equations [17] or random effects models [16] may be used. Statistical aspects of the design and analysis of parallel CRTs have been widely discussed (e.g. [9], [10]).
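The cluster-level comparison described above for a two-arm parallel design with equal cluster sizes can be sketched as follows. The data layout (a list of clusters, each a list of individual responses), the function name and the example values are assumptions made for illustration; the statistic is the usual pooled two-sample t statistic applied to cluster means.

```python
import math
from statistics import mean, variance


def cluster_level_t(control_clusters, treated_clusters):
    """Pooled two-sample t statistic computed on cluster means.

    Each argument is a list of clusters; each cluster is a list of
    individual responses. Returns (t_statistic, degrees_of_freedom).
    """
    # Reduce every cluster to a single summary so the unit of analysis
    # matches the unit of randomization.
    control_means = [mean(c) for c in control_clusters]
    treated_means = [mean(c) for c in treated_clusters]
    n0, n1 = len(control_means), len(treated_means)
    df = n0 + n1 - 2
    pooled_var = ((n0 - 1) * variance(control_means)
                  + (n1 - 1) * variance(treated_means)) / df
    se = math.sqrt(pooled_var * (1 / n0 + 1 / n1))
    return (mean(treated_means) - mean(control_means)) / se, df


# Made-up responses: three clusters per arm, three individuals per cluster.
control = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
treated = [[4, 5, 6], [5, 6, 7], [6, 7, 8]]
t_stat, df = cluster_level_t(control, treated)
print(t_stat, df)
```

The resulting statistic would be referred to a t distribution with 2I - 2 degrees of freedom, where I is the number of clusters per arm.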

In contrast, crossover designs are less commonly used in CRTs (three examples are [6], [11], [12]). A crossover CRT requires fewer clusters than a parallel design but may take twice as long (or longer) to complete (since each cluster receives both the treatment and control interventions). If the intervention requires a lengthy follow up period, then this fact alone might make a crossover design impractical. In a standard crossover design the order of the interventions is randomized for each cluster and a time period (called the “washout” period) is often included between the two interventions so that the first intervention does not affect the second. Analysis of a standard crossover design focuses on within-cluster comparisons using a paired t-test.
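As a rough illustration of the within-cluster comparison in a standard crossover CRT, the sketch below computes a paired t statistic from each cluster's mean response under control and under treatment. The function name and the numbers are hypothetical, and this simple version assumes there are no period or carryover effects.

```python
import math
from statistics import mean, stdev


def paired_t(control_means, treated_means):
    """Paired t statistic on per-cluster (treated - control) differences.

    Both lists hold one summary value per cluster, in the same cluster
    order. Returns (t_statistic, degrees_of_freedom).
    """
    if len(control_means) != len(treated_means):
        raise ValueError("each cluster needs one value per condition")
    diffs = [t - c for c, t in zip(control_means, treated_means)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1


# Made-up cluster means for four clusters observed under both conditions.
t_stat, df = paired_t([1, 2, 3, 4], [2, 4, 4, 7])
print(t_stat, df)
```

Because each cluster serves as its own control, between-cluster variation cancels out of the differences, which is why fewer clusters are needed than in a parallel design.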

A stepped wedge design [13] is a type of crossover design in which different clusters cross over (switch treatments) at different time points. In addition, the clusters cross over in one direction only—typically, from control to intervention. The first time point usually corresponds to a baseline measurement where none of the clusters receive the intervention of interest. At subsequent time points, clusters initiate the intervention of interest and the response to the intervention is measured. More than one cluster may start the intervention at a time point, but the time at which a cluster begins the intervention is randomized. Fig. 1 illustrates the differences between the parallel, traditional crossover and stepped wedge designs.
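The layout of a stepped wedge design can be made concrete with a small allocation sketch. The function below is a hypothetical helper, not the authors' procedure: it assumes one baseline period followed by a fixed number of steps, and it deals the randomly ordered clusters as evenly as possible across the steps. Each row is a cluster, each column a time point, and 1 marks a period in which the cluster receives the intervention.

```python
import random


def stepped_wedge_schedule(n_clusters, n_steps, seed=None):
    """Return one row of 0/1 treatment indicators per cluster.

    Column 0 is the baseline period (all clusters on control). Clusters
    are shuffled and dealt round-robin into n_steps groups; a cluster in
    group k starts the intervention at time point k + 1 and never
    switches back.
    """
    if n_steps < 1 or n_clusters < n_steps:
        raise ValueError("need at least one cluster per step")
    rng = random.Random(seed)
    order = list(range(n_clusters))
    rng.shuffle(order)  # the randomization: which cluster starts when
    schedule = [None] * n_clusters
    for position, cluster in enumerate(order):
        start = position % n_steps + 1  # first treated time point
        schedule[cluster] = [int(t >= start) for t in range(n_steps + 1)]
    return schedule


for row in stepped_wedge_schedule(n_clusters=4, n_steps=4, seed=1):
    print(row)
```

Sorting the rows by starting time gives the staircase pattern of zeros and ones from which the design takes its name.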

Although the stepped wedge design extends the length of a randomized trial due to the presence of multiple time intervals, the nature of the design may be beneficial in certain settings. In a parallel or traditional crossover design, the intervention must be implemented in half of the total clusters simultaneously. However, limited resources or geographical constraints may make this logistically impossible (e.g. [13]). The stepped wedge design allows the researcher to implement the intervention in a smaller fraction of the clusters at each time point. Another unique feature of the stepped wedge design is that the crossover is unidirectional. All clusters eventually receive the intervention and, in particular, the intervention is never removed once it has been implemented (at least over the course of the trial) which may alleviate ethical and/or community concerns. This makes the stepped wedge design particularly useful for evaluating the population-level impact of an intervention that has been shown to be effective in an individually randomized trial. The unidirectional aspect of the crossover does, however, complicate the analysis since the treatment effect can no longer be estimated exclusively from within-cluster comparisons. More details on the analysis of such trials are provided below.

In Section 2 we describe a trial being conducted in Washington state that uses a stepped wedge design. This motivating example provides a context for the theoretical and simulation results shown in Section 3 where we describe statistical aspects of the design and analysis of stepped wedge CRTs. In Section 4 we summarize our findings and discuss future areas of research.

Section snippets

Example — partner notification

Partner notification is the process by which sex partners of patients with sexually transmitted infections (STIs) are notified of potential exposure to infection and encouraged to seek treatment. Standard practice for partner notification in most states in the US involves contact of partners by public health authorities. However, the high costs associated with this practice have influenced investigators to seek alternative partner treatment methods. One alternative strategy is patient delivered

Statistical issues

In this section we examine a number of issues related to the design and analysis of stepped wedge CRTs.

Discussion

Using theoretical calculations and simulation we have investigated statistical characteristics of the stepped wedge design for cluster randomized trials. In particular, we have outlined a procedure for computing power in such trials and investigated the effect of varying intercluster correlation, number of randomization steps and treatment delay on trial power. The design is relatively insensitive to variations in the intercluster correlation. We also found that, for a fixed number of clusters,

Acknowledgements

This research was supported by NIH grants AI29168 and AI46702.

References (24)

  • R.H. Palmer et al. A randomized controlled trial of quality assurance in sixteen ambulatory care practices. Med Care (1985)
  • R. Menzies et al. The effect of varying levels of outdoor air supply on the symptoms of sick building syndrome. N Engl J Med (1993)