Should we trust staggered difference-in-differences estimates? – Healthcare Economist

That is the question posed in a paper by Baker, Larcker and Wang (2022). I summarize their key arguments below.

The validity of…[the DiD]…approach rests on the central assumption that the observed trend in control units’ outcomes mimic the trend in treatment units’ outcomes had they not received treatment. As the authors write:

First, DiD estimates are unbiased in settings with a single treatment period, even when there are dynamic treatment effects. Second, DiD estimates are also unbiased in settings with staggered timing of treatment assignment and homogeneous treatment effect across firms and over time. Finally, when research settings combine staggered timing of treatment effects and treatment effect heterogeneity, staggered DiD estimates are likely biased.

Oftentimes, DiD is implemented using an ordinary least squares (OLS) regression-based model as follows:
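In the textbook two-group, two-period case, this regression is usually written along the following lines. The symbols are assumptions made here for illustration: TREAT is an indicator for treated units, POST is an indicator for the post-treatment period, and the coefficient on their interaction is the DiD estimate.

```latex
y_{it} = \alpha + \beta_1 \,\mathrm{TREAT}_i + \beta_2 \,\mathrm{POST}_t
       + \beta_3 \,(\mathrm{TREAT}_i \times \mathrm{POST}_t) + \epsilon_{it}
```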

When there are more than two groups and more than two time periods, regression-based DiD models typically rely on a two-way fixed effects (TWFE) specification of the form:
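A common way to write the static TWFE specification is sketched below. The notation is assumed here rather than taken from the post: D is an indicator equal to one when unit i is treated in period t, and the coefficient on D is the static DiD estimate.

```latex
y_{it} = \alpha_i + \lambda_t + \delta^{DD} D_{it} + \epsilon_{it}
```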

Where the first two coefficients are unit and time period
fixed effects. Note that earlier research from Goodman-Bacon
shows that the static form of the TWFE DiD is actually a “weighted
average of all possible two-group/two-period DiD estimators in the data.”

When treatment effects can change over time (“dynamic
treatment effects”), staggered DiD treatment effect estimates can actually
obtain the opposite sign of the true ATT, even if the researcher were able to
randomize treatment assignment (thus where the parallel-trends assumption holds).

The reason is that Goodman-Bacon
shows that the static TWFE DiD estimate actually consists of three components:

  • Variance-weighted average treatment effect on
    the treated (VWATT)
  • Variance-weighted average counterfactual trends (VWCT)
  • Weighted sum of the change in the average
    treatment on the treated within a treatment-timing group’s post-period and
    around a later-treated unit’s treatment window (ΔATT)
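Written as a single expression, the decomposition takes roughly the following form. The symbol for the TWFE coefficient is notation assumed here (Goodman-Bacon writes it with a β), and the weights inside each component are omitted.

```latex
\operatorname*{plim}_{N \to \infty} \hat{\delta}^{DD}
  = \mathrm{VWATT} + \mathrm{VWCT} - \Delta\mathrm{ATT}
```

Because ΔATT enters with a negative sign, treatment effects that grow over time pull the TWFE estimate down, away from VWATT.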

The first term is the term of interest.  If the parallel-trends assumption holds, then VWCT = 0.  The last term arises because, under static
TWFE DiD, already-treated groups are effectively used as comparison groups for later-treated
groups.  If DiD is estimated in a
two-period model, however, this term disappears and there is no bias. Alternatively,
if treatment effects are static (i.e., not changing over time after the
intervention), then ΔATT = 0 and TWFE DiD is valid.

The challenge, however, occurs when treatment effects are
dynamic.  In this case ΔATT ≠ 0 and the TWFE DiD is biased.
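To make the sign problem concrete, here is a minimal simulation sketch. It is not taken from the paper: the two cohorts, their adoption dates, the fixed effects and the linearly growing treatment effect are all hypothetical values chosen for illustration. There is no noise, so parallel trends hold exactly, and the TWFE coefficient is computed by two-way demeaning, which is equivalent to the fixed-effects regression in a balanced panel.

```python
# Hypothetical setup: two cohorts, ten periods, no noise.
T = 10
first_treated = {"early": 1, "late": 4}   # assumed adoption periods
unit_fe = {"early": 5.0, "late": 2.0}     # assumed unit fixed effects


def effect(t, g):
    """Dynamic treatment effect: grows by 1 each period from adoption onward."""
    return float(t - g + 1) if t >= g else 0.0


# Build treatment indicator D and outcome y for every (unit, period) cell.
D, y = {}, {}
for unit, g in first_treated.items():
    for t in range(T):
        D[unit, t] = 1.0 if t >= g else 0.0
        y[unit, t] = unit_fe[unit] + 0.5 * t + effect(t, g)


def within_transform(values):
    """Two-way demeaning: x - unit mean - time mean + grand mean (balanced panel)."""
    units = sorted({u for u, _ in values})
    times = sorted({t for _, t in values})
    grand = sum(values.values()) / len(values)
    unit_mean = {u: sum(values[u, t] for t in times) / len(times) for u in units}
    time_mean = {t: sum(values[u, t] for u in units) / len(units) for t in times}
    return {(u, t): v - unit_mean[u] - time_mean[t] + grand
            for (u, t), v in values.items()}


D_tilde = within_transform(D)
y_tilde = within_transform(y)

# Static TWFE coefficient on D via the demeaned regression.
twfe = (sum(D_tilde[k] * y_tilde[k] for k in D)
        / sum(D_tilde[k] ** 2 for k in D))

# True ATT: average effect over all treated unit-period cells.
treated_effects = [effect(t, g) for g in first_treated.values() for t in range(g, T)]
true_att = sum(treated_effects) / len(treated_effects)

print(f"true ATT = {true_att:.3f}")   # true ATT = 4.400
print(f"TWFE DiD = {twfe:.3f}")       # TWFE DiD = -0.571
```

Every treated observation in this example has a strictly positive effect, yet the static TWFE coefficient is negative, because the early cohort's still-growing outcomes serve as the comparison for the late cohort.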

So what can be done? The authors offer 3 solutions:

  • Callaway and Sant’Anna (2021). Here, the authors allow one to estimate the treatment effect for a particular group (first treated at time g) using observations at times τ and g-1 from a clean set of controls.  These are essentially not-yet-treated, last-treated, or never-treated groups. 
  • Sun and Abraham (2021).  A similar methodology is used as in CS, but always-treated units are dropped, and the only units that can be used as effective controls are those that are never-treated or last-treated. Further, this approach is fully parametric.
  • Stacked regression estimators. Cengiz et al. (2019) implement this approach.  The goal is to “create event-specific “clean 2 × 2” datasets, including the outcome variable and controls for the treated cohort and all other observations that are “clean” controls within the treatment window (e.g., not-yet-, last-, or never-treated units). For each clean 2 × 2 dataset, the researcher generates a dataset-specific identifying variable. These event-specific data sets are then stacked together, and a TWFE DiD regression is estimated on the stacked dataset, with dataset-specific unit- and time-fixed effects… In essence, the stacked regression estimates the DiD from each of the clean 2 × 2 datasets, then applies variance weighting to combine the treatment effects across cohorts efficiently.”
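The idea common to these approaches, comparing each treated cohort only against clean controls, can be sketched in a few lines. The code below is a simplified illustration and not the estimator from any of the cited papers: it uses hypothetical noise-free data with one never-treated unit, no covariates and no inference, and computes a group-time effect for a cohort first treated at g by differencing period τ against period g-1.

```python
# Hypothetical cohorts: adoption period per unit, None means never treated.
T = 10
first_treated = {"early": 1, "late": 4, "never": None}
unit_fe = {"early": 5.0, "late": 2.0, "never": -1.0}   # assumed fixed effects


def outcome(unit, t):
    """Noise-free outcome: unit effect + common trend + growing treatment effect."""
    g = first_treated[unit]
    effect = float(t - g + 1) if g is not None and t >= g else 0.0
    return unit_fe[unit] + 0.5 * t + effect


def att_gt(cohort, tau, control="never"):
    """Clean 2x2 DiD for one cohort at period tau, relative to period g-1."""
    base = first_treated[cohort] - 1
    treated_change = outcome(cohort, tau) - outcome(cohort, base)
    control_change = outcome(control, tau) - outcome(control, base)
    return treated_change - control_change


# One estimate per treated cohort and post-adoption period.
estimates = {
    (cohort, tau): att_gt(cohort, tau)
    for cohort, g in first_treated.items() if g is not None
    for tau in range(g, T)
}

# Simple average over all treated cells (one of several possible aggregations).
overall = sum(estimates.values()) / len(estimates)

print(estimates["early", 1], estimates["late", 9])   # 1.0 6.0
print(f"overall ATT = {overall:.3f}")                 # overall ATT = 4.400
```

Because the comparison unit is never treated, each group-time estimate recovers the effect built into the data, and averaging them gives the true ATT rather than a wrong-signed number.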

While there was a lot of math in this post, if researchers apply these alternative DiD estimators, the authors wisely recommend that “researchers should justify their choice of ‘clean’ comparison groups—not-yet treated, last treated, or never treated—and articulate why the parallel-trends assumption is likely to apply”.

You can read the full article here.