“Testing for Idiosyncratic Treatment Effect Heterogeneity.” (Link here)
Abstract: This paper provides asymptotically valid tests for the null hypothesis of no treatment effect heterogeneity. Importantly, I consider the presence of heterogeneity that is not explained by observed characteristics, or so-called idiosyncratic heterogeneity. When examining this heterogeneity, common statistical tests encounter a nuisance parameter problem in the average treatment effect, which renders the asymptotic distribution of the test statistic dependent on that parameter. I propose an asymptotically valid test that circumvents the estimation of that parameter using the empirical characteristic function. A simulation study illustrates not only the test’s validity but also its higher power in rejecting a false null compared to current tests. Furthermore, I show the method’s usefulness through its application to a microfinance experiment in Bosnia and Herzegovina. In this experiment, for outcomes related to loan take-up and self-employment, the tests suggest that treatment effect heterogeneity is not completely accounted for by baseline characteristics. For those outcomes, researchers could collect more baseline characteristics to inspect the remaining treatment effect heterogeneity and, potentially, improve treatment targeting.
“At What Level Should One Cluster Standard Errors in Paired and Small-Strata Experiments?” (with Clément de Chaisemartin). Revision requested by the American Economic Journal: Applied Economics. (Link here)
Abstract: In paired experiments, units are matched into pairs, and one unit of each pair is randomly assigned to treatment. To estimate the treatment effect, researchers often regress their outcome on a treatment indicator and pair fixed effects, clustering standard errors at the unit-of-randomization level. We show that the variance estimator in this regression may be severely downward biased: under a constant treatment effect, its expectation equals 1/2 of the true variance. Instead, we show that researchers should cluster their standard errors at the pair level. Using simulations, we show that those results extend to stratified experiments with few units per stratum.
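The factor of 1/2 in the abstract above can be illustrated with a minimal numerical sketch. This is not the authors' code. The function name `paired_variance_estimators` and the simulated data are hypothetical, and the sketch assumes exactly one treated and one control unit per pair and applies no degrees-of-freedom correction (statistical packages usually apply one, so their reported ratio will generally differ slightly from 1/2). It computes the sandwich variance of the treatment coefficient from the pair-fixed-effects regression twice, once treating each unit as its own cluster and once clustering at the pair level.

```python
import numpy as np


def paired_variance_estimators(y_treated, y_control):
    """Treatment coefficient and two sandwich variances from a regression of the
    outcome on a treatment indicator and pair fixed effects.

    Hypothetical helper for illustration: one treated and one control unit per
    pair, no degrees-of-freedom correction.
    """
    d = np.asarray(y_treated, dtype=float) - np.asarray(y_control, dtype=float)
    n_pairs = d.size
    # With pair fixed effects, the OLS coefficient is the mean within-pair difference.
    tau_hat = d.mean()

    # Within-pair demeaning: the treatment indicator becomes +1/2 (treated unit)
    # and -1/2 (control unit); the residuals become +/- (d - tau_hat) / 2.
    d_tilde = np.column_stack([np.full(n_pairs, 0.5), np.full(n_pairs, -0.5)])
    resid = np.column_stack([(d - tau_hat) / 2, -(d - tau_hat) / 2])

    bread = (d_tilde ** 2).sum()   # sum of squared demeaned regressors
    scores = d_tilde * resid       # one score per unit, shape (n_pairs, 2)

    # Each unit is its own cluster: sum the squared unit-level scores.
    var_unit = (scores ** 2).sum() / bread ** 2
    # Pairs are clusters: sum scores within each pair first, then square.
    var_pair = (scores.sum(axis=1) ** 2).sum() / bread ** 2
    return tau_hat, var_unit, var_pair


# Simulated example with a constant treatment effect of 1 (assumed data).
rng = np.random.default_rng(0)
y_control = rng.normal(size=200)
y_treated = rng.normal(size=200) + 1.0

tau_hat, var_unit, var_pair = paired_variance_estimators(y_treated, y_control)
print(tau_hat, var_unit, var_pair, var_unit / var_pair)
```

Under these assumptions, the unit-level estimator is exactly half the pair-level estimator in every sample, up to floating-point rounding, because the two scores within a pair are identical and squaring their sum gives twice the sum of their squares.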