This article is a practical guide to the estimators implemented in
Rfuzzydid. It follows de Chaisemartin and D’Haultfoeuille
(2018) and the Stata companion by de Chaisemartin, D’Haultfoeuille, and
Guyonvarch (2019). Those papers are the primary references for the
formal identification results; this vignette keeps the focus on what to
run and how to read the output.
A fuzzy DID design has the same comparison structure as a standard DID design, but the treatment group does not necessarily move from untreated to fully treated. Instead, the treatment rate increases more in one group than in the comparison group. The first stage is the DID in treatment:
\[ \Delta_D = E[D\mid G=1,T=1] - E[D\mid G=1,T=0] - E[D\mid G=0,T=1] + E[D\mid G=0,T=0]. \]
The estimators in Rfuzzydid scale outcome changes by
this first stage. In the binary-treatment, two-group, two-period case,
the target is a local average treatment effect for switchers: units
whose treatment status changes because their group-period cell receives
the stronger treatment-intensity change.
fuzzydid() exposes the main estimators from the papers
through logical options. You can request more than one estimator in the
same call.
| Option | Estimator | Use when |
|---|---|---|
did = TRUE |
Wald-DID | You want the familiar DID Wald ratio as a benchmark, and the stronger treatment-effect stability restrictions are credible. |
tc = TRUE |
Wald-TC | You want an alternative LATE estimator that uses common-trend restrictions within baseline-treatment subgroups and does not require stable treatment effects. |
cic = TRUE |
Wald-CIC | You want the fuzzy extension of changes-in-changes, using distributional restrictions rather than a mean-only common-trends restriction. |
lqte = TRUE |
LQTE | You want local quantile treatment effects; this uses the same style of restrictions as Wald-CIC. |
The Wald-DID ratio is easy to interpret, but the papers show that it needs extra restrictions on treatment effects to identify the LATE in fuzzy designs. Wald-TC and Wald-CIC are useful alternatives when those restrictions are not plausible. They rely on different identifying assumptions, so disagreement between estimators is substantively informative rather than just a numerical detail.
The Stata companion recommends using equality tests and placebo-style
numerator checks when the design allows them. In Rfuzzydid,
use eqtest = TRUE when at least two LATE estimators are
requested. Use numerator = TRUE for reduced-form placebo
checks in periods where the treatment first stage should be zero.
The formula is outcome ~ treatment + covariates. The
group and time arguments name the group and
period variables.
Internally, fuzzydid() parses the formula into one
numeric outcome, one numeric treatment, and optional covariates. Numeric
covariates are treated as continuous predictors. Factor, character, and
logical covariates are treated as qualitative predictors and expanded to
indicator columns with the first sorted level omitted. With
sieves = TRUE, continuous covariates are expanded to
polynomial basis terms; when sieveorder = NULL, the order
is selected by a deterministic five-fold cross-validation rule.
Rows with NA or NaN in analysis variables
are removed by complete-case filtering. Inf and
-Inf are rejected because they are not meaningful support
points for the empirical means and quantile maps. Use
tagobs = TRUE to return the retained-row mask.
library(Rfuzzydid)
make_cell <- function(g, t, n, p_d) {
d <- rbinom(n, size = 1, prob = p_d)
y <- 1 + 0.4 * g + 0.3 * t + 1.5 * d + rnorm(n, sd = 0.2)
data.frame(y = y, d = d, g = g, t = t)
}
set.seed(4)
df <- rbind(
make_cell(g = 0, t = 0, n = 80, p_d = 0.20),
make_cell(g = 0, t = 1, n = 80, p_d = 0.30),
make_cell(g = 1, t = 0, n = 80, p_d = 0.25),
make_cell(g = 1, t = 1, n = 80, p_d = 0.70)
)
fit <- fuzzydid(
data = df,
formula = y ~ d,
group = "g",
time = "t",
did = TRUE,
tc = TRUE,
cic = TRUE,
eqtest = TRUE,
breps = 50,
seed = 1
)
summary(fit)
#> LATE estimators
#> estimator estimate std.error conf.low conf.high
#> W_DID 1.548061 0.12623028 1.360954 1.826263
#> W_TC 1.539913 0.09079608 1.381874 1.714579
#> W_CIC 1.652100 0.08056021 1.537000 1.832401
#>
#> Equality tests
#> contrast estimate std.error conf.low conf.high
#> W_DID_W_TC 0.00814816 0.04635526 -0.03830102 0.12174418
#> W_DID_W_CIC -0.10403941 0.08036804 -0.27340920 0.06065538
#> W_TC_W_CIC -0.11218757 0.05386572 -0.24436730 -0.04720579The late component contains point estimates, bootstrap
standard errors, and percentile bootstrap confidence intervals. The
matrices component mirrors the matrix-style results
returned by the Stata command.
fit$late
#> estimator estimate std.error conf.low conf.high
#> 1 W_DID 1.548061 0.12623028 1.360954 1.826263
#> 2 W_TC 1.539913 0.09079608 1.381874 1.714579
#> 3 W_CIC 1.652100 0.08056021 1.537000 1.832401
fit$eqtest
#> contrast estimate std.error conf.low conf.high
#> 1 W_DID_W_TC 0.00814816 0.04635526 -0.03830102 0.12174418
#> 2 W_DID_W_CIC -0.10403941 0.08036804 -0.27340920 0.06065538
#> 3 W_TC_W_CIC -0.11218757 0.05386572 -0.24436730 -0.04720579For ordered or multi-valued treatments, newcateg groups
treatment values into ordered bins before estimating Wald-TC or
Wald-CIC:
fuzzydid(
data = df,
formula = wage ~ schooling,
group = "g",
time = "t",
tc = TRUE,
cic = TRUE,
newcateg = c(5, 8, 11, 14, 1000)
)This follows the Stata article’s treatment of applications where the treatment has many support points.
When the comparison group does not have a stable treatment rate, the
Wald-TC point estimand may be unavailable. The papers derive bounds for
that setting, and partial = TRUE requests the corresponding
TC bounds:
When the identifying restrictions are more credible after
conditioning on covariates, include those variables on the right-hand
side of the formula. modelx controls the parametric
first-step models used for Wald-DID and Wald-TC. With binary treatment,
supply two entries: one for outcome conditional expectations and one for
treatment conditional expectations.
fuzzydid(
data = df,
formula = y ~ d + x1 + x2,
group = "g",
time = "t",
did = TRUE,
tc = TRUE,
modelx = c("ols", "logit")
)For continuous controls, sieves = TRUE requests
nonparametric sieve adjustment. If sieveorder = NULL,
Rfuzzydid selects the order by deterministic five-fold
cross-validation.
By default, fuzzydid() computes bootstrap standard
errors and confidence intervals. Use breps to set the
number of replications and seed for reproducibility. Use
cluster for one-way clustered bootstrap inference.
nose = TRUE skips bootstrap inference and returns point
estimates only. This is useful while checking data construction, but
final empirical results should usually report uncertainty.
The returned object is a causal estimand summary, not a predictive
model. Accordingly, it supports estimate extractors such as
coef(), confint(), vcov(),
nobs(), formula(),
generics::tidy(), and generics::glance(), but
it does not define observation-level fitted values, residuals, or
predictions.
Runtime is approximately linear in the number of bootstrap replications. Within each replication, the no-covariate estimators are dominated by group-period subsetting and empirical distribution calculations. Covariate-adjusted estimators additionally fit nuisance OLS/logit/probit models, and sieve estimation grows with the number of generated basis terms.
The core Stata command
maps to:
fuzzydid(
data = df,
formula = y ~ d,
group = "g",
time = "t",
did = TRUE,
tc = TRUE,
cic = TRUE,
breps = 50
)For multiple periods, the Stata command uses a forward group
variable. In Rfuzzydid, pass the backward group through
group and the forward group through
group_forward:
fuzzydid(
data = panel_df,
formula = y ~ d,
group = "G_t",
group_forward = "G_tplus1",
time = "t",
did = TRUE,
tc = TRUE
)See vignette("stata-parity", package = "Rfuzzydid") for
direct R/Stata parity examples, and
vignette("paper-replication", package = "Rfuzzydid") for an
INPRES replication exercise.
Before treating an estimate as a causal effect, check the design:
de Chaisemartin, C. and D’Haultfoeuille, X. (2018). Fuzzy Differences-in-Differences. Review of Economic Studies, 85(2): 999-1028. doi:10.1093/restud/rdx049.
de Chaisemartin, C., D’Haultfoeuille, X., and Guyonvarch, Y. (2019). Fuzzy Differences-in-Differences with Stata. Stata Journal, 19(2): 435-458. doi:10.1177/1536867X19854019.