Title: | Regression Discontinuity Design Application |
---|---|
Description: | Estimation of both single- and multiple-assignment Regression Discontinuity Designs (RDDs). Provides both parametric (global) and non-parametric (local) estimation choices for both sharp and fuzzy designs, along with power analysis and assumption checks. Introductions to the underlying logic and analysis of RDDs are in Thistlethwaite, D. L., Campbell, D. T. (1960) <doi:10.1037/h0044319> and Lee, D. S., Lemieux, T. (2010) <doi:10.1257/jel.48.2.281>. |
Authors: | Ze Jin [aut], Wang Liao [aut], Irena Papst [aut], Wenyu Zhang [aut], Kimberly Hochstedler [aut], Felix Thoemmes [aut, cre] |
Maintainer: | Felix Thoemmes <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.3.2 |
Built: | 2025-01-20 04:42:54 UTC |
Source: | https://github.com/felixthoemmes/rddapp |
rddapp: A package for regression discontinuity designs (RDDs).
The rddapp package provides a set of functions for the analysis of the regression-discontinuity design (RDD). The three main parts are: estimation of effects of interest, power analysis, and assumption checks.
A variety of designs can be estimated in various ways. The single-assignment RDD (both sharp and fuzzy) can be analyzed using either a parametric (global) or a non-parametric (local) approach. The multiple-assignment RDD (both sharp and fuzzy) can likewise be analyzed using parametric or non-parametric estimation. For the multiple-assignment RDD, effects can further be estimated using univariate scaling, the centering approach, or the frontier approach. The frontier approach can currently only be estimated using parametric regression with bootstrapped standard errors.
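As a rough illustration of the parametric (global) versus non-parametric (local) distinction for a sharp single-assignment design, the base R sketch below fits both by hand on simulated data. It does not call rddapp; the bandwidth h = 0.5, the triangular kernel weights, and all variable names are assumptions chosen for the illustration, not package defaults.

```r
set.seed(1)
n <- 2000
x <- runif(n, -1, 1)                      # assignment variable, cutpoint at 0
treat <- as.numeric(x >= 0)               # sharp design: treatment fully determined by x
y <- 1 + 0.5 * x + 2 * treat + rnorm(n)   # simulated outcome, true discontinuity is 2

# Parametric (global): one linear model on all data, separate slopes per side
fit_global <- lm(y ~ treat * x)
est_global <- unname(coef(fit_global)["treat"])

# Non-parametric (local): weighted linear fit within bandwidth h (assumed value)
h <- 0.5
w <- pmax(0, 1 - abs(x) / h)              # triangular kernel weights, zero outside h
keep <- w > 0
fit_local <- lm(y ~ treat * x, weights = w, subset = keep)
est_local <- unname(coef(fit_local)["treat"])

c(global = est_global, local = est_local)
```

Both estimates target the jump in the outcome at the cutpoint; the local fit uses fewer observations and therefore tends to be noisier, but it relies less on the functional form far from the cutpoint.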
Statistical power can be estimated for both the single- and multiple-assignment RDD (both sharp and fuzzy), including all parametric and non-parametric estimators mentioned in the estimation section. All power analyses are based on a simulation approach, which means that the user has to provide all necessary parameters for a data-generating model.
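The simulation idea behind such a power analysis can be sketched in a few lines of base R: generate data from a fully specified model many times, estimate the effect each time, and record how often it is significant. The function sim_power and its arguments are hypothetical names for this sketch and are not part of rddapp; the data-generating model is an assumed simple sharp design.

```r
set.seed(2)

# Empirical power of a sharp single-assignment RDD by simulation.
# All argument names are hypothetical and chosen for this sketch only.
sim_power <- function(num_rep = 200, n = 200, effect = 0.5, alpha = 0.05) {
  p_values <- replicate(num_rep, {
    x <- rnorm(n)                          # assignment variable, cutpoint at 0
    treat <- as.numeric(x >= 0)            # sharp assignment rule
    y <- 0.5 * x + effect * treat + rnorm(n)
    fit <- lm(y ~ treat + x)               # global linear estimator
    summary(fit)$coefficients["treat", "Pr(>|t|)"]
  })
  mean(p_values < alpha)                   # share of significant replications
}

power_null <- sim_power(effect = 0)        # no true effect: rejection rate near alpha
power_alt  <- sim_power(effect = 1)        # true effect of 1: much higher rejection rate
c(null = power_null, effect = power_alt)
```

With no true effect the rejection rate approximates the nominal significance level, which is a useful sanity check on any simulation setup before reading off power values.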
An important part of any RDD analysis is checking the underlying assumptions. The package provides McCrary's sorting test (to identify violations of assignment rules), checks for discontinuities in other baseline covariates, sensitivity checks of the chosen bandwidth parameter for non-parametric models, and so-called placebo tests, which examine the treatment effect at other cut-points along the assignment variable.
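The logic of a placebo test can be sketched in base R as follows. This is an illustration under assumed simulated data, not the package's implementation: the helper jump_at and the placebo cut-points at -0.5 and 0.5 are hypothetical choices.

```r
set.seed(3)
n <- 2000
x <- runif(n, -1, 1)
y <- 1 + 0.5 * x + 2 * (x >= 0) + rnorm(n)   # true jump of 2 at x = 0 only

# Estimated jump at cut-point `cut`, using a linear fit with separate slopes
jump_at <- function(cut, x, y) {
  xc <- x - cut                     # center the assignment variable at the cut-point
  d <- as.numeric(xc >= 0)          # indicator for being at or above the cut-point
  unname(coef(lm(y ~ d * xc))["d"])
}

real_jump <- jump_at(0, x, y)

# Placebo cut-points, each evaluated only on its own side of the real cut-point
# so that the real discontinuity does not contaminate the placebo estimate
right <- x >= 0
left <- x < 0
placebo_right <- jump_at(0.5, x[right], y[right])
placebo_left <- jump_at(-0.5, x[left], y[left])

c(real = real_jump, placebo_left = placebo_left, placebo_right = placebo_right)
```

A sizable estimated jump at a placebo cut-point would suggest that the model picks up curvature or other artifacts rather than a genuine treatment effect.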
Ze Jin, Wang Liao, Irena Papst, Wenyu Zhang, Kimberly Hochstedler, Felix Thoemmes
attr_check reports missing data on the treatment variable, assignment variable, and outcome. This is an internal function and is typically not directly invoked by the user. It can be accessed using the triple colon, as in rddapp:::attr_check().
attr_check(x1, y, t, x2 = NULL)
x1 |
A numeric object containing the assignment variable. |
y |
A numeric object containing the outcome variable, with the same dimensionality as x1. |
t |
A numeric object containing the treatment variable (coded as 0 for untreated and 1 for treated), with the same dimensionality as x1. |
x2 |
A numeric object containing the secondary assignment variable. |
attr_check returns a list containing the amount and percentage of missing data for all variables and subgroups, by treatment.
bw_ik09 calculates the Imbens-Kalyanaraman (2009) optimal bandwidth for local linear regression in regression discontinuity designs. It is based on the IKbandwidth function in the "rdd" package. This is an internal function and is typically not directly invoked by the user. It can be accessed using the triple colon, as in rddapp:::bw_ik09().
bw_ik09(X, Y, cutpoint = NULL, verbose = FALSE, kernel = "triangular")
X |
A numeric vector containing the running variable. |
Y |
A numeric vector containing the outcome variable. |
cutpoint |
A numeric value containing the cutpoint at which assignment to the treatment is determined. The default is 0. |
verbose |
A logical value indicating whether to print more information to the terminal. The default is FALSE. |
kernel |
A string indicating which kernel to use. The default is "triangular". |
bw_ik09 returns a numeric value specifying the optimal bandwidth.
Imbens, G., Kalyanaraman, K. (2009). Optimal bandwidth choice for the regression discontinuity estimator (Working Paper No. 14726). National Bureau of Economic Research. https://www.nber.org/papers/w14726.
Drew Dimmery (2016). rdd: Regression Discontinuity Estimation. R package version 0.57. https://CRAN.R-project.org/package=rdd
bw_ik12 calculates the Imbens-Kalyanaraman (2012) optimal bandwidth for local linear regression in regression discontinuity designs. It is based on a function in the "rddtools" package. This is an internal function and is typically not directly invoked by the user. It can be accessed using the triple colon, as in rddapp:::bw_ik12().
bw_ik12(X, Y, cutpoint = NULL, verbose = FALSE, kernel = "triangular")
X |
A numeric vector containing the running variable. |
Y |
A numeric vector containing the outcome variable. |
cutpoint |
A numeric value containing the cutpoint at which assignment to the treatment is determined. The default is 0. |
verbose |
A logical value indicating whether to print more information to the terminal. The default is FALSE. |
kernel |
A string indicating which kernel to use. The default is "triangular". |
bw_ik12 returns a numeric value specifying the optimal bandwidth.
Imbens, G., Kalyanaraman, K. (2012). Optimal bandwidth choice for the regression discontinuity estimator. The Review of Economic Studies, 79(3), 933-959. https://academic.oup.com/restud/article/79/3/933/1533189.
Stigler, M. and Quast, B. (2016). rddtools: A toolbox for regression discontinuity in R.
A dataset containing a subset of children from the CARE trial on early childhood intervention. The randomized controlled trial was subsetted to mimic a regression-discontinuity design in which treatment was assigned only to mothers whose IQ was lower than 85.
CARE
A data frame with 81 rows and 5 variables:
Unique ID variable
Day Care (Preschool) Treatment Group, 1 = Treatment, 0 = Control
APGAR ("Appearance, Pulse, Grimace, Activity, and Respiration") score at 5 minutes after birth
Biological mother's WAIS (Wechsler Adult Intelligence Scale) full-scale score at subject's birth
Subject's Stanford Binet IQ score at 48 months
http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/4091
data("CARE")
head(CARE)
dc_test implements the McCrary (2008) sorting test to identify violations of assignment rules. It is based on the DCdensity function in the "rdd" package.
dc_test( runvar, cutpoint, bin = NULL, bw = NULL, verbose = TRUE, plot = TRUE, ext.out = FALSE, htest = FALSE, level = 0.95, digits = max(3, getOption("digits") - 3), timeout = 30 )
runvar |
A numeric vector containing the running variable. |
cutpoint |
A numeric value containing the cutpoint at which assignment to the treatment is determined. The default is 0. |
bin |
A numeric value containing the binwidth. The default is |
bw |
A numeric value containing bandwidth to use. If no bandwidth is supplied, the default uses bandwidth selection calculation from McCrary (2008). |
verbose |
A logical value indicating whether to print diagnostic information to the terminal. The default is TRUE. |
plot |
A logical value indicating whether to plot the histogram and density estimations. The default is TRUE. |
ext.out |
A logical value indicating whether to return extended output. The default is FALSE. |
htest |
A logical value indicating whether to return an |
level |
A numerical value between 0 and 1 specifying the confidence level for confidence intervals. The default is 0.95. |
digits |
A non-negative integer specifying the number of digits to display in all output. The default is max(3, getOption("digits") - 3). |
timeout |
A non-negative numerical value specifying the maximum number of seconds that
expressions in the function are allowed to run. The default is 30. Specify |
If ext.out is FALSE, dc_test returns a numeric value specifying the p-value of the McCrary (2008) sorting test. Additional output is enabled when ext.out is TRUE. In this case, dc_test returns a list with the following elements:
theta |
The estimated log difference in heights of the density curve at the cutpoint. |
se |
The standard error of theta. |
z |
The z statistic of the test. |
p |
The p-value of the test. A p-value below the significance threshold indicates that the user can reject the null hypothesis of no sorting. |
binsize |
The calculated size of bins for the test. |
bw |
The calculated bandwidth for the test. |
cutpoint |
The cutpoint used. |
data |
A dataframe for the binning of the histogram. Columns are |
McCrary, J. (2008). Manipulation of the running variable in the regression discontinuity design: A density test. Journal of Econometrics, 142(2), 698-714. doi:10.1016/j.jeconom.2007.05.005.
Drew Dimmery (2016). rdd: Regression Discontinuity Estimation. R package version 0.57. https://CRAN.R-project.org/package=rdd
set.seed(12345)
# No discontinuity
x <- runif(1000, -1, 1)
dc_test(x, 0)
# Discontinuity
x <- runif(1000, -1, 1)
x <- x + 2 * (runif(1000, -1, 1) > 0 & x < 0)
dc_test(x, 0)
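When the full test output is wanted rather than only the p-value, the documented ext.out argument can be set to TRUE. The sketch below assumes that rddapp is installed and uses only the arguments and return elements described above; the requireNamespace guard and the variable name res are additions for this illustration.

```r
set.seed(12345)
x <- runif(1000, -1, 1)   # running variable without sorting around the cutpoint

if (requireNamespace("rddapp", quietly = TRUE)) {
  # Suppress printing and plotting, and request the extended list output
  res <- rddapp::dc_test(x, cutpoint = 0, verbose = FALSE, plot = FALSE,
                         ext.out = TRUE)
  # Elements documented above include theta, se, z, and p
  print(res[c("theta", "se", "z", "p")])
}
```

Turning off verbose and plot keeps the call quiet, which is convenient when the test is run inside a script or a loop.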
mfrd_est implements the frontier approach for multivariate regression discontinuity estimation in Wong, Steiner and Cook (2013). It is based on the MFRDD code in Stata from Wong, Steiner, and Cook (2013).
mfrd_est( y, x1, x2, c1, c2, t.design = NULL, local = 0.15, front.bw = NA, m = 10, k = 5, kernel = "triangular", ngrid = 250, margin = 0.03, boot = NULL, cluster = NULL, stop.on.error = TRUE )
y |
A numeric object containing the outcome variable. |
x1 |
A numeric object containing the first assignment variable. |
x2 |
A numeric object containing the second assignment variable. |
c1 |
A numeric value containing the cutpoint at which assignment to the treatment is determined for x1. |
c2 |
A numeric value containing the cutpoint at which assignment to the treatment is determined for x2. |
t.design |
A character vector of length 2 specifying the treatment option according to design.
The first entry is for |
local |
A non-negative numeric value specifying the range of neighboring points around the cutoff on the standardized scale, for each assignment variable. The default is 0.15. |
front.bw |
A non-negative numeric vector of length 3 specifying the bandwidths at which to estimate the RD for each
of three effects models (complete model, heterogeneous treatment model, and treatment only model)
detailed in Wong, Steiner, and Cook (2013).
If |
m |
A non-negative integer specifying the number of uniformly-at-random samples to draw as search candidates for |
k |
A non-negative integer specifying the number of folds for cross-validation to determine |
kernel |
A string indicating which kernel to use. The default is "triangular". |
ngrid |
A non-negative integer specifying the number of non-zero grid points on each assignment variable, which is also the number of zero grid points on each assignment variable. The default is 250. The value used in Wong, Steiner and Cook (2013) is 2500, which may lead to long computation times. |
margin |
A non-negative numeric value specifying the range of grid points beyond the minimum and maximum of sample points on each assignment variable. This grid is used to impute potential outcomes along the frontier, as in Wong, Steiner, and Cook (2013). The default is 0.03. |
boot |
An optional non-negative integer specifying the number of bootstrap samples to obtain standard error of estimates. |
cluster |
An optional vector of length n specifying clusters within which the errors are assumed to be correlated. This will result in reporting cluster robust SEs. It is suggested that data with a discrete running variable be clustered by each unique value of the running variable (Lee and Card, 2008). |
stop.on.error |
A logical value indicating whether to remove bootstraps which cause error in the |
mfrd_est returns an object of class "mfrd". The functions summary and plot are used to obtain and print a summary and plot of the estimated regression discontinuity. The object of class mfrd is a list containing the following components:
w |
Numeric vector specifying the weight of frontier 1 and frontier 2, respectively. |
est |
Numeric matrix of the estimate of the discontinuity in the outcome under a complete model (no prefix), heterogeneous treatment (ht) effects model, and treatment (t) only model, for the parametric case and for each corresponding bandwidth. Estimates with suffix "ev1" and "ev2" correspond to expected values for each frontier, under a given model. Estimates with suffix "ate" correspond to average treatment effects across both frontiers, under a given model. |
d |
Numeric matrix of the effect size (Cohen's d) for each estimate. |
se |
Numeric matrix of the standard error for each corresponding bandwidth, if applicable. |
m_s |
A list containing estimates for the complete model, under parametric
and non-parametric (optimal, half, and double bandwidth) cases. A list of
coefficient estimates, residuals, effects, weights (in the non-parametric case),
|
m_h |
A list containing estimates for the heterogeneous treatments model, under parametric
and non-parametric (optimal, half, and double bandwidth) cases. A list of
coefficient estimates, residuals, effects, weights (in the non-parametric case),
|
m_t |
A list containing estimates for the treatment only model, under parametric
and non-parametric (optimal, half, and double bandwidth) cases. A list of
coefficient estimates, residuals, effects, weights (in the non-parametric case),
|
dat_h |
A list containing four data frames, one for each case: parametric or non-parametric (optimal, half, and double bandwidth). Each data frame contains functions and densities for each frontier and treatment model. |
dat |
A data frame containing the outcome ( |
obs |
List of the number of observations used in each model. |
impute |
A logical value indicating whether multiple imputation is used or not. |
call |
The matched call. |
front.bw |
Numeric vector of each bandwidth used to estimate the density at the frontier for the three effects models (complete model, heterogeneous treatment model, and treatment only model) detailed in Wong, Steiner, and Cook (2013). |
Wong, V., Steiner, P., and Cook, T. (2013). Analyzing regression discontinuity designs with multiple assignment variables: A comparative study of four estimation methods. Journal of Educational and Behavioral Statistics, 38(2), 107-141. doi:10.3102/1076998611432172.
Lee, D. and Card, D. (2008). Regression discontinuity inference with specification error. Journal of Econometrics, 142(2), 655-674. doi:10.1016/j.jeconom.2007.05.003.
set.seed(12345)
x1 <- runif(1000, -1, 1)
x2 <- runif(1000, -1, 1)
cov <- rnorm(1000)
y <- 3 + 2 * (x1 >= 0) + 3 * cov + 10 * (x2 >= 0) + rnorm(1000)
mfrd_est(y = y, x1 = x1, x2 = x2, c1 = 0, c2 = 0, t.design = c("geq", "geq"))
mrd_est estimates treatment effects in a multivariate regression discontinuity design (MRDD) with two assignment variables, including the frontier average treatment effect (tau_MRD) and frontier-specific effects (tau_R and tau_M) simultaneously.
mrd_est( formula, data, subset = NULL, cutpoint = NULL, bw = NULL, front.bw = NA, m = 10, k = 5, kernel = "triangular", se.type = "HC1", cluster = NULL, verbose = FALSE, less = FALSE, est.cov = FALSE, est.itt = FALSE, local = 0.15, ngrid = 250, margin = 0.03, boot = NULL, method = c("center", "univ", "front"), t.design = NULL, stop.on.error = TRUE )
formula |
The formula of the MRDD; a symbolic description of the model to
be fitted. This is supplied in the
format of |
data |
An optional data frame containing the variables in the model. If not found in |
subset |
An optional vector specifying a subset of observations to be used in the fitting process. |
cutpoint |
A numeric vector of length 2 containing the cutpoints at which assignment to the treatment is determined. The default is c(0, 0). |
bw |
A vector specifying the bandwidths at which to estimate the RD for non-parametric models.
Possible values are |
front.bw |
A non-negative numeric vector of length 3 specifying the bandwidths at which to estimate the RD for each
of three effects models (complete model, heterogeneous treatment model, and treatment only model)
detailed in Wong, Steiner, and Cook (2013). If |
m |
A non-negative integer specifying the number of uniformly-at-random samples to draw as search candidates for |
k |
A non-negative integer specifying the number of folds for cross-validation to determine |
kernel |
A string indicating which kernel to use. The default is "triangular". |
se.type |
This specifies the robust standard error calculation method to use,
from the "sandwich" package. Options are,
as in |
cluster |
An optional vector of length n specifying clusters within which the errors are assumed
to be correlated. This will result in reporting cluster robust SEs. This option overrides
anything specified in |
verbose |
A logical value indicating whether to print additional information to the terminal, including results of instrumental variable regression, and outputs from background regression models. The default is FALSE. |
less |
Logical. If |
est.cov |
Logical. If |
est.itt |
Logical. If |
local |
A non-negative numeric value specifying the range of neighboring points around the cutoff on the standardized scale, for each assignment variable. The default is 0.15. |
ngrid |
A non-negative integer specifying the number of non-zero grid points on each assignment variable, which is also the number of zero grid points on each assignment variable. The default is 250. The value used in Wong, Steiner and Cook (2013) is 2500, which may lead to long computation times. |
margin |
A non-negative numeric value specifying the range of grid points beyond the minimum and maximum of sample points on each assignment variable. The default is 0.03. |
boot |
An optional non-negative integer specifying the number of bootstrap samples to obtain standard error of estimates.
This argument is not optional if method is |
method |
A string specifying the method to estimate the RD effect. Options are "center", "univ", and "front". |
t.design |
A character vector of length 2 specifying the treatment option according to design.
The first entry is for |
stop.on.error |
A logical value indicating whether to remove bootstraps which cause error in the |
mrd_est returns an object of class "mrd". The function summary is used to obtain and print a summary of the estimated regression discontinuity. The object of class mrd is a list containing the following components for each estimated treatment effect, tau_MRD or tau_R and tau_M:
type |
A string denoting either |
call |
The matched call. |
est |
Numeric vector of the estimate of the discontinuity in the outcome under a sharp MRDD or the Wald estimator in the fuzzy MRDD, for each corresponding bandwidth, if applicable. |
se |
Numeric vector of the standard error for each corresponding bandwidth, if applicable. |
ci |
The matrix of the 95% confidence interval for each corresponding bandwidth, if applicable. |
bw |
Numeric vector of each bandwidth used in estimation. |
z |
Numeric vector of the z statistic for each corresponding bandwidth, if applicable. |
p |
Numeric vector of the p-value for each corresponding bandwidth, if applicable. |
obs |
Vector of the number of observations within the corresponding bandwidth, if applicable. |
cov |
The names of covariates. |
model |
For a sharp design, a list of the |
frame |
Returns the model frame used in fitting. |
na.action |
The observations removed from fitting due to missingness. |
impute |
A logical value indicating whether multiple imputation is used or not. |
d |
Numeric vector of the effect size (Cohen's d) for each estimate. |
Wong, V. C., Steiner, P. M., Cook, T. D. (2013). Analyzing regression-discontinuity designs with multiple assignment variables: A comparative study of four estimation methods. Journal of Educational and Behavioral Statistics, 38(2), 107-141. https://journals.sagepub.com/doi/10.3102/1076998611432172.
Imbens, G., Kalyanaraman, K. (2009). Optimal bandwidth choice for the regression discontinuity estimator (Working Paper No. 14726). National Bureau of Economic Research. https://www.nber.org/papers/w14726.
Imbens, G., Kalyanaraman, K. (2012). Optimal bandwidth choice for the regression discontinuity estimator. The Review of Economic Studies, 79(3), 933-959. https://academic.oup.com/restud/article/79/3/933/1533189.
Lee, D. S., Card, D. (2008). Regression discontinuity inference with specification error. Journal of Econometrics, 142(2), 655-674. doi:10.1016/j.jeconom.2007.05.003.
Lee, D. S., Lemieux, T. (2010). Regression Discontinuity Designs in Economics. Journal of Economic Literature, 48(2), 281-355. doi:10.1257/jel.48.2.281.
Zeileis, A. (2006). Object-oriented computation of sandwich estimators. Journal of Statistical Software, 16(9), 1-16. doi:10.18637/jss.v016.i09
set.seed(12345)
x1 <- runif(1000, -1, 1)
x2 <- runif(1000, -1, 1)
cov <- rnorm(1000)
y <- 3 + 2 * (x1 >= 0) + 3 * cov + 10 * (x2 >= 0) + rnorm(1000)
# centering
mrd_est(y ~ x1 + x2 | cov, method = "center", t.design = c("geq", "geq"))
# univariate
mrd_est(y ~ x1 + x2 | cov, method = "univ", t.design = c("geq", "geq"))
# frontier
mrd_est(y ~ x1 + x2 | cov, method = "front", t.design = c("geq", "geq"))
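The idea behind the centering approach can be sketched in base R: both assignment variables are centered at their cutpoints and collapsed into a single assignment variable, after which a single-assignment RDD is estimated. The sketch below is an illustration under an assumed assignment rule (a unit is treated if either variable is at or above its cutpoint) and an assumed data-generating model; it is not the internal code of mrd_est, and the names z and est_center are hypothetical.

```r
set.seed(4)
n <- 4000
x1 <- runif(n, -1, 1)
x2 <- runif(n, -1, 1)
c1 <- 0
c2 <- 0

# Assumed assignment rule: treated if either variable is at or above its cutpoint
treat <- as.numeric(x1 >= c1 | x2 >= c2)
y <- 1 + 0.3 * x1 + 0.3 * x2 + 2 * treat + rnorm(n)   # true effect is 2

# Centering: center each variable at its cutpoint, then collapse to one
# assignment variable. Under the rule above, z >= 0 exactly when treat == 1.
z <- pmax(x1 - c1, x2 - c2)
stopifnot(all((z >= 0) == (treat == 1)))

# Single-assignment sharp RDD on z (global linear fit with separate slopes)
fit <- lm(y ~ treat * z)
est_center <- unname(coef(fit)["treat"])
est_center
```

For a design in which treatment is assigned when a variable falls below its cutpoint, the same construction would use the minimum instead of the maximum of the centered variables.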
mrd_impute estimates treatment effects in a multivariate regression discontinuity design (MRDD) with imputed missing values.
mrd_impute( formula, data, subset = NULL, cutpoint = NULL, bw = NULL, front.bw = NA, m = 10, k = 5, kernel = "triangular", se.type = "HC1", cluster = NULL, impute = NULL, verbose = FALSE, less = FALSE, est.cov = FALSE, est.itt = FALSE, local = 0.15, ngrid = 250, margin = 0.03, boot = NULL, method = c("center", "univ", "front"), t.design = NULL, stop.on.error = TRUE )
formula |
The formula of the MRDD; a symbolic description of the model to be fitted. This is supplied in the
format of |
data |
An optional data frame containing the variables in the model. If not found in |
subset |
An optional vector specifying a subset of observations to be used in the fitting process. |
cutpoint |
A numeric vector of length 2 containing the cutpoints at which assignment to the treatment is determined. The default is c(0, 0). |
bw |
A vector specifying the bandwidths at which to estimate the RD.
Possible values are |
front.bw |
A non-negative numeric vector of length 3 specifying the bandwidths at which to estimate the RD for each
of three effects models (complete model, heterogeneous treatment model, and treatment only model)
detailed in Wong, Steiner, and Cook (2013). If |
m |
A non-negative integer specifying the number of uniformly-at-random samples to draw as search candidates for |
k |
A non-negative integer specifying the number of folds for cross-validation to determine |
kernel |
A string indicating which kernel to use. The default is "triangular". |
se.type |
This specifies the robust standard error calculation method to use,
from the "sandwich" package. Options are,
as in |
cluster |
An optional vector of length n specifying clusters within which the errors are assumed
to be correlated. This will result in reporting cluster robust SEs. This option overrides
anything specified in |
impute |
An optional vector of length n containing a grouping variable that specifies the imputed variables with missing values. |
verbose |
A logical value indicating whether to print additional information to the terminal. The default is FALSE. |
less |
Logical. If |
est.cov |
Logical. If |
est.itt |
Logical. If |
local |
A non-negative numeric value specifying the range of neighboring points around the cutoff on the standardized scale, for each assignment variable. The default is 0.15. |
ngrid |
A non-negative integer specifying the number of non-zero grid points on each assignment variable, which is also the number of zero grid points on each assignment variable. The default is 250. The value used in Wong, Steiner and Cook (2013) is 2500, which may lead to long computation times. |
margin |
A non-negative numeric value specifying the range of grid points beyond the minimum and maximum of sample points on each assignment variable. The default is 0.03. |
boot |
An optional non-negative integer specifying the number of bootstrap samples to obtain standard error of estimates.
This argument is not optional if method is |
method |
A string specifying the method to estimate the RD effect. Options are "center", "univ", and "front". |
t.design |
A character vector of length 2 specifying the treatment option according to design.
The first entry is for |
stop.on.error |
A logical value indicating whether to remove bootstraps which cause error in the |
mrd_impute returns an object of class "mrd", or of class "mrdi" for the "front" method. The function summary is used to obtain and print a summary of the estimated regression discontinuity. The object of class mrd is a list containing the following components for each estimated treatment effect, tau_MRD or tau_R and tau_M:
call |
The matched call. |
type |
A string denoting either |
cov |
The names of covariates. |
bw |
Numeric vector of each bandwidth used in estimation. |
obs |
Vector of the number of observations within the corresponding bandwidth. |
model |
For a sharp design, a list of the |
frame |
Returns the model frame used in fitting. |
na.action |
The observations removed from fitting due to missingness. |
est |
Numeric vector of the estimate of the discontinuity in the outcome under a sharp MRDD or the Wald estimator in the fuzzy MRDD, for each corresponding bandwidth. |
d |
Numeric vector of the effect size (Cohen's d) for each estimate. |
se |
Numeric vector of the standard error for each corresponding bandwidth. |
z |
Numeric vector of the z statistic for each corresponding bandwidth. |
df |
Numeric vector of the degrees of freedom computed using Barnard and Rubin (1999) adjustment for imputation. |
p |
Numeric vector of the p-value for each corresponding bandwidth. |
ci |
The matrix of the 95% confidence interval for each corresponding bandwidth. |
impute |
A logical value indicating whether multiple imputation is used or not. |
Wong, V. C., Steiner, P. M., Cook, T. D. (2013). Analyzing regression-discontinuity designs with multiple assignment variables: A comparative study of four estimation methods. Journal of Educational and Behavioral Statistics, 38(2), 107-141. https://journals.sagepub.com/doi/10.3102/1076998611432172.
Lee, D. S., Lemieux, T. (2010). Regression Discontinuity Designs in Economics. Journal of Economic Literature, 48(2), 281-355. doi:10.1257/jel.48.2.281.
Lee, D. S., Card, D. (2008). Regression discontinuity inference with specification error. Journal of Econometrics, 142(2), 655-674. doi:10.1016/j.jeconom.2007.05.003.
Barnard, J., Rubin, D. (1999). Small-Sample Degrees of Freedom with Multiple Imputation. Biometrika, 86(4), 948-955.
set.seed(12345)
x1 <- runif(300, -1, 1)
x2 <- runif(300, -1, 1)
cov <- rnorm(300)
y <- 3 + 2 * (x1 >= 0) + 3 * cov + 10 * (x2 >= 0) + rnorm(300)
imp <- rep(1:3, each = 100)
# all examples below have smaller numbers of m to keep run-time low
# centering
mrd_impute(y ~ x1 + x2 | cov, impute = imp, method = "center", t.design = c("geq", "geq"), m = 3)
# univariate
mrd_impute(y ~ x1 + x2 | cov, impute = imp, method = "univ", t.design = c("geq", "geq"), m = 3)
# frontier - don't run due to computation time
## Not run:
mrd_impute(y ~ x1 + x2 | cov, impute = imp, method = "front", boot = 1000, t.design = c("geq", "geq"), m = 3)
## End(Not run)
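Estimates from several imputed data sets are commonly combined with Rubin's rules, and the degrees of freedom listed above refer to the Barnard and Rubin (1999) adjustment. The base R sketch below shows the textbook formulas under stated assumptions; it is not taken from the package source, and the function pool_rubin, its arguments, and the input numbers are hypothetical.

```r
# Pool m estimates and their standard errors across imputed data sets.
# df_com is the complete-data degrees of freedom (assumed known by the caller).
pool_rubin <- function(est, se, df_com = Inf) {
  m <- length(est)
  q_bar <- mean(est)                 # pooled point estimate
  u_bar <- mean(se^2)                # within-imputation variance
  b <- var(est)                      # between-imputation variance
  t_var <- u_bar + (1 + 1 / m) * b   # total variance
  lambda <- (1 + 1 / m) * b / t_var  # share of variance due to missingness
  df_old <- (m - 1) / lambda^2       # large-sample degrees of freedom
  # Barnard and Rubin (1999) small-sample adjustment, skipped when df_com is Inf
  if (is.finite(df_com)) {
    df_obs <- (df_com + 1) / (df_com + 3) * df_com * (1 - lambda)
    df <- 1 / (1 / df_old + 1 / df_obs)
  } else {
    df <- df_old
  }
  list(est = q_bar, se = sqrt(t_var), df = df)
}

# Hypothetical estimates and standard errors from three imputed data sets
pooled <- pool_rubin(est = c(1.9, 2.1, 2.0), se = c(0.30, 0.32, 0.31), df_com = 96)
pooled
```

The adjusted degrees of freedom never exceed the complete-data degrees of freedom, which is the main point of the small-sample correction.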
mrd_power computes the empirical probability that a resulting parameter estimate of the MRD is significant, i.e., the empirical power (1 - beta).
mrd_power( num.rep = 100, sample.size = 100, x1.dist = "normal", x1.para = c(0, 1), x2.dist = "normal", x2.para = c(0, 1), x1.cut = 0, x2.cut = 0, x1.fuzzy = c(0, 0), x2.fuzzy = c(0, 0), x1.design = NULL, x2.design = NULL, coeff = c(0.1, 0.5, 0.5, 1, rep(0.1, 9)), eta.sq = 0.5, alpha.list = c(0.001, 0.01, 0.05) )
num.rep |
A non-negative integer specifying the number of repetitions used to calculate the empirical power. The default is 100. |
sample.size |
A non-negative integer specifying the number of observations in each sample. The default is 100. |
x1.dist |
A string specifying the distribution of the first assignment variable, |
x1.para |
A numeric vector of length 2 specifying parameters of the distribution of the first assignment variable, |
x2.dist |
A string specifying the distribution of the second assignment variable, |
x2.para |
A numeric vector of length 2 specifying parameters of the distribution of the second assignment variable, |
x1.cut |
A numeric value containing the cutpoint at which assignment to the treatment is determined for the first assignment variable, |
x2.cut |
A numeric value containing the cutpoint at which assignment to the treatment is determined for the second assignment variable, |
x1.fuzzy |
A numeric vector of length 2 specifying the probabilities to be
assigned to the control condition, in terms of the first
assignment variable, |
x2.fuzzy |
A numeric vector of length 2 specifying the probabilities to be assigned to the control, in terms of the second
assignment variable, |
x1.design |
A string specifying the treatment option according to design for |
x2.design |
A string specifying the treatment option according to design for |
coeff |
A numeric vector specifying coefficients of variables in the linear model to generate data. Coefficients are in the following order:
The default is |
eta.sq |
A numeric value specifying the expected partial eta-squared of the linear model with respect to the treatment itself. It is used to control the variance of noise in the linear model. The default is 0.50. |
alpha.list |
A numeric vector containing significance levels (between 0 and 1) used to calculate the empirical alpha.
The default is |
mrd_power returns an object of class "mrdp" containing the number of successful iterations, mean, variance, and power (with alpha of 0.001, 0.01, and 0.05) for six estimators. The function summary is used to obtain and print a summary of the power analysis. The six estimators are as follows:
The 1st estimator, Linear, provides results of the linear regression estimator of combined RD using the centering approach.
The 2nd estimator, Opt, provides results of the local linear regression estimator of combined RD using the centering approach, with the optimal bandwidth in the Imbens and Kalyanaraman (2012) paper.
The 3rd estimator, Linear, provides results of the linear regression estimator of separate RD in terms of x1 using the univariate approach.
The 4th estimator, Opt, provides results of the local linear regression estimator of separate RD in terms of x1 using the univariate approach, with the optimal bandwidth in the Imbens and Kalyanaraman (2012) paper.
The 5th estimator, Linear, provides results of the linear regression estimator of separate RD in terms of x2 using the univariate approach.
The 6th estimator, Opt, provides results of the local linear regression estimator of separate RD in terms of x2 using the univariate approach, with the optimal bandwidth in the Imbens and Kalyanaraman (2012) paper.
Imbens, G., Kalyanaraman, K. (2012). Optimal bandwidth choice for the regression discontinuity estimator. The Review of Economic Studies, 79(3), 933-959. https://academic.oup.com/restud/article/79/3/933/1533189.
## Not run:
summary(mrd_power(x1.design = "l", x2.design = "l"))
summary(mrd_power(x1.dist = "uniform", x1.cut = 0.5, x1.design = "l", x2.design = "l"))
summary(mrd_power(x1.fuzzy = c(0.1, 0.1), x1.design = "l", x2.design = "l"))
## End(Not run)
mrd_sens_bw refits the supplied model with varying bandwidths. All other aspects of the model are held constant.
mrd_sens_bw(object, approach = c("center", "univ1", "univ2"), bws)
object |
An object returned by |
approach |
A string of the approaches to be refitted,
choosing from |
bws |
A positive numeric vector of the bandwidths for refitting an |
mrd_sens_bw returns a data frame containing the estimate est and standard error se for each supplied bandwidth and for the Imbens-Kalyanaraman (2012) optimal bandwidth, bw, and for each supplied approach, model. Approaches are either user specified ("usr") or based on the optimal bandwidth ("origin").
Imbens, G., Kalyanaraman, K. (2012). Optimal bandwidth choice for the regression discontinuity estimator. The Review of Economic Studies, 79(3), 933-959. https://academic.oup.com/restud/article/79/3/933/1533189.
set.seed(12345)
x1 <- runif(10000, -1, 1)
x2 <- rnorm(10000, 10, 2)
cov <- rnorm(10000)
y <- 3 + 2 * x1 + 1 * x2 + 3 * cov + 10 * (x1 >= 0) + 5 * (x2 >= 10) + rnorm(10000)
# front.bw argument was supplied to speed up the example
# users should choose appropriate values for front.bw
mrd <- mrd_est(y ~ x1 + x2 | cov, cutpoint = c(0, 10), t.design = c("geq", "geq"), front.bw = c(1, 1, 1))
mrd_sens_bw(mrd, approach = "univ1", bws = seq(0.1, 1, length.out = 3))
mrd_sens_cutoff refits the supplied model with varying cutoff(s). All other aspects of the model, such as the automatically calculated bandwidth, are held constant.
mrd_sens_cutoff(object, cutoffs)
object |
An object returned by |
cutoffs |
A two-column numeric matrix of paired cutoff values
to be used for refitting an |
mrd_sens_cutoff returns a data frame containing the estimate est and standard error se for each pair of cutoffs (A1 and A2) and for each model. A1 contains varying cutoffs for assignment 1 and A2 contains varying cutoffs for assignment 2. The model column contains the approach (either centering, univariate 1, or univariate 2) for determining the cutoff and the parametric model (linear, quadratic, or cubic) or non-parametric bandwidth setting (Imbens-Kalyanaraman 2012 optimal, half, or double) used for estimation.
Imbens, G., Kalyanaraman, K. (2012). Optimal bandwidth choice for the regression discontinuity estimator. The Review of Economic Studies, 79(3), 933-959. https://academic.oup.com/restud/article/79/3/933/1533189.
set.seed(12345)
x1 <- runif(5000, -1, 1)
x2 <- rnorm(5000, 10, 2)
cov <- rnorm(5000)
y <- 3 + 2 * x1 + 1 * x2 + 3 * cov + 10 * (x1 >= 0) + 5 * (x2 >= 10) + rnorm(5000)
# front.bw argument was supplied to speed up the example
# users should choose appropriate values for front.bw
mrd <- mrd_est(y ~ x1 + x2 | cov, cutpoint = c(0, 10), t.design = c("geq", "geq"), front.bw = c(1, 1, 1))
mrd_sens_cutoff(mrd, expand.grid(A1 = seq(-.5, .5, length.out = 3), A2 = 10))
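The cutoffs argument pairs one candidate cutoff per assignment variable in each row. The base-R sketch below only shows how such a grid can be built and inspected before it is passed on; the names grid and cutoff_matrix are illustrative and not part of the package.

```r
# Build every combination of candidate cutoffs for the two assignment variables.
# A1 varies over three values; A2 is held fixed at 10.
grid <- expand.grid(A1 = seq(-0.5, 0.5, length.out = 3), A2 = 10)

# expand.grid() returns a data frame; as.matrix() converts it to a
# two-column numeric matrix with one cutoff pair per row.
cutoff_matrix <- as.matrix(grid)
print(cutoff_matrix)
```

Each row is one refit: the first column moves the cutoff on assignment 1 while the second column keeps assignment 2 at its original value.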
plot.mfrd plots a 3D illustration of the bivariate frontier regression discontinuity design (RDD).
## S3 method for class 'mfrd'
plot( x, model = c("m_s", "m_h", "m_t"), methodname = c("Param", "bw", "Half-bw", "Double-bw"), gran = 10, raw_data = TRUE, color_surface = FALSE, ... )
x |
An |
model |
A string containing the model specification. Options include one of |
methodname |
A string containing the method specification.
Options include one of |
gran |
A non-negative integer specifying the granularity of the surface grid (i.e. the desired number of predicted points before and after the cutoff, along each assignment variable). The default is 10. |
raw_data |
A logical value indicating whether the raw data points are plotted. The default is |
color_surface |
A logical value indicating whether the treated surface is colored. The default is |
... |
Additional graphic arguments passed to |
set.seed(12345)
x1 <- runif(1000, -1, 1)
x2 <- runif(1000, -1, 1)
cov <- rnorm(1000)
y <- 3 + 2 * (x1 >= 0) + 3 * cov + 10 * (x2 >= 0) + rnorm(1000)
model <- mrd_est(y ~ x1 + x2, cutpoint = c(0, 0), t.design = c("geq", "geq"))
plot(model$front$tau_MRD, "m_s", "Param")
plot.rd plots the relationship between the running variable and the outcome. It is based on the plot.RD function in the "rdd" package.
## S3 method for class 'rd'
plot( x, preds = NULL, fit_line = c("linear", "quadratic", "cubic", "optimal", "half", "double"), fit_ci = c("area", "dot", "hide"), fit_ci_level = 0.95, bin_n = 20, bin_level = 0.95, bin_size = c("shade", "size"), quant_bin = TRUE, xlim = NULL, ylim = NULL, include_rugs = FALSE, ... )
x |
An |
preds |
An optional vector of predictions generated by |
fit_line |
A character vector specifying models to be shown as fitted lines.
Options are |
fit_ci |
A string specifying whether and how to plot prediction confidence intervals
around the fitted lines. Options are |
fit_ci_level |
A numeric value between 0 and 1 specifying the confidence level of prediction CIs. The default is 0.95. |
bin_n |
An integer specifying the number of bins for binned data points. If |
bin_level |
A numeric value between 0 and 1 specifying the confidence level for CIs around binned data points. The default is 0.95. |
bin_size |
A string specifying how to plot the number of observations in each bin, by |
quant_bin |
A logical value indicating whether the data are binned by quantiles. The default is |
xlim |
An optional numeric vector containing the x-axis limits. |
ylim |
An optional numeric vector containing the y-axis limits. |
include_rugs |
A logical value indicating whether to include the 1d plot for both axes. The default is |
... |
Additional graphic arguments passed to |
Drew Dimmery (2016). rdd: Regression Discontinuity Estimation. R package version 0.57. https://CRAN.R-project.org/package=rdd
set.seed(12345)
dat <- data.frame(x = runif(1000, -1, 1), cov = rnorm(1000))
dat$tr <- as.integer(dat$x >= 0)
dat$y <- 3 + 2 * dat$x + 3 * dat$cov + 10 * (dat$x >= 0) + rnorm(1000)
rd <- rd_est(y ~ x + tr | cov, data = dat, cutpoint = 0, t.design = "geq")
plot(rd)
predict.rd makes predictions of means and standard deviations of RDs at different cutoffs.
## S3 method for class 'rd'
predict(object, gran = 50, ...)
object |
An |
gran |
A non-negative integer specifying the granularity of the data points (i.e. the desired number of predicted points). The default is 50. |
... |
Additional arguments passed to |
set.seed(12345)
x <- runif(1000, -1, 1)
cov <- rnorm(1000)
y <- 3 + 2 * x + 3 * cov + 10 * (x >= 0) + rnorm(1000)
tr <- as.integer(x >= 0)
rd <- rd_est(y ~ x + tr | cov, cutpoint = 0, t.design = "geq")
predict(rd)
print.mfrd prints a very basic summary of the multivariate frontier regression discontinuity. It is based on the print.RD function in the "rdd" package.
## S3 method for class 'mfrd'
print(x, digits = max(3, getOption("digits") - 3), ...)
x |
An |
digits |
A non-negative integer specifying the number of digits to print.
The default is |
... |
Additional arguments passed to |
Drew Dimmery (2016). rdd: Regression Discontinuity Estimation. R package version 0.57. https://CRAN.R-project.org/package=rdd
print.rd prints a basic summary of the regression discontinuity. print.rd is based on the print.RD function in the "rdd" package.
## S3 method for class 'rd'
print(x, digits = max(3, getOption("digits") - 3), ...)
x |
An |
digits |
A non-negative integer specifying the number of digits to print.
The default is |
... |
Additional arguments passed to |
Drew Dimmery (2016). rdd: Regression Discontinuity Estimation. R package version 0.57. https://CRAN.R-project.org/package=rdd
rd_est estimates both sharp and fuzzy RDDs using parametric and non-parametric (local linear) models. It is based on the RDestimate function in the "rdd" package. Sharp RDDs (both parametric and non-parametric) are estimated using lm in the stats package. Fuzzy RDDs (both parametric and non-parametric) are estimated by two-stage least squares using ivreg in the AER package. For non-parametric models, Imbens-Kalyanaraman optimal bandwidths can be used.
rd_est( formula, data, subset = NULL, cutpoint = NULL, bw = NULL, kernel = "triangular", se.type = "HC1", cluster = NULL, verbose = FALSE, less = FALSE, est.cov = FALSE, est.itt = FALSE, t.design = NULL )
formula |
The formula of the RDD; a symbolic description of the model to be fitted. This is supplied in the
format of |
data |
An optional data frame containing the variables in the model. If not found in |
subset |
An optional vector specifying a subset of observations to be used in the fitting process. |
cutpoint |
A numeric value containing the cutpoint at which assignment to the treatment is determined. The default is 0. |
bw |
A vector specifying the bandwidths at which to estimate the RD.
Possible values are |
kernel |
A string indicating which kernel to use. Options are |
se.type |
This specifies the robust standard error calculation method to use,
from the "sandwich" package. Options are,
as in |
cluster |
An optional vector of length n specifying clusters within which the errors are assumed
to be correlated. This will result in reporting cluster robust SEs. This option overrides
anything specified in |
verbose |
A logical value indicating whether to print additional information to
the terminal. The default is |
less |
Logical. If |
est.cov |
Logical. If |
est.itt |
Logical. If |
t.design |
A string specifying the treatment option according to design.
Options are |
rd_est returns an object of class "rd". The functions summary and plot are used to obtain and print a summary and plot of the estimated regression discontinuity. The object of class rd is a list containing the following components:
type |
A string denoting either |
est |
Numeric vector of the estimate of the discontinuity in the outcome under a sharp RDD or the Wald estimator in the fuzzy RDD, for each corresponding bandwidth. |
se |
Numeric vector of the standard error for each corresponding bandwidth. |
z |
Numeric vector of the z statistic for each corresponding bandwidth. |
p |
Numeric vector of the p-value for each corresponding bandwidth. |
ci |
The matrix of the 95% confidence interval bounds for each corresponding bandwidth. |
d |
Numeric vector of the effect size (Cohen's d) for each estimate. |
cov |
The names of covariates. |
bw |
Numeric vector of each bandwidth used in estimation. |
obs |
Vector of the number of observations within the corresponding bandwidth. |
call |
The matched call. |
na.action |
The number of observations removed from fitting due to missingness. |
impute |
A logical value indicating whether multiple imputation is used or not. |
model |
For a sharp design, a list of the |
frame |
Returns the dataframe used in fitting the model. |
Lee, D. S., Lemieux, T. (2010). Regression Discontinuity Designs in Economics. Journal of Economic Literature, 48(2), 281-355. doi:10.1257/jel.48.2.281.
Imbens, G., Lemieux, T. (2008). Regression discontinuity designs: A guide to practice. Journal of Econometrics, 142(2), 615-635. doi:10.1016/j.jeconom.2007.05.001.
Lee, D. S., Card, D. (2010). Regression discontinuity inference with specification error. Journal of Econometrics, 142(2), 655-674. doi:10.1016/j.jeconom.2007.05.003.
Angrist, J. D., Pischke, J.-S. (2009). Mostly harmless econometrics: An empiricist's companion. Princeton, NJ: Princeton University Press.
Drew Dimmery (2016). rdd: Regression Discontinuity Estimation. R package version 0.57. https://CRAN.R-project.org/package=rdd
Imbens, G., Kalyanaraman, K. (2009). Optimal bandwidth choice for the regression discontinuity estimator (Working Paper No. 14726). National Bureau of Economic Research. https://www.nber.org/papers/w14726.
Imbens, G., Kalyanaraman, K. (2012). Optimal bandwidth choice for the regression discontinuity estimator. The Review of Economic Studies, 79(3), 933-959. https://academic.oup.com/restud/article/79/3/933/1533189.
set.seed(12345)
x <- runif(1000, -1, 1)
cov <- rnorm(1000)
y <- 3 + 2 * x + 3 * cov + 10 * (x >= 0) + rnorm(1000)
rd_est(y ~ x, t.design = "geq")
# Efficiency gains can be made by including covariates (review SEs in "summary" output).
rd_est(y ~ x | cov, t.design = "geq")
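To make the non-parametric sharp estimator concrete, the following base-R sketch fits a local linear regression with triangular kernel weights via lm. It is an illustration under stated assumptions, not the package's implementation: the bandwidth h is a hypothetical fixed value rather than the Imbens-Kalyanaraman optimum, and the standard error is the ordinary lm one rather than a robust (e.g. HC1) estimate.

```r
# Simulate a sharp design with a true jump of 10 at the cutpoint.
set.seed(12345)
n <- 1000
x <- runif(n, -1, 1)
y <- 3 + 2 * x + 10 * (x >= 0) + rnorm(n)

cutpoint <- 0
h <- 0.5  # hypothetical fixed bandwidth, NOT the Imbens-Kalyanaraman choice

# Center the assignment variable, derive treatment, and compute
# triangular kernel weights that fall to zero at distance h.
dat <- data.frame(y = y, xc = x - cutpoint)
dat$tr <- as.integer(dat$xc >= 0)
dat$w <- pmax(0, 1 - abs(dat$xc) / h)

# Separate slopes on each side of the cutpoint; the coefficient on tr
# is the estimated discontinuity at the cutpoint.
fit <- lm(y ~ tr * xc, data = dat, weights = w, subset = w > 0)
est <- unname(coef(fit)["tr"])
se <- summary(fit)$coefficients["tr", "Std. Error"]
print(c(estimate = est, std.error = se))
```

Observations farther than h from the cutpoint receive zero weight and are dropped, which is what makes the fit local.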
rd_impute estimates treatment effects in an RDD with imputed missing values.
rd_impute( formula, data, subset = NULL, cutpoint = NULL, bw = NULL, kernel = "triangular", se.type = "HC1", cluster = NULL, impute = NULL, verbose = FALSE, less = FALSE, est.cov = FALSE, est.itt = FALSE, t.design = NULL )
formula |
The formula of the RDD; a symbolic description of the model to be fitted. This is supplied in the
format of |
data |
An optional data frame containing the variables in the model. If not found in |
subset |
An optional vector specifying a subset of observations to be used in the fitting process. |
cutpoint |
A numeric value containing the cutpoint at which assignment to the treatment is determined. The default is 0. |
bw |
A vector specifying the bandwidths at which to estimate the RD.
Possible values are |
kernel |
A string indicating which kernel to use. Options are |
se.type |
This specifies the robust standard error calculation method to use,
from the "sandwich" package. Options are,
as in |
cluster |
An optional vector specifying clusters within which the errors are assumed
to be correlated. This will result in reporting cluster robust SEs. This option overrides
anything specified in |
impute |
An optional vector of length n, indexing whole imputations. |
verbose |
A logical value indicating whether to print additional information to
the terminal. The default is |
less |
Logical. If |
est.cov |
Logical. If |
est.itt |
Logical. If |
t.design |
A string specifying the treatment option according to design.
Options are |
rd_impute returns an object of class "rd". The functions summary and plot are used to obtain and print a summary and plot of the estimated regression discontinuity. The object of class rd is a list containing the following components:
call |
The matched call. |
impute |
A logical value indicating whether multiple imputation is used or not. |
type |
A string denoting either |
cov |
The names of covariates. |
bw |
Numeric vector of each bandwidth used in estimation. |
obs |
Vector of the number of observations within the corresponding bandwidth. |
model |
For a sharp design, a list of the |
frame |
Returns the model frame used in fitting. |
na.action |
The observations removed from fitting due to missingness. |
est |
Numeric vector of the estimate of the discontinuity in the outcome under a sharp RDD or the Wald estimator in the fuzzy RDD, for each corresponding bandwidth. |
d |
Numeric vector of the effect size (Cohen's d) for each estimate. |
se |
Numeric vector of the standard error for each corresponding bandwidth. |
z |
Numeric vector of the z statistic for each corresponding bandwidth. |
df |
Numeric vector of the degrees of freedom computed using Barnard and Rubin (1999) adjustment for imputation. |
p |
Numeric vector of the p-value for each corresponding bandwidth. |
ci |
The matrix of the 95% confidence interval bounds for each corresponding bandwidth. |
Lee, D. S., Card, D. (2010). Regression discontinuity inference with specification error. Journal of Econometrics, 142(2), 655-674. doi:10.1016/j.jeconom.2007.05.003.
Imbens, G., Kalyanaraman, K. (2009). Optimal bandwidth choice for the regression discontinuity estimator (Working Paper No. 14726). National Bureau of Economic Research. https://www.nber.org/papers/w14726.
Imbens, G., Kalyanaraman, K. (2012). Optimal bandwidth choice for the regression discontinuity estimator. The Review of Economic Studies, 79(3), 933-959. https://academic.oup.com/restud/article/79/3/933/1533189.
Barnard, J., Rubin, D. (1999). Small-Sample Degrees of Freedom with Multiple Imputation. Biometrika, 86(4), 948-55.
set.seed(12345)
x <- runif(1000, -1, 1)
cov <- rnorm(1000)
y <- 3 + 2 * x + 3 * cov + 10 * (x < 0) + rnorm(1000)
group <- rep(1:10, each = 100)
rd_impute(y ~ x, impute = group, t.design = "l")
# Efficiency gains can be made by including covariates (review SEs in "summary" output).
rd_impute(y ~ x | cov, impute = group, t.design = "l")
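Combining estimates across imputations is commonly done with Rubin's rules, with the Barnard and Rubin (1999) small-sample adjustment for the degrees of freedom. The sketch below shows that standard calculation in base R. The function pool_rubin and its argument names are hypothetical, the input numbers are made up for illustration, and the sketch is not claimed to match the package's internal code.

```r
# Pool m per-imputation estimates and standard errors with Rubin's rules.
# df_com is the (finite) complete-data degrees of freedom.
pool_rubin <- function(est, se, df_com) {
  m <- length(est)
  q_bar <- mean(est)                   # pooled point estimate
  w_bar <- mean(se^2)                  # within-imputation variance
  b <- var(est)                        # between-imputation variance
  t_var <- w_bar + (1 + 1 / m) * b     # total variance
  lambda <- (1 + 1 / m) * b / t_var    # share of variance due to missingness
  df_old <- (m - 1) / lambda^2         # classic Rubin degrees of freedom
  df_obs <- (df_com + 1) / (df_com + 3) * df_com * (1 - lambda)
  df <- 1 / (1 / df_old + 1 / df_obs)  # Barnard-Rubin adjusted df
  list(est = q_bar, se = sqrt(t_var), df = df)
}

# Illustrative (made-up) estimates from three imputed data sets.
pooled <- pool_rubin(est = c(9.8, 10.1, 10.4), se = c(0.30, 0.32, 0.31), df_com = 996)
print(pooled)
```

The pooled standard error exceeds the average per-imputation standard error whenever the estimates differ across imputations, which reflects the extra uncertainty from the missing data.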
rd_power computes the empirical probability that a resulting parameter estimate of the RD is significant, i.e. the empirical power (1 - beta).
rd_power( num.rep = 100, sample.size = 100, x.dist = "normal", x.para = c(0, 1), x.cut = 0, x.fuzzy = c(0, 0), x.design = NULL, coeff = c(0.3, 1, 0.2, 0.3), eta.sq = 0.5, alpha.list = c(0.001, 0.01, 0.05) )
num.rep |
A non-negative integer specifying the number of repetitions used to calculate the empirical power. The default is 100. |
sample.size |
A non-negative integer specifying the number of observations in each sample. The default is 100. |
x.dist |
A string specifying the distribution of the assignment variable, |
x.para |
A numeric vector of length 2 specifying parameters of the distribution of the assignment variable, |
x.cut |
A numeric value containing the cutpoint at which assignment to the treatment is determined. The default is 0. |
x.fuzzy |
A numeric vector of length 2 specifying the probabilities to be assigned to the control, in terms of the
assignment variable, |
x.design |
A string specifying the treatment option according to design.
Options are |
coeff |
A numeric vector specifying coefficients of variables in the linear model to generate data. Coefficients are in the following order:
The default is |
eta.sq |
A numeric value specifying the expected partial eta-squared of the linear model with respect to the treatment itself. It is used to control the variance of noise in the linear model. The default is 0.50. |
alpha.list |
A numeric vector containing significance levels (between 0 and 1) used to calculate the empirical alpha.
The default is |
rd_power returns an object of class "rdp" containing the mean, variance, and power (with alpha of 0.001, 0.01, and 0.05) for two estimators. The function summary is used to obtain and print a summary of the power analysis. The two estimators are:
The 1st estimator, Linear, provides results of the linear regression estimator.
The 2nd estimator, Opt, provides results of the local linear regression estimator of RD, with the optimal bandwidth in the Imbens and Kalyanaraman (2012) paper.
Imbens, G., Kalyanaraman, K. (2012). Optimal bandwidth choice for the regression discontinuity estimator. The Review of Economic Studies, 79(3), 933-959. https://academic.oup.com/restud/article/79/3/933/1533189.
## Not run:
summary(rd_power(x.design = "l"))
summary(rd_power(x.dist = "uniform", x.cut = 0.5, x.design = "l"))
summary(rd_power(x.fuzzy = c(0.1, 0.1), x.design = "l"))
## End(Not run)
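Empirical power from simulation is simply the share of replications in which the treatment estimate is significant at a given alpha. The base-R sketch below illustrates that idea for a sharp design with a global linear model. The function sim_power, its arguments, and the data-generating model (unit noise variance, fixed coefficients) are assumptions made for illustration and do not reproduce how rd_power derives the noise variance from eta.sq.

```r
set.seed(12345)

# Share of replications in which the treatment coefficient is significant.
sim_power <- function(num_rep = 200, n = 100, effect = 1, alpha = 0.05) {
  p_values <- replicate(num_rep, {
    x <- rnorm(n)                  # assignment variable
    tr <- as.integer(x < 0)        # treated when below the cutpoint of 0
    y <- 0.3 + effect * tr + 0.2 * x + rnorm(n)
    fit <- lm(y ~ tr + x)          # global (parametric) linear estimator
    summary(fit)$coefficients["tr", "Pr(>|t|)"]
  })
  mean(p_values < alpha)
}

power_null <- sim_power(effect = 0)  # should sit near alpha
power_big <- sim_power(effect = 3)   # should be close to 1
print(c(null = power_null, large_effect = power_big))
```

With no true effect the rejection rate approximates the nominal alpha, which is a useful sanity check on any simulation-based power routine.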
rd_sens_bw refits the supplied model with varying bandwidths. All other aspects of the model are held constant.
rd_sens_bw(object, bws)
object |
An object returned by |
bws |
A positive numeric vector of the bandwidths for refitting an |
rd_sens_bw returns a data frame containing the estimate est and standard error se for each supplied bandwidth and for the Imbens-Kalyanaraman (2012) optimal bandwidth, bw, and for each supplied approach, model. Approaches are either user specified ("usr") or based on the optimal bandwidth ("origin").
Imbens, G., Kalyanaraman, K. (2012). Optimal bandwidth choice for the regression discontinuity estimator. The Review of Economic Studies, 79(3), 933-959. https://academic.oup.com/restud/article/79/3/933/1533189.
set.seed(12345)
x <- runif(1000, -1, 1)
cov <- rnorm(1000)
y <- 3 + 2 * x + 3 * cov + 10 * (x >= 0) + rnorm(1000)
rd <- rd_est(y ~ x | cov, t.design = "geq")
rd_sens_bw(rd, bws = seq(.1, 1, length.out = 5))
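The logic of a bandwidth sensitivity check can be sketched without the package: refit the same local linear model at several bandwidths and collect the estimate and standard error for each. The helper fit_at_bw below is hypothetical, uses triangular kernel weights with ordinary lm standard errors, and only mirrors the shape of the returned table (columns bw, est, se).

```r
set.seed(12345)
x <- runif(1000, -1, 1)
y <- 3 + 2 * x + 10 * (x >= 0) + rnorm(1000)

# Refit a triangular-kernel local linear model at bandwidth h.
fit_at_bw <- function(h, cutpoint = 0) {
  dat <- data.frame(y = y, xc = x - cutpoint)
  dat$tr <- as.integer(dat$xc >= 0)
  dat$w <- pmax(0, 1 - abs(dat$xc) / h)
  fit <- lm(y ~ tr * xc, data = dat, weights = w, subset = w > 0)
  coefs <- summary(fit)$coefficients
  data.frame(bw = h, est = coefs["tr", "Estimate"], se = coefs["tr", "Std. Error"])
}

# One row per bandwidth; everything else about the model stays fixed.
sens <- do.call(rbind, lapply(seq(0.1, 1, length.out = 5), fit_at_bw))
print(sens)
```

An estimate that stays stable while the standard error shrinks as the bandwidth grows suggests the result does not hinge on one bandwidth choice.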
rd_sens_cutoff refits the supplied model with varying cutoff(s). All other aspects of the model, such as the automatically calculated bandwidth, are held constant.
rd_sens_cutoff(object, cutoffs)
object |
An object returned by |
cutoffs |
A numeric vector of cutoff values to be used for refitting
an |
rd_sens_cutoff returns a data frame containing the estimate est and standard error se for each cutoff value (A1). Column A1 contains varying cutoffs on the assignment variable. The model column contains the parametric model (linear, quadratic, or cubic) or non-parametric bandwidth setting (Imbens-Kalyanaraman 2012 optimal, half, or double) used for estimation.
Imbens, G., Kalyanaraman, K. (2012). Optimal bandwidth choice for the regression discontinuity estimator. The Review of Economic Studies, 79(3), 933-959. https://academic.oup.com/restud/article/79/3/933/1533189.
set.seed(12345)
x <- runif(1000, -1, 1)
cov <- rnorm(1000)
y <- 3 + 2 * x + 3 * cov + 10 * (x >= 0) + rnorm(1000)
rd <- rd_est(y ~ x | cov, t.design = "geq")
rd_sens_cutoff(rd, seq(-.5, .5, length.out = 10))
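A cutoff sensitivity (placebo) check asks whether a discontinuity also appears at cutoffs where none should exist. The base-R sketch below re-estimates the jump at three candidate cutoffs with a fixed bandwidth. The helper jump_at and the bandwidth h are hypothetical, and redefining the above-cutoff indicator at each candidate value is an assumption of this sketch rather than a description of the package internals.

```r
set.seed(12345)
x <- runif(1000, -1, 1)
y <- 3 + 2 * x + 10 * (x >= 0) + rnorm(1000)  # true jump only at 0
h <- 0.25                                     # hypothetical fixed bandwidth

# Estimate the discontinuity in y at a candidate cutoff.
jump_at <- function(cutoff) {
  dat <- data.frame(y = y, xc = x - cutoff)
  dat$above <- as.integer(dat$xc >= 0)
  dat$w <- pmax(0, 1 - abs(dat$xc) / h)
  fit <- lm(y ~ above * xc, data = dat, weights = w, subset = w > 0)
  coefs <- summary(fit)$coefficients
  data.frame(A1 = cutoff, est = coefs["above", "Estimate"],
             se = coefs["above", "Std. Error"])
}

placebo <- do.call(rbind, lapply(c(-0.5, 0, 0.5), jump_at))
print(placebo)
```

Here the windows around -0.5 and 0.5 do not contain the true cutoff, so their estimates should be near zero while the estimate at 0 recovers the jump.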
rd_type cross-tabulates observations based on (1) a binary treatment and (2) one or two assignments and their cutoff values. This is an internal function and is typically not directly invoked by the user. It can be accessed using the triple colon, as in rddapp:::rd_type().
rd_type( data, treat, assign_1, cutoff_1, operator_1 = NULL, assign_2 = NULL, cutoff_2 = NULL, operator_2 = NULL )
data |
A |
treat |
A string specifying the name of the numeric treatment variable (treated = positive values). |
assign_1 |
A string specifying the variable name of the primary assignment. |
cutoff_1 |
A numeric value containing the cutpoint at which assignment to the treatment is determined, for the primary assignment. |
operator_1 |
The operator specifying the treatment option according to design for the primary assignment.
Options are
|
assign_2 |
An optional string specifying the variable name of the secondary assignment. |
cutoff_2 |
An optional numeric value containing the cutpoint at which assignment to the treatment is determined, for the secondary assignment. |
operator_2 |
The operator specifying the treatment option according to design for the secondary assignment.
Options are
|
rd_type returns a list of two elements:
crosstab |
The cross-table as a data.frame. Columns in the dataframe include treatment rules, number of observations in the control condition, number of observations in the treatment condition, and the probability of an observation being in treatment or control. |
type |
A string specifying the type of design used, either |
set.seed(12345)
x <- runif(1000, -1, 1)
cov <- rnorm(1000)
y <- 3 + 2 * x + 3 * cov + 10 * (x >= 0) + rnorm(1000)
df <- data.frame(cbind(y, x, t = x >= 0))
rddapp:::rd_type(df, 't', 'x', 0, 'geq')
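The idea behind the cross-tabulation can be sketched in base R: compare the treatment implied by the assignment rule with the treatment actually received. If no observation deviates from the rule the design is sharp, otherwise it is fuzzy. The function classify_design and the lowercase labels it returns are illustrative choices of this sketch, not the package's return values.

```r
set.seed(12345)
x <- runif(1000, -1, 1)
treat_sharp <- as.integer(x >= 0)

# Flip roughly 10% of treatment statuses to mimic non-compliance.
flip <- runif(1000) < 0.1
treat_fuzzy <- ifelse(flip, 1L - treat_sharp, treat_sharp)

# Cross-tabulate the rule (assign >= cutoff) against received treatment.
classify_design <- function(assign, treat, cutoff = 0) {
  rule <- factor(assign >= cutoff, levels = c(FALSE, TRUE))
  treated <- factor(treat > 0, levels = c(FALSE, TRUE))
  tab <- table(rule = rule, treated = treated)
  off_diag <- tab["TRUE", "FALSE"] + tab["FALSE", "TRUE"]
  list(crosstab = tab, type = if (off_diag == 0) "sharp" else "fuzzy")
}

print(classify_design(x, treat_sharp)$type)
print(classify_design(x, treat_fuzzy)$crosstab)
```

The off-diagonal cells of the table count the observations whose treatment status contradicts the assignment rule.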
sens_plot plots the sensitivity analysis for cutpoints or bandwidths.
sens_plot( sim_results, level = 0.95, x = c("A1", "A2", "bw"), plot_models = unique(sim_results$model), yrange = NULL )
sim_results |
A |
level |
A numeric value between 0 and 1 specifying the confidence level for CIs (assuming a normal sampling distribution). The default is 0.95. |
x |
A string of the column name of the varying parameter in |
plot_models |
A character vector specifying the models to be plotted (i.e. models estimated with
different approaches). Possible values are |
yrange |
An optional numeric vector specifying the range of the y-axis. |
set.seed(12345)
x <- runif(1000, -1, 1)
cov <- rnorm(1000)
y <- 3 + 2 * x + 3 * cov + 10 * (x >= 0) + rnorm(1000)
m <- rd_est(y ~ x | cov, t.design = "geq")
sim_cutoff <- rd_sens_cutoff(m, seq(-.5, .5, length.out = 10))
sens_plot(sim_cutoff, x = "A1", plot_models = c("linear", "optimal"))
sim_bw <- rd_sens_bw(m, seq(.1, 1, length.out = 10))
sens_plot(sim_bw, x = "bw")
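Confidence intervals under a normal sampling distribution follow directly from the est and se columns and the chosen level. The sketch below computes them for a small, made-up results table. The values and the column names lwr and upr are hypothetical and serve only to show the arithmetic.

```r
# Hypothetical sensitivity results: one row per candidate cutoff.
sim_results <- data.frame(
  A1 = c(-0.5, 0, 0.5),
  est = c(0.2, 10.1, -0.1),
  se = c(0.4, 0.3, 0.4)
)

level <- 0.95
z <- qnorm(1 - (1 - level) / 2)  # two-sided normal critical value

# Symmetric interval around each estimate.
sim_results$lwr <- sim_results$est - z * sim_results$se
sim_results$upr <- sim_results$est + z * sim_results$se
print(sim_results)
```

Raising level widens every interval by the same factor, since only the critical value z changes.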
shiny_run launches the R Shiny application for "rddapp".
shiny_run(app_name = "shinyrdd")
app_name |
A string specifying the name of the R Shiny app. The default is |
## Not run:
shiny_run()
shiny_run("shinyrdd")
## End(Not run)
summary.mfrd is a summary method for class "mfrd". It is based on the summary.RD function in the "rdd" package.
## S3 method for class 'mfrd'
summary(object, level = 0.95, digits = max(3, getOption("digits") - 3), ...)
object |
An object of class |
level |
A numeric value between 0 and 1 specifying the confidence level for confidence intervals. The default is 0.95. |
digits |
A non-negative integer specifying the number of digits to display.
The default is |
... |
Additional arguments passed to |
summary.mfrd returns a list containing the following components:
coefficients |
A matrix containing estimates and confidence intervals (if applicable) for the complete model. |
ht_coefficients |
A matrix containing estimates and confidence intervals (if applicable) for the heterogeneous treatment model. |
t_coefficients |
A matrix containing estimates and confidence intervals (if applicable) for the treatment only model. |
Drew Dimmery (2016). rdd: Regression Discontinuity Estimation. R package version 0.57. https://CRAN.R-project.org/package=rdd
summary.mrd is a summary method for class "mrd". It is based on the summary.RD function in the "rdd" package.
## S3 method for class 'mrd'
summary(object, level = 0.95, digits = max(3, getOption("digits") - 3), ...)
object |
An object of class |
level |
A numeric value between 0 and 1 specifying the confidence level for confidence intervals. The default is 0.95. |
digits |
A non-negative integer specifying the number of digits to display.
The default is |
... |
Additional arguments passed to |
summary.mrd returns a list with the following components, depending on the methods implemented in the "mrd" object:
center_coefficients |
A matrix containing bandwidths, number of observations, estimates, SEs, confidence intervals, z-values and p-values for each estimated bandwidth and/or parametric model. |
univR_coefficients |
A matrix containing bandwidths, number of observations, estimates, SEs, confidence intervals, z-values and p-values for each estimated bandwidth and/or parametric model. |
univM_coefficients |
A matrix containing bandwidths, number of observations, estimates, SEs, confidence intervals, z-values and p-values for each estimated bandwidth and/or parametric model. |
front_coefficients |
A matrix containing estimates and confidence intervals (if applicable) for the complete model. |
front_ht_coefficients |
A matrix containing estimates and confidence intervals (if applicable) for the heterogeneous treatment model. |
front_t_coefficients |
A matrix containing estimates and confidence intervals (if applicable) for the treatment only model. |
Drew Dimmery (2016). rdd: Regression Discontinuity Estimation. R package version 0.57. https://CRAN.R-project.org/package=rdd
summary.mrdi is a summary method for class "mrdi". It is based on the summary.RD function in the "rdd" package.
## S3 method for class 'mrdi'
summary(object, level = 0.95, digits = max(3, getOption("digits") - 3), ...)
object |
An object of class |
level |
A numeric value between 0 and 1 specifying the confidence level for confidence intervals. The default is 0.95. |
digits |
A non-negative integer specifying the number of digits to display.
The default is |
... |
Additional arguments passed to |
summary.mrdi returns a list with the following components:
coefficients |
A matrix containing estimates and confidence intervals (if applicable) for the complete model. |
ht_coefficients |
A matrix containing estimates and confidence intervals (if applicable) for the heterogeneous treatment model. |
t_coefficients |
A matrix containing estimates and confidence intervals (if applicable) for the treatment only model. |
Drew Dimmery (2016). rdd: Regression Discontinuity Estimation. R package version 0.57. https://CRAN.R-project.org/package=rdd
summary.mrdp is a summary method for class "mrdp". It is based on the summary.RD function in the "rdd" package.
## S3 method for class 'mrdp'
summary(object, digits = max(3, getOption("digits") - 3), ...)
object |
An object of class |
digits |
A non-negative integer specifying the number of digits to display.
The default is |
... |
Additional arguments passed to |
summary.mrdp returns a list with the following components:
coefficients |
A matrix containing the mean, variance, and empirical alpha of each estimator. |
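To make the three summary quantities concrete, the sketch below runs a small simulation in base R and reports the mean and variance of the estimates together with the share of replications whose p-value falls below a nominal 0.05. Reading "empirical alpha" as this rejection share is an assumption; the data-generating model, the plain lm() estimator, and all object names are illustrative and are not rddapp's implementation.

```r
set.seed(1)
n_sim <- 200          # number of simulated data sets (illustrative)
est <- numeric(n_sim) # treatment effect estimate per replication
pval <- numeric(n_sim)

for (i in seq_len(n_sim)) {
  # Sharp design with a cutoff at 0 and a true effect of 1
  x <- runif(100, -1, 1)
  tr <- as.numeric(x >= 0)
  y <- 1 + 0.5 * x + 1 * tr + rnorm(100)
  # Simple global linear model, used here as a stand-in estimator
  coefs <- summary(lm(y ~ x + tr))$coefficients
  est[i] <- coefs["tr", "Estimate"]
  pval[i] <- coefs["tr", "Pr(>|t|)"]
}

# One row per estimator: mean, variance, and rejection share at 0.05
sim_summary <- rbind(linear = c(mean = mean(est),
                                var = var(est),
                                reject_0.05 = mean(pval < 0.05)))
print(round(sim_summary, 3))
```

With a nonzero true effect, the rejection share estimates power; with a true effect of zero, it estimates the type I error rate.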
Drew Dimmery (2016). rdd: Regression Discontinuity Estimation. R package version 0.57. https://CRAN.R-project.org/package=rdd
summary.rd is a summary method for class "rd". It is based on the summary.RD function in the "rdd" package.
## S3 method for class 'rd'
summary(object, level = 0.95, digits = max(3, getOption("digits") - 3), ...)
object |
An object of class |
level |
A numeric value between 0 and 1 specifying the confidence level for confidence intervals. The default is 0.95. |
digits |
A non-negative integer specifying the number of digits to display.
The default is |
... |
Additional arguments passed to |
summary.rd returns a list with the following components:
coefficients |
A matrix containing bandwidths, number of observations, estimates, SEs, confidence intervals, z-values and p-values for each estimated bandwidth. |
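The z-values and p-values in such a table follow from each estimate and its standard error. A minimal base R sketch, assuming two-sided tests against zero under a normal reference distribution (the numbers and row labels below are made up for illustration and do not come from rddapp):

```r
# Made-up estimates and standard errors for two bandwidths
est <- c(bw_a = 10.2, bw_b = 9.7)
se <- c(bw_a = 0.8, bw_b = 1.1)

# z-value: estimate divided by its standard error
z <- est / se
# Two-sided p-value under a standard normal reference distribution
p <- 2 * pnorm(-abs(z))

coef_table <- cbind(Estimate = est, SE = se, z = z, p = p)
print(coef_table)
```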
Drew Dimmery (2016). rdd: Regression Discontinuity Estimation. R package version 0.57. https://CRAN.R-project.org/package=rdd
summary.rdp is a summary method for class "rdp". It is based on the summary.RD function in the "rdd" package.
## S3 method for class 'rdp'
summary(object, digits = max(3, getOption("digits") - 3), ...)
object |
An object of class |
digits |
A non-negative integer specifying the number of digits to display.
The default is |
... |
Additional arguments passed to |
summary.rdp returns a list with the following components:
coefficients |
A matrix containing the mean, variance, and empirical alpha of each estimator. |
Drew Dimmery (2016). rdd: Regression Discontinuity Estimation. R package version 0.57. https://CRAN.R-project.org/package=rdd
treat_assign computes the treatment variable, t, based on the cutoff of the assignment variable, x.
This is an internal function and is typically not directly invoked by the user.
It can be accessed using the triple colon, as in rddapp:::treat_assign().
treat_assign(x, cut = 0, t.design = "l")
x |
A numeric vector containing the assignment variable, |
cut |
A numeric value containing the cutpoint at which assignment to the treatment is determined. The default is 0. |
t.design |
A string specifying the treatment option according to design.
Options are |
treat_assign returns the treatment variable as a vector according to the design, where 1 means the treated group and 0 means the control group.
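The logic of cutoff-based assignment can be sketched in base R as follows. assign_treatment is a hypothetical stand-in, not the internal treat_assign itself. The labels "l" and "geq" appear elsewhere in this manual; "g" and "leq" are assumed here by analogy.

```r
# Hypothetical re-implementation for illustration only.
# design: "g" (x > cut), "geq" (x >= cut), "l" (x < cut), "leq" (x <= cut);
# the labels "g" and "leq" are assumptions.
assign_treatment <- function(x, cut = 0, design = "l") {
  treated <- switch(design,
    g   = x > cut,
    geq = x >= cut,
    l   = x < cut,
    leq = x <= cut,
    stop("unknown design: ", design)
  )
  as.integer(treated) # 1 = treated group, 0 = control group
}

x <- c(-1, 0, 0.5)
assign_treatment(x, cut = 0, design = "l")   # 1 0 0
assign_treatment(x, cut = 0, design = "geq") # 0 1 1
```

The strict and non-strict variants differ only for observations that lie exactly on the cutoff.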
var_center computes the univariate assignment variable, x, based on the cutoffs of two assignment variables: x1 and x2.
This is an internal function and is typically not directly invoked by the user.
It can be accessed using the triple colon, as in rddapp:::var_center().
var_center(x, cut = c(0, 0), t.design = NULL, t.plot = FALSE)
x |
Data frame or matrix of two assignment variables,
where the first column is |
cut |
A numeric vector of length 2 containing the cutpoints at which assignment to the treatment is determined.
The default is |
t.design |
A character vector of length 2 specifying the treatment option according to design.
The first entry is for |
t.plot |
A logical value indicating whether to calculate the univariate treatment variable, |
var_center returns the univariate assignment variable as a vector according to the design.
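One way to picture the centering idea is the sketch below. It assumes a design in which a unit is treated when either assignment variable is at or above its cutoff; in that case the larger of the two centered scores decides treatment. center_two is a hypothetical helper, and the rule rddapp applies for other designs is not reproduced here.

```r
# Hypothetical helper for illustration only.
# Assumption: treated if x1 >= cut[1] OR x2 >= cut[2].
center_two <- function(x1, x2, cut = c(0, 0)) {
  # Center each variable at its own cutoff, then keep the larger score:
  # the result is >= 0 exactly when at least one variable crosses its cutoff.
  pmax(x1 - cut[1], x2 - cut[2])
}

x1 <- c(-0.5, 0.2, -0.3)
x2 <- c(-0.1, -0.4, -0.6)
z <- center_two(x1, x2)       # -0.1  0.2 -0.3
treated <- as.integer(z >= 0) #  0    1    0
```

Under the mirrored design, where treatment requires crossing both cutoffs, pmin would take the place of pmax.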
wt_kern calculates the appropriate kernel weights for a vector.
This is useful when, for instance, one wishes to perform local regression.
It is based on the kernelwts function in the "rdd" package.
This is an internal function and is typically not directly invoked by the user.
It can be accessed using the triple colon, as in rddapp:::wt_kern().
wt_kern(X, center, bw, kernel = "triangular")
X |
A numeric vector containing the input |
center |
A numeric value specifying the point from which distances should be calculated. |
bw |
A numeric value specifying the bandwidth. |
kernel |
A string indicating which kernel to use. Options are |
wt_kern returns a vector of weights with length equal to that of the X input (one weight per element of X).
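For the default triangular kernel, the weight falls linearly with distance from the center and reaches zero at one bandwidth. The base R sketch below shows that shape. tri_weights is a hypothetical helper, and any rescaling or normalization that wt_kern applies to its weights is not reproduced.

```r
# Hypothetical helper for illustration only: unnormalized triangular weights.
tri_weights <- function(X, center, bw) {
  u <- abs(X - center) / bw # distance from the center in bandwidth units
  pmax(0, 1 - u)            # linear decay, zero at and beyond one bandwidth
}

X <- c(-1, -0.5, 0, 0.25, 1)
w <- tri_weights(X, center = 0, bw = 0.5)
w # 0.0 0.0 1.0 0.5 0.0

# Such weights can be passed to a local linear fit, for example
# lm(y ~ x, weights = w, subset = w > 0)
```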
Drew Dimmery (2016). rdd: Regression Discontinuity Estimation. R package version 0.57. https://CRAN.R-project.org/package=rdd
wt_kern_bivariate calculates the appropriate weights for two variables for multivariate frontier regression discontinuity estimation with a nonparametric implementation.
Kernel weights are calculated based on the L1 distance of the two variables from the frontiers.
This is an internal function and is typically not directly invoked by the user.
It can be accessed using the triple colon, as in rddapp:::wt_kern_bivariate().
wt_kern_bivariate( X1, X2, center1, center2, bw, kernel = "triangular", t.design = NULL )
X1 |
The input x1 values for the first vector. This variable represents the axis along which kernel weighting should be performed; the first assignment variable in an MRDD. |
X2 |
The input x2 values for the second vector. |
center1 |
A numeric value specifying the point from which distances should be calculated for the first vector, |
center2 |
A numeric value specifying the point from which distances should be calculated for the second vector, |
bw |
A numeric vector specifying the bandwidths for each of three effects models (complete model, heterogeneous treatment model, and treatment only model) detailed in Wong, Steiner, and Cook (2013). |
kernel |
A string indicating which kernel to use. Options are |
t.design |
A character vector of length 2 specifying the treatment option according to design.
The first entry is for |
wt_kern_bivariate returns a matrix of weights and distances with length equal to that of the X1 and X2 inputs.
The first and second weights and distances are calculated with respect to all frontiers of different treatments.
The third weight and distance are calculated with respect to the overall frontier of treatment versus non-treatment.