Title: | Fit GLM's with High-Dimensional k-Way Fixed Effects |
---|---|
Description: | Provides a routine to partial out factors with many levels during the optimization of the log-likelihood function of the corresponding generalized linear model (glm). The package is based on the algorithm described in Stammann (2018) <arXiv:1707.01815> and is restricted to nonlinear glm's that are based on maximum likelihood estimation. It also offers an efficient algorithm to recover estimates of the fixed effects in a post-estimation routine and includes robust and multi-way clustered standard errors. Further, the package provides analytical bias corrections for binary choice models derived by Fernández-Val and Weidner (2016) <doi:10.1016/j.jeconom.2015.12.014> and Hinz, Stammann, and Wanner (2020) <arXiv:2004.12655>. |
Authors: | Amrei Stammann [aut, cre], Daniel Czarnowske [aut] |
Maintainer: | Amrei Stammann |
License: | GPL-3 |
Version: | 0.3.4 |
Built: | 2024-11-21 05:21:44 UTC |
Source: | https://github.com/amrei-stammann/alpaca |
k-way fixed effects
Provides a routine to partial out factors with many levels during the optimization of the log-likelihood function of the corresponding generalized linear model (glm). The package is based on the algorithm described in Stammann (2018) and is restricted to nonlinear glm's that are based on maximum likelihood estimation. It also offers an efficient algorithm to recover estimates of the fixed effects in a post-estimation routine and includes robust and multi-way clustered standard errors. Further, the package provides analytical bias corrections for binary choice models derived by Fernández-Val and Weidner (2016) and Hinz, Stammann, and Wanner (2020).
Note: Linear models are also beyond the scope of this package, since a comprehensive procedure is already available in felm.
k-way fixed effects
Concentrates out factors with many levels during the optimization of the log-likelihood function of the corresponding generalized linear model (glm). The package is restricted to glm's that are based on maximum likelihood estimation. This excludes all quasi-variants of glm. The package also offers an efficient algorithm to recover estimates of the fixed effects in a post-estimation routine and includes robust and multi-way clustered standard errors. Further, the package provides an analytical bias correction for binary choice models (logit and probit) derived by Fernández-Val and Weidner (2016).
biasCorr is a post-estimation routine that can be used to substantially reduce the incidental parameter bias problem (Neyman and Scott (1948)) present in nonlinear fixed effects models (see Fernández-Val and Weidner (2018) for an overview). The command applies the analytical bias correction derived by Fernández-Val and Weidner (2016) and Hinz, Stammann, and Wanner (2020) to obtain bias-corrected estimates of the structural parameters and is currently restricted to binomial with one-, two-, and three-way fixed effects.
biasCorr(object = NULL, L = 0L, panel.structure = c("classic", "network"))
object |
an object of class |
L |
unsigned integer indicating a bandwidth for the estimation of spectral densities proposed by Hahn and Kuersteiner (2011). Default is zero, which should be used if all regressors are assumed to be strictly exogenous with respect to the idiosyncratic error term. In the presence of weakly exogenous regressors, e.g. lagged outcome variables, Fernández-Val and Weidner (2016, 2018) suggest choosing a bandwidth between one and four. Note that the order of factors to be partialed out is important for bandwidths larger than zero (see vignette for details). |
panel.structure |
a string equal to |
The function biasCorr returns a named list of classes "biasCorr" and "feglm".
Czarnowske, D. and A. Stammann (2020). "Fixed Effects Binary Choice Models: Estimation and Inference with Long Panels". ArXiv e-prints.
Fernández-Val, I. and M. Weidner (2016). "Individual and time effects in nonlinear panel models with large N, T". Journal of Econometrics, 192(1), 291-312.
Fernández-Val, I. and M. Weidner (2018). "Fixed effects estimation of large-t panel data models". Annual Review of Economics, 10, 109-138.
Hahn, J. and G. Kuersteiner (2011). "Bias reduction for dynamic nonlinear panel models with fixed effects". Econometric Theory, 27(6), 1152-1191.
Hinz, J., A. Stammann, and J. Wanner (2020). "State Dependence and Unobserved Heterogeneity in the Extensive Margin of Trade". ArXiv e-prints.
Neyman, J. and E. L. Scott (1948). "Consistent estimates based on partially consistent observations". Econometrica, 16(1), 1-32.
# Generate an artificial data set for logit models
library(alpaca)
data <- simGLM(1000L, 20L, 1805L, model = "logit")

# Fit 'feglm()'
mod <- feglm(y ~ x1 + x2 + x3 | i + t, data)

# Apply analytical bias correction
mod.bc <- biasCorr(mod)
summary(mod.bc)
coef.APEs is a generic function which extracts estimates of the average partial effects from objects returned by getAPEs.
## S3 method for class 'APEs'
coef(object, ...)
object |
an object of class |
... |
other arguments. |
The function coef.APEs returns a named vector of estimates of the average partial effects.
coef.feglm is a generic function which extracts estimates of the structural parameters from objects returned by feglm.
## S3 method for class 'feglm'
coef(object, ...)
object |
an object of class |
... |
other arguments. |
The function coef.feglm returns a named vector of estimates of the structural parameters.
coef.summary.APEs is a generic function which extracts a coefficient matrix for average partial effects from objects returned by getAPEs.
## S3 method for class 'summary.APEs'
coef(object, ...)
object |
an object of class |
... |
other arguments. |
The function coef.summary.APEs returns a named matrix of estimates related to the average partial effects.
coef.summary.feglm is a generic function which extracts a coefficient matrix for structural parameters from objects returned by feglm.
## S3 method for class 'summary.feglm'
coef(object, ...)
object |
an object of class |
... |
other arguments. |
The function coef.summary.feglm returns a named matrix of estimates related to the structural parameters.
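A short usage sketch for the two extractors of structural parameter estimates, reusing the simulated logit data from the examples in this manual. The object names (beta.hat, beta.mat) are illustrative only.

```r
library(alpaca)

# Simulate data and fit the two-way fixed effects logit model
data <- simGLM(1000L, 20L, 1805L, model = "logit")
mod <- feglm(y ~ x1 + x2 + x3 | i + t, data)

# Named vector of structural parameter estimates
beta.hat <- coef(mod)
print(beta.hat)

# Coefficient matrix with one row per structural parameter
beta.mat <- coef(summary(mod))
print(beta.mat)
```

The vector is convenient for further computations, while the matrix is the natural input for building result tables.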
k-way fixed effects
feglm can be used to fit generalized linear models with many high-dimensional fixed effects. The estimation procedure is based on unconditional maximum likelihood and can be interpreted as a “weighted demeaning” approach that combines the work of Gaure (2013) and Stammann et al. (2016). For technical details see Stammann (2018). The routine is well suited for large data sets that would otherwise be infeasible to use due to memory limitations.
Remark: The term fixed effect is used in the econometrician's sense of having intercepts for each level in each category.
feglm( formula = NULL, data = NULL, family = binomial(), weights = NULL, beta.start = NULL, eta.start = NULL, control = NULL )
formula |
an object of class |
data |
an object of class |
family |
a description of the error distribution and link function to be used in the model.
Similar to |
weights |
an optional string with the name of the 'prior weights' variable in |
beta.start |
an optional vector of starting values for the structural parameters in the linear
predictor. Default is |
eta.start |
an optional vector of starting values for the linear predictor. |
control |
a named list of parameters for controlling the fitting process. See
|
If feglm does not converge, this is often a sign of linear dependence between one or more regressors and a fixed effects category. In this case, you should carefully inspect your model specification.
The function feglm returns a named list of class "feglm".
Gaure, S. (2013). "OLS with Multiple High Dimensional Category Variables". Computational Statistics and Data Analysis, 66.
Marschner, I. (2011). "glm2: Fitting generalized linear models with convergence problems". The R Journal, 3(2).
Stammann, A., F. Heiss, and D. McFadden (2016). "Estimating Fixed Effects Logit Models with Large Panel Data". Working paper.
Stammann, A. (2018). "Fast and Feasible Estimation of Generalized Linear Models with High-Dimensional k-Way Fixed Effects". ArXiv e-prints.
# Generate an artificial data set for logit models
library(alpaca)
data <- simGLM(1000L, 20L, 1805L, model = "logit")

# Fit 'feglm()'
mod <- feglm(y ~ x1 + x2 + x3 | i + t, data)
summary(mod)
feglm Control Parameters
Set and change parameters used for fitting feglm.
feglm.control(dev.tol = 1e-08, step.tol = 1e-08, pseudo.tol = 1e-05, rho.tol = 1e-04, iter.max = 100L, trace = 0L, drop.pc = TRUE)
dev.tol |
tolerance level for the first stopping condition of the maximization routine. The
stopping condition is based on the relative change of the deviance in iteration |
step.tol |
tolerance level for the second stopping condition of the maximization routine. The
stopping condition is based on the Euclidean norm of the step size in iteration |
pseudo.tol |
tolerance level for the stopping condition of the “pseudo demeaning” algorithm.
The stopping condition is based on the relative change of the Euclidean norm in iteration |
rho.tol |
tolerance level for the step-halving in the maximization routine. Step-halving only takes
place if the deviance in iteration |
iter.max |
unsigned integer indicating the maximum number of iterations in the maximization routine. |
trace |
unsigned integer indicating if output should be produced in each iteration. Default is
|
drop.pc |
logical indicating whether to drop observations that are perfectly classified and hence do not
contribute to the log-likelihood. This option is useful to reduce the computational costs
of the maximization problem, since it reduces the number of observations and does not change the
estimates. Default is |
The function feglm.control returns a named list of control parameters.
k-way fixed effects
feglm.nb can be used to fit negative binomial generalized linear models with many high-dimensional fixed effects (see feglm).
feglm.nb( formula = NULL, data = NULL, weights = NULL, beta.start = NULL, eta.start = NULL, init.theta = NULL, link = c("log", "identity", "sqrt"), control = NULL )
formula, data, weights, beta.start, eta.start, control |
see |
init.theta |
an optional initial value for the theta parameter (see |
link |
the link function. Must be one of |
If feglm.nb does not converge, this is usually a sign of linear dependence between one or more regressors and a fixed effects category. In this case, you should carefully inspect your model specification.
The function feglm.nb returns a named list of class "feglm".
Gaure, S. (2013). "OLS with Multiple High Dimensional Category Variables". Computational Statistics and Data Analysis, 66.
Marschner, I. (2011). "glm2: Fitting generalized linear models with convergence problems". The R Journal, 3(2).
Stammann, A., F. Heiss, and D. McFadden (2016). "Estimating Fixed Effects Logit Models with Large Panel Data". Working paper.
Stammann, A. (2018). "Fast and Feasible Estimation of Generalized Linear Models with High-Dimensional k-Way Fixed Effects". ArXiv e-prints.
feglm Control Parameters
Set and change parameters used for fitting feglm.
Note: feglm.control is deprecated and will be removed soon.
feglmControl( dev.tol = 1e-08, center.tol = 1e-08, iter.max = 25L, limit = 10L, trace = FALSE, drop.pc = TRUE, keep.mx = TRUE, conv.tol = NULL, rho.tol = NULL, pseudo.tol = NULL, step.tol = NULL )
feglm.control(...)
dev.tol |
tolerance level for the first stopping condition of the maximization routine. The
stopping condition is based on the relative change of the deviance in iteration |
center.tol |
tolerance level for the stopping condition of the centering algorithm.
The stopping condition is based on the relative change of the centered variable similar to
felm. Default is |
iter.max |
unsigned integer indicating the maximum number of iterations in the maximization
routine. Default is |
limit |
unsigned integer indicating the maximum number of iterations of
|
trace |
logical indicating if output should be produced in each iteration. Default is |
drop.pc |
logical indicating whether to drop observations that are perfectly classified/separated and hence
do not contribute to the log-likelihood. This option is useful to reduce the computational costs of
the maximization problem and improves the numerical stability of the algorithm. Note that dropping
perfectly separated observations does not affect the estimates. Default is |
keep.mx |
logical indicating if the centered regressor matrix should be stored. The centered regressor
matrix is required for some covariance estimators, bias corrections, and average partial effects. This
option saves some computation time at the cost of memory. Default is |
conv.tol, rho.tol |
deprecated; step-halving is now similar to |
pseudo.tol |
deprecated; use |
step.tol |
deprecated; termination condition is now similar to |
... |
arguments passed to the deprecated function |
The function feglmControl returns a named list of control parameters.
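As an illustration of how the control list is typically used, the sketch below tightens the deviance tolerance and raises the iteration cap before fitting. The chosen values and the object name ctrl are arbitrary examples, and it is assumed here that the list returned by feglmControl can be handed directly to the control argument of feglm.

```r
library(alpaca)

# Simulated logit data as in the other examples
data <- simGLM(1000L, 20L, 1805L, model = "logit")

# Stricter deviance tolerance and a higher iteration cap than the defaults
ctrl <- feglmControl(dev.tol = 1e-10, iter.max = 50L)

# Pass the control list to the fitting routine
mod <- feglm(y ~ x1 + x2 + x3 | i + t, data, control = ctrl)
summary(mod)
```

Arguments that are not set explicitly keep the defaults shown in the usage above.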
feglm fitted values
fitted.feglm is a generic function which extracts fitted values from an object returned by feglm.
## S3 method for class 'feglm'
fitted(object, ...)
object |
an object of class |
... |
other arguments. |
The function fitted.feglm returns a vector of fitted values.
getAPEs is a post-estimation routine that can be used to estimate average partial effects with respect to all covariates in the model and the corresponding covariance matrix. The estimation of the covariance is based on a linear approximation (delta method) plus an optional finite population correction. Note that the command automatically determines which of the regressors are binary or non-binary.
Remark: The routine currently does not support average partial effects based on functional forms like interactions and polynomials.
getAPEs( object = NULL, n.pop = NULL, panel.structure = c("classic", "network"), sampling.fe = c("independence", "unrestricted"), weak.exo = FALSE )
object |
an object of class |
n.pop |
unsigned integer indicating a finite population correction for the estimation of the
covariance matrix of the average partial effects proposed by
Cruz-Gonzalez, Fernández-Val, and Weidner (2017). The correction factor is computed as follows:
|
panel.structure |
a string equal to |
sampling.fe |
a string equal to |
weak.exo |
logical indicating if some of the regressors are assumed to be weakly exogenous (e.g.
predetermined). If object is of class |
The function getAPEs returns a named list of class "APEs".
Cruz-Gonzalez, M., I. Fernández-Val, and M. Weidner (2017). "Bias corrections for probit and logit models with two-way fixed effects". The Stata Journal, 17(3), 517-545.
Czarnowske, D. and A. Stammann (2020). "Fixed Effects Binary Choice Models: Estimation and Inference with Long Panels". ArXiv e-prints.
Fernández-Val, I. and M. Weidner (2016). "Individual and time effects in nonlinear panel models with large N, T". Journal of Econometrics, 192(1), 291-312.
Fernández-Val, I. and M. Weidner (2018). "Fixed effects estimation of large-t panel data models". Annual Review of Economics, 10, 109-138.
Hinz, J., A. Stammann, and J. Wanner (2020). "State Dependence and Unobserved Heterogeneity in the Extensive Margin of Trade". ArXiv e-prints.
Neyman, J. and E. L. Scott (1948). "Consistent estimates based on partially consistent observations". Econometrica, 16(1), 1-32.
# Generate an artificial data set for logit models
library(alpaca)
data <- simGLM(1000L, 20L, 1805L, model = "logit")

# Fit 'feglm()'
mod <- feglm(y ~ x1 + x2 + x3 | i + t, data)

# Compute average partial effects
mod.ape <- getAPEs(mod)
summary(mod.ape)

# Apply analytical bias correction
mod.bc <- biasCorr(mod)
summary(mod.bc)

# Compute bias-corrected average partial effects
mod.ape.bc <- getAPEs(mod.bc)
summary(mod.ape.bc)
feglm
Recover estimates of the fixed effects by alternating between the normal equations of the fixed effects as shown by Stammann (2018).
Remark: The system might not have a unique solution since we do not take collinearity into account. If the solution is not unique, an estimable function has to be applied to our solution to get meaningful estimates of the fixed effects. See Gaure (n. d.) for an extensive treatment of this issue.
getFEs(object = NULL, alpha.tol = 1e-08)
object |
an object of class |
alpha.tol |
tolerance level for the stopping condition. The algorithm is stopped in iteration
|
The function getFEs returns a named list containing named vectors of estimated fixed effects.
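A minimal usage sketch, reusing the simulated two-way logit model from the other examples. The object name fes is illustrative, and the remark above on non-unique solutions applies to the values printed here.

```r
library(alpaca)

# Simulate data and fit the two-way fixed effects logit model
data <- simGLM(1000L, 20L, 1805L, model = "logit")
mod <- feglm(y ~ x1 + x2 + x3 | i + t, data)

# Recover the fixed effects in a post-estimation step
fes <- getFEs(mod)

# One vector of estimates per fixed effects category
str(fes)
lapply(fes, head)
```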
Gaure, S. (n. d.). "Multicollinearity, identification, and estimable functions". Unpublished.
Stammann, A. (2018). "Fast and Feasible Estimation of Generalized Linear Models with High-Dimensional k-Way Fixed Effects". ArXiv e-prints.
feglm fits
predict.feglm is a generic function which obtains predictions from an object returned by feglm.
## S3 method for class 'feglm'
predict(object, type = c("link", "response"), ...)
object |
an object of class |
type |
the type of prediction required. |
... |
other arguments. |
The function predict.feglm returns a vector of predictions.
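The two prediction types can be compared in a short sketch. It assumes the default logit link of binomial(), under which predictions on the response scale should equal the logistic transformation of the predictions on the link scale; the object names eta.hat and p.hat are illustrative.

```r
library(alpaca)

# Simulate data and fit the two-way fixed effects logit model
data <- simGLM(1000L, 20L, 1805L, model = "logit")
mod <- feglm(y ~ x1 + x2 + x3 | i + t, data)

# Predictions on the scale of the linear predictor
eta.hat <- predict(mod, type = "link")

# Predictions on the scale of the response (probabilities for logit)
p.hat <- predict(mod, type = "response")

# Under the logit link both scales are connected by plogis()
all.equal(as.numeric(plogis(eta.hat)), as.numeric(p.hat))
```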
APEs
print.APEs is a generic function which displays some minimal information from objects returned by getAPEs.
## S3 method for class 'APEs'
print(x, digits = max(3L, getOption("digits") - 3L), ...)
x |
an object of class |
digits |
unsigned integer indicating the number of decimal places. Default is
|
... |
other arguments. |
feglm
print.feglm is a generic function which displays some minimal information from objects returned by feglm.
## S3 method for class 'feglm'
print(x, digits = max(3L, getOption("digits") - 3L), ...)
x |
an object of class |
digits |
unsigned integer indicating the number of decimal places. Default is
|
... |
other arguments. |
summary.APEs
print.summary.APEs is a generic function which displays summary statistics from objects returned by summary.APEs.
## S3 method for class 'summary.APEs'
print(x, digits = max(3L, getOption("digits") - 3L), ...)
x |
an object of class |
digits |
unsigned integer indicating the number of decimal places. Default is
|
... |
other arguments. |
summary.feglm
print.summary.feglm is a generic function which displays summary statistics from objects returned by summary.feglm.
## S3 method for class 'summary.feglm'
print(x, digits = max(3L, getOption("digits") - 3L), ...)
x |
an object of class |
digits |
unsigned integer indicating the number of decimal places. Default is
|
... |
other arguments. |
Constructs an artificial data set with $n$ cross-sectional units observed for $t$ time periods for logit, poisson, or gamma models. The “true” linear predictor ($\boldsymbol{\eta}$) is generated as follows:
$$\eta_{it} = \mathbf{x}_{it}^{\prime} \boldsymbol{\beta} + \alpha_{i} + \gamma_{t},$$
where $\mathbf{X}$ consists of three independent standard normally distributed regressors. Both parameters referring to the unobserved heterogeneity ($\alpha_{i}$ and $\gamma_{t}$) are generated as iid. standard normal and the structural parameters are set to $\boldsymbol{\beta} = [1, -1, 1]^{\prime}$.
Note: The poisson and gamma models are based on the logarithmic link function.
simGLM(n = NULL, t = NULL, seed = NULL, model = c("logit", "poisson", "gamma"))
n |
a strictly positive integer equal to the number of cross-sectional units. |
t |
a strictly positive integer equal to the number of time periods. |
seed |
a seed to ensure reproducibility. |
model |
a string equal to |
The function simGLM returns a data.frame with 6 variables.
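The design can be mimicked in a few lines of base R. The following is only a sketch for the logit case and not the package's implementation: the function name sim_logit_sketch, the coefficient vector beta = (1, -1, 1), the column names, and the order of the random draws are assumptions, so its output will not reproduce simGLM for a given seed.

```r
# Hypothetical stand-in for the logit design; NOT the code used by simGLM()
sim_logit_sketch <- function(n, t, seed, beta = c(1, -1, 1)) {
  set.seed(seed)
  i <- rep(seq_len(n), each = t)    # index of the cross-sectional unit
  tt <- rep(seq_len(t), times = n)  # index of the time period
  alpha <- rnorm(n)                 # iid standard normal unit effects
  gamma <- rnorm(t)                 # iid standard normal time effects
  # Three independent standard normal regressors
  X <- matrix(rnorm(n * t * 3L), ncol = 3L,
              dimnames = list(NULL, c("x1", "x2", "x3")))
  # Linear predictor: regressors plus unit and time effects
  eta <- as.vector(X %*% beta) + alpha[i] + gamma[tt]
  # Binary outcome drawn with logistic success probability
  y <- rbinom(n * t, size = 1L, prob = plogis(eta))
  data.frame(i = i, t = tt, y = y, X)
}

d <- sim_logit_sketch(100L, 10L, 1L)
str(d)
```

For the poisson and gamma cases only the last step would change, with the outcome drawn from the respective distribution using exp(eta) as its mean.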
APEs
Summary statistics for objects of class "APEs".
## S3 method for class 'APEs'
summary(object, ...)
object |
an object of class |
... |
other arguments. |
Returns an object of class "summary.APEs" which is a list of summary statistics of object.
feglm
Summary statistics for objects of class "feglm".
## S3 method for class 'feglm'
summary( object, type = c("hessian", "outer.product", "sandwich", "clustered"), cluster = NULL, cluster.vars = NULL, ... )
object |
an object of class |
type |
the type of covariance estimate required. |
cluster |
a symbolic description indicating the clustering of observations. |
cluster.vars |
deprecated; use |
... |
other arguments. |
Multi-way clustering is done using the algorithm of Cameron, Gelbach, and Miller (2011). An example is provided in the vignette "Replicating an Empirical Example of International Trade".
Returns an object of class "summary.feglm" which is a list of summary statistics of object.
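The covariance types can be compared in a short sketch based on the simulated logit data used elsewhere in this manual. Clustering on the unit identifier i is an arbitrary choice for illustration, and it is assumed here that a variable which already enters the model as a fixed effects category can be referenced in cluster directly.

```r
library(alpaca)

# Simulate data and fit the two-way fixed effects logit model
data <- simGLM(1000L, 20L, 1805L, model = "logit")
mod <- feglm(y ~ x1 + x2 + x3 | i + t, data)

# Standard errors based on the Hessian (first entry of 'type')
summary(mod)

# Heteroskedasticity-robust (sandwich) standard errors
summary(mod, type = "sandwich")

# Standard errors clustered by cross-sectional unit
sum.cl <- summary(mod, type = "clustered", cluster = ~ i)
print(sum.cl)
```

Only the standard errors and the statistics derived from them differ between the three summaries; the point estimates stay the same.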
Cameron, C., J. Gelbach, and D. Miller (2011). "Robust Inference With Multiway Clustering". Journal of Business & Economic Statistics, 29(2).
APEs
vcov.APEs estimates the covariance matrix for the estimator of the average partial effects from objects returned by getAPEs.
## S3 method for class 'APEs'
vcov(object, ...)
object |
an object of class |
... |
other arguments. |
The function vcov.APEs returns a named matrix of covariance estimates.
feglm
vcov.feglm estimates the covariance matrix for the estimator of the structural parameters from objects returned by feglm. The covariance is computed from the Hessian, the scores, or a combination of both after convergence.
## S3 method for class 'feglm'
vcov( object, type = c("hessian", "outer.product", "sandwich", "clustered"), cluster = NULL, cluster.vars = NULL, ... )
object |
an object of class |
type |
the type of covariance estimate required. |
cluster |
a symbolic description indicating the clustering of observations. |
cluster.vars |
deprecated; use |
... |
other arguments. |
Multi-way clustering is done using the algorithm of Cameron, Gelbach, and Miller (2011). An example is provided in the vignette "Replicating an Empirical Example of International Trade".
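For the two-way case, the estimator of Cameron, Gelbach, and Miller (2011) can be written compactly. The notation below ($G$ and $H$ for the two cluster dimensions) is chosen here for illustration: the one-way clustered covariance estimates for each dimension are added, and the estimate clustered on the intersection of both dimensions is subtracted so that it is not counted twice.

```latex
\hat{V}_{G,H} = \hat{V}_{G} + \hat{V}_{H} - \hat{V}_{G \cap H}
```

With more than two cluster dimensions, the same inclusion-exclusion logic is applied to all intersections of the dimensions.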
The function vcov.feglm returns a named matrix of covariance estimates.
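As a usage sketch, standard errors can be derived from the returned matrix as the square roots of its diagonal. The example compares the Hessian-based and the sandwich estimate on the simulated logit data used elsewhere in this manual; the object names are illustrative.

```r
library(alpaca)

# Simulate data and fit the two-way fixed effects logit model
data <- simGLM(1000L, 20L, 1805L, model = "logit")
mod <- feglm(y ~ x1 + x2 + x3 | i + t, data)

# Covariance estimates based on the Hessian and on the sandwich formula
V.hessian <- vcov(mod, type = "hessian")
V.sandwich <- vcov(mod, type = "sandwich")

# Standard errors are the square roots of the diagonal elements
se <- cbind(hessian = sqrt(diag(V.hessian)),
            sandwich = sqrt(diag(V.sandwich)))
print(se)
```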
Cameron, C., J. Gelbach, and D. Miller (2011). "Robust Inference With Multiway Clustering". Journal of Business & Economic Statistics, 29(2).