Title: | Exact Variable-Subset Selection in Linear Regression |
---|---|
Description: | Exact and approximation algorithms for variable-subset selection in ordinary linear regression models. Either compute all submodels with the lowest residual sum of squares, or determine the single-best submodel according to a pre-determined statistical criterion. Hofmann et al. (2020) <10.18637/jss.v093.i03>. |
Authors: | Marc Hofmann [aut, cre], Cristian Gatu [aut], Erricos J. Kontoghiorghes [aut], Ana Colubi [aut], Achim Zeileis [aut] , Martin Moene [cph] (for the GSL Lite library), Microsoft Corporation [cph] (for the GSL Lite library), Free Software Foundation, Inc. [cph] (for snippets from the GNU ISO C++ Library) |
Maintainer: | Marc Hofmann <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.5-1 |
Built: | 2024-11-06 03:10:13 UTC |
Source: | https://github.com/marc-hofmann/lmsubsets.r |
Data relating air pollution and mortality, frequently used for illustrations in ridge regression and related tasks.
data("AirPollution")
data("AirPollution")
A data frame containing 60 observations on 16 variables.
Average annual precipitation in inches.
Average January temperature in degrees Fahrenheit.
Average July temperature in degrees Fahrenheit.
Percentage of 1960 SMSA population aged 65 or older.
Average household size.
Median school years completed by those over 22.
Percentage of housing units which are sound and with all facilities.
Population per square mile in urbanized areas, 1960.
Percentage of non-Caucasian population in urbanized areas, 1960.
Percentage employed in white collar occupations.
Percentage of families with income < USD 3000.
Relative hydrocarbon pollution potential.
Relative nitric oxides potential.
Relative sulphur dioxide potential.
Annual average percentage of relative humidity at 13:00.
Total age-adjusted mortality rate per 100,000.
http://lib.stat.cmu.edu/datasets/pollution
McDonald GC, Schwing RC (1973). Instabilities of Regression Estimates Relating Air Pollution to Mortality. Technometrics, 15, 463–482.
Miller AJ (2002). Subset Selection in Regression. New York: Chapman and Hall.
## load data (with logs for relative potentials) data("AirPollution", package = "lmSubsets") for (i in 12:14) AirPollution[[i]] <- log(AirPollution[[i]]) ## fit subsets lm_all <- lmSubsets(mortality ~ ., data = AirPollution) plot(lm_all) ## refit best model lm6 <- refit(lm_all, size = 6) summary(lm6)
## load data (with logs for relative potentials) data("AirPollution", package = "lmSubsets") for (i in 12:14) AirPollution[[i]] <- log(AirPollution[[i]]) ## fit subsets lm_all <- lmSubsets(mortality ~ ., data = AirPollution) plot(lm_all) ## refit best model lm6 <- refit(lm_all, size = 6) summary(lm6)
00UTC temperature observations and corresponding 24-hour reforecast ensemble means from the Global Ensemble Forecast System (GEFS, Hamill et al. 2013) for SYNOP station Innsbruck Airport (11120; 47.260, 11.357) from 2011-01-01 to 2015-12-31.
data("IbkTemperature")
data("IbkTemperature")
A data frame containing 1824 daily observations/forecasts for 42
variables. The first column (temp
) contains temperature
observations at 00UTC (coordinated universal time), columns 2–37 are
24-hour lead time GEFS reforecast ensemble means for different
variables (see below). Columns 38–42 are deterministic time
trend/season patterns.
Observed temperature at Innsbruck Airport (deg ).
Total accumulated precipitation ().
Temperature at 2 meters ().
U-component of wind at 10 meters ().
V-component of wind at 10 meters ().
U-component of wind at 80 meters ().
U-component of wind at 80 meters ().
Convective available potential energy ().
Convective inhibition ().
Surface downward long-wave radiation flux ().
Surface downward short-wave radiation flux ().
Surface upward long-wave radiation flux ().
Surface upward short-wave radiation flux ().
Ground heat flux ().
Surface latent heat net flux ().
Surface sensible heat net flux ().
Mean sea level pressure ().
Surface pressure ().
Precipitable water ().
Volumetric soil moisture content (fraction).
Specific humidity at 2 meters ().
Total cloud cover (percent).
Total column-integrated condensate ().
Skin temperature ().
Maximum temperature ().
Minimum temperature ().
Soil temperature (0-10 cm below surface) ().
Upward long-wave radiation flux ().
Water runoff ().
Water equivalent of accumulated snow depth ().
Wind mixing energy ().
Vertical velocity at 850 hPa surface ().
Temperature on 2 PVU surface ().
Pressure on 2 PVU surface ().
U-component of wind on 2 PVU surface ().
U-component of wind on 2 PVU surface ().
Potential vorticity on 320 K isentrope ().
Time in years.
Sine and cosine component of annual harmonic pattern.
Sine and cosine component of bi-annual harmonic pattern.
Observations: http://www.ogimet.com/synops.phtml.en
Reforecasts: http://www.esrl.noaa.gov/psd/forecasts/reforecast2/
Hamill TM, Bates GT, Whitaker JS, Murray DR, Fiorino M, Galarneau Jr. TJ, Zhu Y, Lapenta W (2013). NOAA's Second-Generation Global Medium-Range Ensemble Reforecast Data Set. Bulletin of the American Meteorological Society, 94(10), 1553–1565. doi:10.1175/BAMS-D-12-00014.1
## load data and omit missing values data("IbkTemperature", package = "lmSubsets") IbkTemperature <- na.omit(IbkTemperature) ## fit a simple climatological model for the temperature ## with a linear trend and annual/bi-annual harmonic seasonal pattern CLIM <- lm(temp ~ time + sin + cos + sin2 + cos2, data = IbkTemperature) ## fit a simple MOS with 2-meter temperature forecast in addition ## to the climatological model MOS0 <- lm(temp ~ t2m + time + sin + cos + sin2 + cos2, data = IbkTemperature) ## graphical comparison and MOS summary plot(temp ~ time, data = IbkTemperature, type = "l", col = "darkgray") lines(fitted(MOS0) ~ time, data = IbkTemperature, col = "darkred") lines(fitted(CLIM) ~ time, data = IbkTemperature, lwd = 2) MOS0 ## best subset selection of remaining variables for the MOS ## (i.e., forcing the regressors of m1 into the model) MOS1_all <- lmSubsets(temp ~ ., data = IbkTemperature, include = c("t2m", "time", "sin", "cos", "sin2", "cos2")) plot(MOS1_all) image(MOS1_all, size = 8:20) ## -> Note that soil temperature and maximum temperature are selected ## in addition to the 2-meter temperature ## best subset selection of all variables MOS2_all <- lmSubsets(temp ~ ., data = IbkTemperature) plot(MOS2_all) image(MOS2_all, size = 2:20) ## -> Note that 2-meter temperature is not selected into the best ## BIC model but soil-temperature (and maximum temperature) are used instead ## refit the best BIC subset selections MOS1 <- refit(lmSelect(MOS1_all)) MOS2 <- refit(lmSelect(MOS2_all)) ## compare BIC BIC(CLIM, MOS0, MOS1, MOS2) ## compare RMSE sqrt(sapply(list(CLIM, MOS0, MOS1, MOS2), deviance)/ nrow(IbkTemperature)) ## compare coefficients cf0 <- coef(CLIM) cf1 <- coef(MOS0) cf2 <- coef(MOS1) cf3 <- coef(MOS2) names(cf2) <- gsub("^x", "", names(coef(MOS1))) names(cf3) <- gsub("^x", "", names(coef(MOS2))) nam <- unique(c(names(cf0), names(cf1), names(cf2), names(cf3))) cf <- matrix(NA, nrow = length(nam), ncol = 4, dimnames = list(nam, c("CLIM", "MOS0", "MOS1", "MOS2"))) cf[names(cf0), 1] <- cf0 cf[names(cf1), 2] <- cf1 cf[names(cf2), 3] <- cf2 cf[names(cf3), 4] <- cf3 print(round(cf, digits = 3), na.print = "")
## load data and omit missing values data("IbkTemperature", package = "lmSubsets") IbkTemperature <- na.omit(IbkTemperature) ## fit a simple climatological model for the temperature ## with a linear trend and annual/bi-annual harmonic seasonal pattern CLIM <- lm(temp ~ time + sin + cos + sin2 + cos2, data = IbkTemperature) ## fit a simple MOS with 2-meter temperature forecast in addition ## to the climatological model MOS0 <- lm(temp ~ t2m + time + sin + cos + sin2 + cos2, data = IbkTemperature) ## graphical comparison and MOS summary plot(temp ~ time, data = IbkTemperature, type = "l", col = "darkgray") lines(fitted(MOS0) ~ time, data = IbkTemperature, col = "darkred") lines(fitted(CLIM) ~ time, data = IbkTemperature, lwd = 2) MOS0 ## best subset selection of remaining variables for the MOS ## (i.e., forcing the regressors of m1 into the model) MOS1_all <- lmSubsets(temp ~ ., data = IbkTemperature, include = c("t2m", "time", "sin", "cos", "sin2", "cos2")) plot(MOS1_all) image(MOS1_all, size = 8:20) ## -> Note that soil temperature and maximum temperature are selected ## in addition to the 2-meter temperature ## best subset selection of all variables MOS2_all <- lmSubsets(temp ~ ., data = IbkTemperature) plot(MOS2_all) image(MOS2_all, size = 2:20) ## -> Note that 2-meter temperature is not selected into the best ## BIC model but soil-temperature (and maximum temperature) are used instead ## refit the best BIC subset selections MOS1 <- refit(lmSelect(MOS1_all)) MOS2 <- refit(lmSelect(MOS2_all)) ## compare BIC BIC(CLIM, MOS0, MOS1, MOS2) ## compare RMSE sqrt(sapply(list(CLIM, MOS0, MOS1, MOS2), deviance)/ nrow(IbkTemperature)) ## compare coefficients cf0 <- coef(CLIM) cf1 <- coef(MOS0) cf2 <- coef(MOS1) cf3 <- coef(MOS2) names(cf2) <- gsub("^x", "", names(coef(MOS1))) names(cf3) <- gsub("^x", "", names(coef(MOS2))) nam <- unique(c(names(cf0), names(cf1), names(cf2), names(cf3))) cf <- matrix(NA, nrow = length(nam), ncol = 4, dimnames = list(nam, c("CLIM", "MOS0", "MOS1", "MOS2"))) cf[names(cf0), 1] <- cf0 cf[names(cf1), 2] <- cf1 cf[names(cf2), 3] <- cf2 cf[names(cf3), 4] <- cf3 print(round(cf, digits = 3), na.print = "")
Visualization of variable subsets.
## S3 method for class 'lmSubsets' image(x, size = NULL, best = 1, which = NULL, hilite, hilite_penalty, main, sub, xlab = NULL, ylab, ann = par("ann"), axes = TRUE, col = c("gray40", "gray90"), lab = "lab", col_hilite = cbind("red", "pink"), lab_hilite = "lab", pad_size = 3, pad_best = 1, pad_which = 3, axis_pos = -4, axis_tck = -4, axis_lab = -10, ...) ## S3 method for class 'lmSelect' image(x, best = NULL, which = NULL, hilite, hilite_penalty, main, sub = NULL, xlab = NULL, ylab, ann = par("ann"), axes = TRUE, col = c("gray40", "gray90"), lab = "lab", col_hilite = cbind("red", "pink"), lab_hilite = "lab", pad_best = 2, pad_which = 2, axis_pos = -4, axis_tck = -4, axis_lab = -10, ...)
## S3 method for class 'lmSubsets' image(x, size = NULL, best = 1, which = NULL, hilite, hilite_penalty, main, sub, xlab = NULL, ylab, ann = par("ann"), axes = TRUE, col = c("gray40", "gray90"), lab = "lab", col_hilite = cbind("red", "pink"), lab_hilite = "lab", pad_size = 3, pad_best = 1, pad_which = 3, axis_pos = -4, axis_tck = -4, axis_lab = -10, ...) ## S3 method for class 'lmSelect' image(x, best = NULL, which = NULL, hilite, hilite_penalty, main, sub = NULL, xlab = NULL, ylab, ann = par("ann"), axes = TRUE, col = c("gray40", "gray90"), lab = "lab", col_hilite = cbind("red", "pink"), lab_hilite = "lab", pad_best = 2, pad_which = 2, axis_pos = -4, axis_tck = -4, axis_lab = -10, ...)
x |
An object of class |
main , sub , xlab , ylab
|
Main, sub- and axis titles. |
size , best
|
Submodels to be plotted. |
which |
Regressors to be plotted. |
col , lab
|
Color and label style. |
hilite , hilite_penalty
|
Submodels to be highlighted. |
col_hilite , lab_hilite
|
Highlighting style. |
pad_size , pad_best , pad_which
|
Padding. |
axis_pos , axis_tck , axis_lab
|
Position of axes, tick length, and position of labels |
... |
Ignored. |
axes |
Plot axes. |
ann |
Annotate plot. |
## data data("AirPollution", package = "lmSubsets") ################# ## lmSubsets ## ################# lm_all <- lmSubsets(mortality ~ ., data = AirPollution, nbest = 20) ## heatmap image(lm_all, best = 1:3) ## highlight 5 best (BIC) image(lm_all, best = 1:3, hilite = 1:5, hilite_penalty = "BIC") ################ ## lmSelect ## ################ ## default criterion: BIC lm_best <- lmSelect(lm_all) ## highlight 5 best (AIC) image(lm_best, hilite = 1:5, hilite_penalty = "AIC") ## axis labels image(lm_best, lab = c("bold(lab)", "lab"), hilite = 1, lab_hilite = "underline(lab)")
## data data("AirPollution", package = "lmSubsets") ################# ## lmSubsets ## ################# lm_all <- lmSubsets(mortality ~ ., data = AirPollution, nbest = 20) ## heatmap image(lm_all, best = 1:3) ## highlight 5 best (BIC) image(lm_all, best = 1:3, hilite = 1:5, hilite_penalty = "BIC") ################ ## lmSelect ## ################ ## default criterion: BIC lm_best <- lmSelect(lm_all) ## highlight 5 best (AIC) image(lm_best, hilite = 1:5, hilite_penalty = "AIC") ## axis labels image(lm_best, lab = c("bold(lab)", "lab"), hilite = 1, lab_hilite = "underline(lab)")
Best-subset regression for ordinary linear models.
lmSelect(formula, ...) ## Default S3 method: lmSelect(formula, data, subset, weights, na.action, model = TRUE, x = FALSE, y = FALSE, contrasts = NULL, offset, ...) ## S3 method for class 'matrix' lmSelect(formula, y, intercept = TRUE, ...) ## S3 method for class 'lmSubsets' lmSelect(formula, penalty = "BIC", ...) lmSelect_fit(x, y, weights = NULL, offset = NULL, include = NULL, exclude = NULL, penalty = "BIC", tolerance = 0, nbest = 1, ..., pradius = NULL)
lmSelect(formula, ...) ## Default S3 method: lmSelect(formula, data, subset, weights, na.action, model = TRUE, x = FALSE, y = FALSE, contrasts = NULL, offset, ...) ## S3 method for class 'matrix' lmSelect(formula, y, intercept = TRUE, ...) ## S3 method for class 'lmSubsets' lmSelect(formula, penalty = "BIC", ...) lmSelect_fit(x, y, weights = NULL, offset = NULL, include = NULL, exclude = NULL, penalty = "BIC", tolerance = 0, nbest = 1, ..., pradius = NULL)
formula , data , subset , weights , na.action , model , x , y , contrasts , offset
|
Standard formula interface. |
intercept |
Include intercept. |
include , exclude
|
Force regressors in or out. |
penalty |
Penalty per parameter. |
tolerance |
Approximation tolerance. |
nbest |
Number of best subsets. |
... |
Forwarded to |
pradius |
Preordering radius. |
The lmSelect
generic provides a convenient interface for best
variable-subset selection in linear regression: The nbest
best
– according to an information criterion of the AIC family – subset
models are returned.
The information criterion is specified with the penalty
parameter. Accepted values are "AIC"
, "BIC"
, or a
numeric
value representing the penalty per model parameter (see
AIC
).
A custom selection criterion may be specified by passing an R function
as the penalty
argument. The expected signature is
function(size, rss)
, where size
is the number of
predictors (including intercept, if any), and rss
the residual
sum of squares. The function must be non-decreasing in both
parameters.
A low-level matrix interface is provided by lmSelect_fit
.
See lmSubsets
for further information.
An object of class "lmSelect"
, i.e., a list with the following
components:
nobs , nvar
|
Number of observations, of variables. |
intercept |
|
include , exclude
|
Included, excluded variables. |
size |
Subset sizes. |
tolerance |
Approximation tolerance. |
nbest |
Number of best subsets. |
submodel |
Submodel information. |
subset |
Selected variables. |
Further components include call
, na.action
,
weights
, offset
, contrasts
, xlevels
,
terms
, mf
, x
, and y
. See
lm
for more information.
Hofmann M, Gatu C, Kontoghiorghes EJ, Colubi A, Zeileis A (2020). lmSubsets: Exact Variable-Subset Selection in Linear Regression for R. Journal of Statistical Software. 93, 1–21. doi:10.18637/jss.v093.i03.
## load data (with logs for relative potentials) data("AirPollution", package = "lmSubsets") ################### ## basic usage ## ################### ## fit 20 best subsets (BIC) lm_best <- lmSelect(mortality ~ ., data = AirPollution, nbest = 20) lm_best ## equivalent to: ## Not run: lm_all <- lmSubsets(mortality ~ ., data = AirPollution, nbest = 20) lm_best <- lmSelect(lm_all) ## End(Not run) ## summary statistics summary(lm_best) ## visualize plot(lm_best) ######################## ## custom criterion ## ######################## ## the same as above, but with a custom criterion: M <- nrow(AirPollution) ll <- function (rss) { -M/2 * (log(2 * pi) - log(M) + log(rss) + 1) } aic <- function (size, rss, k = 2) { -2 * ll(rss) + k * (size + 1) } bic <- function (size, rss) { aic(size, rss, k = log(M)) } lm_cust <- lmSelect(mortality ~ ., data = AirPollution, penalty = bic, nbest = 20) lm_cust
## load data (with logs for relative potentials) data("AirPollution", package = "lmSubsets") ################### ## basic usage ## ################### ## fit 20 best subsets (BIC) lm_best <- lmSelect(mortality ~ ., data = AirPollution, nbest = 20) lm_best ## equivalent to: ## Not run: lm_all <- lmSubsets(mortality ~ ., data = AirPollution, nbest = 20) lm_best <- lmSelect(lm_all) ## End(Not run) ## summary statistics summary(lm_best) ## visualize plot(lm_best) ######################## ## custom criterion ## ######################## ## the same as above, but with a custom criterion: M <- nrow(AirPollution) ll <- function (rss) { -M/2 * (log(2 * pi) - log(M) + log(rss) + 1) } aic <- function (size, rss, k = 2) { -2 * ll(rss) + k * (size + 1) } bic <- function (size, rss) { aic(size, rss, k = log(M)) } lm_cust <- lmSelect(mortality ~ ., data = AirPollution, penalty = bic, nbest = 20) lm_cust
All-subsets regression for linear models estimated by ordinary least squares (OLS).
lmSubsets(formula, ...) ## Default S3 method: lmSubsets(formula, data, subset, weights, na.action, model = TRUE, x = FALSE, y = FALSE, contrasts = NULL, offset, ...) ## S3 method for class 'matrix' lmSubsets(formula, y, intercept = TRUE, ...) lmSubsets_fit(x, y, weights = NULL, offset = NULL, include = NULL, exclude = NULL, nmin = NULL, nmax = NULL, tolerance = 0, nbest = 1, ..., pradius = NULL)
lmSubsets(formula, ...) ## Default S3 method: lmSubsets(formula, data, subset, weights, na.action, model = TRUE, x = FALSE, y = FALSE, contrasts = NULL, offset, ...) ## S3 method for class 'matrix' lmSubsets(formula, y, intercept = TRUE, ...) lmSubsets_fit(x, y, weights = NULL, offset = NULL, include = NULL, exclude = NULL, nmin = NULL, nmax = NULL, tolerance = 0, nbest = 1, ..., pradius = NULL)
formula , data , subset , weights , na.action , model , x , y , contrasts , offset
|
Standard formula interface. |
intercept |
Include intercept. |
include , exclude
|
Force regressors in or out. |
nmin , nmax
|
Minimum and maximum number of regressors. |
tolerance |
Approximation tolerance (vector). |
nbest |
Number of best subsets. |
... |
Forwarded to |
pradius |
Preordering radius. |
The lmSubsets
generic provides various methods to conveniently
specify the regressor and response variables. The standard
formula
interface (see lm
) can be used, or
the information can be extracted from an already fitted "lm"
object. The regressor matrix and response variable can also be passed
in directly (see Examples).
The call is forwarded to lmSubsets_fit
, which provides a
low-level matrix interface.
The nbest
best subset models for every subset size are
computed, where the "best" models are the models with the lowest
residual sum of squares (RSS). The scope of the search can be limited
to a range of subset sizes by setting nmin
and nmax
. A
tolerance vector (expanded if necessary) may be specified to speed up
the search, where tolerance[j]
is the tolerance applied to
subset models of size j
.
By way of include
and exclude
, variables may be forced
in to or out of the regression, respectively.
The extent to which variables are preordered is controlled with the
pradius
parameter.
A set of standard extractor functions for fitted model objects is
available for objects of class "lmSubsets"
. See
methods
for more details.
The summary
method can be called to obtain summary statistics.
An object of class "lmSubsets"
, i.e., a list with the
following components:
nobs , nvar
|
Number of observations, of variables. |
intercept |
|
include , exclude
|
Included, excluded regressors. |
size |
Subset sizes. |
tolerance |
Approximation tolerance (vector). |
nbest |
Number of best subsets. |
submodel |
Submodel information. |
subset |
Selected variables. |
Further components include call
, na.action
,
weights
, offset
, contrasts
, xlevels
,
terms
, mf
, x
, and y
. See
lm
for more information.
Hofmann M, Gatu C, Kontoghiorghes EJ, Colubi A, Zeileis A (2020). lmSubsets: Exact Variable-Subset Selection in Linear Regression for R. Journal of Statistical Software. 93, 1–21. doi:10.18637/jss.v093.i03.
Hofmann M, Gatu C, Kontoghiorghes EJ (2007). Efficient Algorithms for Computing the Best Subset Regression Models for Large-Scale Problems. Computational Statistics \& Data Analysis, 52, 16–29. doi:10.1016/j.csda.2007.03.017.
Gatu C, Kontoghiorghes EJ (2006). Branch-and-Bound Algorithms for Computing the Best Subset Regression Models. Journal of Computational and Graphical Statistics, 15, 139–156. doi:10.1198/106186006x100290.
## load data data("AirPollution", package = "lmSubsets") ################### ## basic usage ## ################### ## canonical example: fit all subsets lm_all <- lmSubsets(mortality ~ ., data = AirPollution, nbest = 5) lm_all ## plot RSS and BIC plot(lm_all) ## summary statistics summary(lm_all) ############################ ## forced in-/exclusion ## ############################ lm_force <- lmSubsets(lm_all, include = c("nox", "so2"), exclude = "whitecollar") lm_force ######################## ## matrix interface ## ######################## ## same as above x <- as.matrix(AirPollution) lm_mat <- lmSubsets(x, y = "mortality") lm_mat
## load data data("AirPollution", package = "lmSubsets") ################### ## basic usage ## ################### ## canonical example: fit all subsets lm_all <- lmSubsets(mortality ~ ., data = AirPollution, nbest = 5) lm_all ## plot RSS and BIC plot(lm_all) ## summary statistics summary(lm_all) ############################ ## forced in-/exclusion ## ############################ lm_force <- lmSubsets(lm_all, include = c("nox", "so2"), exclude = "whitecollar") lm_force ######################## ## matrix interface ## ######################## ## same as above x <- as.matrix(AirPollution) lm_mat <- lmSubsets(x, y = "mortality") lm_mat
Extractor methods for "lmSubsets"
and "lmSelect"
objects.
## S3 method for class 'lmSubsets' variable.names(object, size, best = 1, ..., na.rm = TRUE, drop = TRUE) ## S3 method for class 'lmSelect' variable.names(object, best = 1, ..., na.rm = TRUE, drop = TRUE) ## S3 method for class 'lmSubsets' formula(x, size, best = 1, ...) ## S3 method for class 'lmSelect' formula(x, best, ...) ## S3 method for class 'lmSubsets' model.frame(formula, ...) ## S3 method for class 'lmSelect' model.frame(formula, ...) ## S3 method for class 'lmSubsets' model.matrix(object, size, best = 1, ...) ## S3 method for class 'lmSelect' model.matrix(object, best, ...) ## S3 method for class 'lmSubsets' model_response(data, ...) ## S3 method for class 'lmSelect' model_response(data, ...) ## S3 method for class 'lmSubsets' refit(object, size, best = 1, ...) ## S3 method for class 'lmSelect' refit(object, best = 1, ...) ## S3 method for class 'lmSubsets' deviance(object, size, best = 1, ..., na.rm = TRUE, drop = TRUE) ## S3 method for class 'lmSelect' deviance(object, best = 1, ..., na.rm = TRUE, drop = TRUE) ## S3 method for class 'lmSubsets' sigma(object, size, best = 1, ..., na.rm = TRUE, drop = TRUE) ## S3 method for class 'lmSelect' sigma(object, best = 1, ..., na.rm = TRUE, drop = TRUE) ## S3 method for class 'lmSubsets' logLik(object, size, best = 1, ..., na.rm = TRUE, drop = TRUE) ## S3 method for class 'lmSelect' logLik(object, best = 1, ..., na.rm = TRUE, drop = TRUE) ## S3 method for class 'lmSubsets' AIC(object, size, best = 1, ..., k = 2, na.rm = TRUE, drop = TRUE) ## S3 method for class 'lmSelect' AIC(object, best = 1, ..., k = 2, na.rm = TRUE, drop = TRUE) ## S3 method for class 'lmSubsets' BIC(object, size, best = 1, ..., na.rm = TRUE, drop = TRUE) ## S3 method for class 'lmSelect' BIC(object, best = 1, ..., na.rm = TRUE, drop = TRUE) ## S3 method for class 'lmSubsets' coef(object, size, best = 1, ..., na.rm = TRUE, drop = TRUE) ## S3 method for class 'lmSelect' coef(object, best = 1, ..., na.rm = TRUE, drop = TRUE) ## S3 method for class 'lmSubsets' vcov(object, size, best = 1, ...) ## S3 method for class 'lmSelect' vcov(object, best = 1, ...) ## S3 method for class 'lmSubsets' fitted(object, size, best = 1, ...) ## S3 method for class 'lmSelect' fitted(object, best = 1, ...) ## S3 method for class 'lmSubsets' residuals(object, size, best = 1, ...) ## S3 method for class 'lmSelect' residuals(object, best = 1, ...)
## S3 method for class 'lmSubsets' variable.names(object, size, best = 1, ..., na.rm = TRUE, drop = TRUE) ## S3 method for class 'lmSelect' variable.names(object, best = 1, ..., na.rm = TRUE, drop = TRUE) ## S3 method for class 'lmSubsets' formula(x, size, best = 1, ...) ## S3 method for class 'lmSelect' formula(x, best, ...) ## S3 method for class 'lmSubsets' model.frame(formula, ...) ## S3 method for class 'lmSelect' model.frame(formula, ...) ## S3 method for class 'lmSubsets' model.matrix(object, size, best = 1, ...) ## S3 method for class 'lmSelect' model.matrix(object, best, ...) ## S3 method for class 'lmSubsets' model_response(data, ...) ## S3 method for class 'lmSelect' model_response(data, ...) ## S3 method for class 'lmSubsets' refit(object, size, best = 1, ...) ## S3 method for class 'lmSelect' refit(object, best = 1, ...) ## S3 method for class 'lmSubsets' deviance(object, size, best = 1, ..., na.rm = TRUE, drop = TRUE) ## S3 method for class 'lmSelect' deviance(object, best = 1, ..., na.rm = TRUE, drop = TRUE) ## S3 method for class 'lmSubsets' sigma(object, size, best = 1, ..., na.rm = TRUE, drop = TRUE) ## S3 method for class 'lmSelect' sigma(object, best = 1, ..., na.rm = TRUE, drop = TRUE) ## S3 method for class 'lmSubsets' logLik(object, size, best = 1, ..., na.rm = TRUE, drop = TRUE) ## S3 method for class 'lmSelect' logLik(object, best = 1, ..., na.rm = TRUE, drop = TRUE) ## S3 method for class 'lmSubsets' AIC(object, size, best = 1, ..., k = 2, na.rm = TRUE, drop = TRUE) ## S3 method for class 'lmSelect' AIC(object, best = 1, ..., k = 2, na.rm = TRUE, drop = TRUE) ## S3 method for class 'lmSubsets' BIC(object, size, best = 1, ..., na.rm = TRUE, drop = TRUE) ## S3 method for class 'lmSelect' BIC(object, best = 1, ..., na.rm = TRUE, drop = TRUE) ## S3 method for class 'lmSubsets' coef(object, size, best = 1, ..., na.rm = TRUE, drop = TRUE) ## S3 method for class 'lmSelect' coef(object, best = 1, ..., na.rm = TRUE, drop = TRUE) ## S3 method for class 'lmSubsets' vcov(object, size, best = 1, ...) ## S3 method for class 'lmSelect' vcov(object, best = 1, ...) ## S3 method for class 'lmSubsets' fitted(object, size, best = 1, ...) ## S3 method for class 'lmSelect' fitted(object, best = 1, ...) ## S3 method for class 'lmSubsets' residuals(object, size, best = 1, ...) ## S3 method for class 'lmSelect' residuals(object, best = 1, ...)
object , formula , data , x
|
An object of class |
size |
The submodel size. |
best |
The submodel ranking. |
... |
Forwarded arguments. |
k |
AIC penalty. |
drop |
Control shape of return value. |
na.rm |
Remove missing submodels. |
The extractor methods work for "lmSubsets"
and
"lmSelect"
objects, i.e., objects that have been generated
using the formula interface.
If drop == FALSE
, the extractor methods variable.names
,
deviance
, sigma
, logLik
, AIC
, BIC
and coef
return a data.frame
object. If drop ==
TRUE
, the return value is a logical
matrix
(variable.names
), a numeric
matrix (coef
), or a
numeric
vector. If the drop
parameter is not set
explicitly when calling variable.names
or coef
,
one-dimensional values are represented in a compact form.
If a desired extractor function is not available, refit
can be
called explicitly to obtain the corresponding "lm"
object.
## load data data("AirPollution", package = "lmSubsets") ## fit subsets (5 best subsets per size) lm_all <- lmSubsets(mortality ~ ., data = AirPollution, nbest = 5) ## extract information (for best submodel of size 3) coef(lm_all, size = 3) vcov(lm_all, size = 3) residuals(lm_all, size = 3) fitted(lm_all, size = 3) model.matrix(lm_all, size = 3) ## select best (BIC) submodels lm_best <- lmSelect(lm_all) ## extract information deviance(lm_best) logLik(lm_best) AIC(lm_best) BIC(lm_best, best = 1:5) ## refit model lm5 <- refit(lm_all, size = 5) summary(lm5) ## (Note that the p-values are not valid due to model selection.)
## load data data("AirPollution", package = "lmSubsets") ## fit subsets (5 best subsets per size) lm_all <- lmSubsets(mortality ~ ., data = AirPollution, nbest = 5) ## extract information (for best submodel of size 3) coef(lm_all, size = 3) vcov(lm_all, size = 3) residuals(lm_all, size = 3) fitted(lm_all, size = 3) model.matrix(lm_all, size = 3) ## select best (BIC) submodels lm_best <- lmSelect(lm_all) ## extract information deviance(lm_best) logLik(lm_best) AIC(lm_best) BIC(lm_best, best = 1:5) ## refit model lm5 <- refit(lm_all, size = 5) summary(lm5) ## (Note that the p-values are not valid due to model selection.)
Generic function for refitting a model on various subsets or reweighted data sets.
refit(object, ...)
refit(object, ...)
object |
An object. |
... |
Forwarded arguments. |
The refit
generic is a new function for refitting a certain
model object on multiple versions of a data set (and is hence
different from update
). Applications refit models after some
kind of model selection, e.g., variable subset selection,
partitioning, reweighting etc.
The generic is similar to the one provided in modeltools and fxregime (and should fulfill the same purpose). To avoid dependencies, it is also provided here.