broom
https://cran.r-project.org/web/packages/broom/broom.pdf
https://github.com/tidyverse/broom
- The output of the `tidy`, `augment`, and `glance` functions is always a data frame.
- The output never has rownames. This ensures that you can combine it with other tidy outputs without fear of losing information (since rownames in R cannot contain duplicates).
- Some column names are kept consistent, so that they can be combined across different models and so that you know what to expect (in contrast to asking “is it `pval` or `PValue`?” every time).
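
As a rough illustration of why these conventions help, the sketch below stacks the `tidy` outputs of two models into one data frame. The `mtcars` data, the model formulas, and the extra `model` column are assumptions chosen for the example, not part of broom's conventions.

```r
library(broom)

# Two simple linear models on the built-in mtcars data (illustrative only).
fit_wt <- lm(mpg ~ wt, data = mtcars)
fit_hp <- lm(mpg ~ hp, data = mtcars)

# Tidy each model and label it with a hypothetical `model` column.
tidy_wt <- tidy(fit_wt)
tidy_wt$model <- "wt"
tidy_hp <- tidy(fit_hp)
tidy_hp$model <- "hp"

# Both outputs share the same column names and carry no meaningful rownames,
# so they stack cleanly even though "(Intercept)" appears in both.
combined <- rbind(tidy_wt, tidy_hp)
print(combined)
```

If the coefficient names were stored as rownames instead of in the `term` column, the repeated `(Intercept)` entry could not be kept as-is after stacking.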
tidy functions:
- `tidy`: constructs a data frame that summarizes the model’s statistical findings. This includes coefficients and p-values for each term in a regression, per-cluster information in clustering applications, or per-test information for `multtest` functions.
  - Each row in a `tidy` output typically represents some well-defined concept, such as one term in a regression, one test, or one cluster/class. This meaning varies across models but is usually self-evident. The one thing each row cannot represent is a point in the initial data (for that, use the `augment` method).
  - Common column names include:
    - `term`: the term in a regression or model that is being estimated.
    - `p.value`: this spelling was chosen (over common alternatives such as `pvalue`, `PValue`, or `pval`) to be consistent with functions in R’s built-in `stats` package.
    - `statistic`: a test statistic, usually the one used to compute the p-value. Combining these across many sub-groups is a reliable way to perform (e.g.) bootstrap hypothesis testing.
    - `estimate`: estimate of an effect size, slope, or other value.
    - `std.error`: standard error.
    - `conf.low`: the low end of a confidence interval on the `estimate`.
    - `conf.high`: the high end of a confidence interval on the `estimate`.
    - `df`: degrees of freedom.
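
A minimal sketch of a `tidy` call on a linear model follows. The `mtcars` data and the formula are assumptions chosen for illustration; `conf.int = TRUE` asks `tidy` to add the `conf.low` and `conf.high` columns for `lm` fits.

```r
library(broom)

# Illustrative regression on the built-in mtcars data.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# One row per model term; conf.int = TRUE adds conf.low and conf.high.
coefs <- tidy(fit, conf.int = TRUE)

print(names(coefs))
print(coefs)
```

Each row here corresponds to one term (the intercept, `wt`, and `hp`), not to an observation in `mtcars`.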
augment functions:
- `augment`: adds columns to the original data that was modeled. This includes predictions, residuals, and cluster assignments.
  - `augment(model, data)` adds columns to the original data.
    - If the `data` argument is missing, `augment` attempts to reconstruct the data from the model (note that this may not always be possible, and usually won’t contain columns not used in the model).
  - Each row in an `augment` output matches the corresponding row in the original data.
  - If the original data contained rownames, `augment` turns them into a column called `.rownames`.
  - Newly added column names begin with `.` to avoid overwriting columns in the original data.
  - Common column names include:
    - `.fitted`: the predicted values, on the same scale as the data.
    - `.resid`: residuals: the actual y values minus the fitted values.
    - `.cluster`: cluster assignments.
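
The sketch below shows `augment` on a linear model and on a k-means fit. The `mtcars` data, the chosen columns, and the number of clusters are assumptions made only for this example.

```r
library(broom)

# Linear model: augment() appends dot-prefixed columns such as
# .fitted and .resid to the data that was modeled.
fit <- lm(mpg ~ wt, data = mtcars)
aug <- augment(fit, data = mtcars)
print(head(aug))

# Clustering: augment() appends a .cluster column with the assignments.
set.seed(1)  # k-means starts from random centers
km <- kmeans(mtcars[, c("mpg", "wt")], centers = 2)
clustered <- augment(km, mtcars)
print(table(clustered$.cluster))
```

Because `mtcars` stores car names as rownames, the augmented output is expected to carry them in a `.rownames` column rather than as rownames.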
glance functions:
- `glance`: constructs a concise one-row summary of the model. This typically contains values such as R^2, adjusted R^2, and residual standard error that are computed once for the entire model.
  - `glance` always returns a one-row data frame.
    - The only exception is that `glance(NULL)` returns an empty data frame.
  - We avoid including arguments that were given to the modeling function. For example, a `glm` glance output does not need to contain a field for `family`, since that is decided by the user calling `glm` rather than the modeling function itself.
  - Common column names include:
    - `r.squared`: the fraction of variance explained by the model.
    - `adj.r.squared`: R^2 adjusted based on the degrees of freedom.
    - `sigma`: the square root of the estimated variance of the residuals.
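
A minimal sketch of `glance` on a linear model follows. The `mtcars` data and the formula are assumptions chosen for illustration.

```r
library(broom)

# Illustrative regression on the built-in mtcars data.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# One row summarizing the whole model.
model_summary <- glance(fit)

print(model_summary[, c("r.squared", "adj.r.squared", "sigma")])
```

The `r.squared` value here should match `summary(fit)$r.squared`; `glance` only repackages it as a column of a one-row data frame.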