broom

 

 

https://cran.r-project.org/web/packages/broom/broom.pdf

https://github.com/tidyverse/broom

 

  • The output of the tidy, augment, and glance functions is always a data frame.
  • The output never has rownames. This ensures that you can combine it with other tidy outputs without fear of losing information (since rownames in R cannot contain duplicates).
  • Some column names are kept consistent, so that outputs can be combined across different models and so that you know what to expect (in contrast to asking “is it pval or PValue?” every time).
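
As a rough illustration of why these conventions matter, the sketch below fits the same regression to several subsets of a data set and stacks the tidy outputs into one data frame. The choice of the built-in mtcars data, the grouping variable cyl, and the use of dplyr::bind_rows are assumptions made for the example, not something the broom documentation prescribes.

```r
library(broom)

# Fit the same regression separately for each cylinder count (4, 6, 8)
fits <- lapply(split(mtcars, mtcars$cyl), function(d) lm(mpg ~ wt, data = d))

# Every tidy() output has the same columns and no rownames, so the
# results stack cleanly; .id records which group each row came from
per_group <- dplyr::bind_rows(lapply(fits, tidy), .id = "cyl")
print(per_group)
```

Because every piece has the same column names, the combined result can be filtered or plotted by term and group like any other data frame.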

 

 

tidy functions:

  • tidy: constructs a data frame that summarizes the model’s statistical findings. This includes coefficients and p-values for each term in a regression, per-cluster information in clustering applications, or per-test information for multtest functions.
  • Each row in a tidy output typically represents some well-defined concept, such as one term in a regression, one test, or one cluster/class. This meaning varies across models but is usually self-evident. The one thing each row cannot represent is a point in the initial data (for that, use the augment method).
  • Common column names include:
    • term: the term in a regression or model that is being estimated.
    • p.value: this spelling was chosen (over common alternatives such as pvalue, PValue, or pval) to be consistent with functions in R’s built-in stats package.
    • statistic: a test statistic, usually the one used to compute the p-value. Combining these across many sub-groups is a reliable way to perform (e.g.) bootstrap hypothesis testing.
    • estimate: the estimate of an effect size, slope, or other value.
    • std.error: the standard error.
    • conf.low: the low end of a confidence interval on the estimate.
    • conf.high: the high end of a confidence interval on the estimate.
    • df: degrees of freedom.
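
A minimal sketch of tidy on a linear model follows. The model formula and the mtcars data are assumptions chosen only for illustration; conf.int = TRUE is the argument that asks tidy for the conf.low and conf.high columns on an lm fit.

```r
library(broom)

# Fit a linear regression on the built-in mtcars data set
fit <- lm(mpg ~ wt + qsec, data = mtcars)

# One row per model term: (Intercept), wt, qsec.
# conf.int = TRUE adds the conf.low and conf.high columns.
coefs <- tidy(fit, conf.int = TRUE)
print(coefs)
```

Each row describes one term of the regression, never one observation of the original data.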

augment functions:

  • augment: adds columns to the original data that was modeled. This includes predictions, residuals, and cluster assignments.
  • augment(model, data) adds columns to the original data.
  • If the data argument is missing, augment attempts to reconstruct the data from the model (note that this may not always be possible, and usually won’t contain columns not used in the model).
  • Each row in an augment output matches the corresponding row in the original data.
  • If the original data contained rownames, augment turns them into a column called .rownames.
  • Newly added column names begin with . to avoid overwriting columns in the original data.
  • Common column names include:
    • .fitted: the predicted values, on the same scale as the data.
    • .resid: the residuals, i.e. the actual y values minus the fitted values.
    • .cluster: cluster assignments.
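
The points above can be sketched with a regression and a k-means clustering. The data set, the model formula, the number of clusters, and the seed are all assumptions made for this example.

```r
library(broom)

fit <- lm(mpg ~ wt + qsec, data = mtcars)

# Pass the original data explicitly so that columns not used in the
# model (e.g. hp, cyl) are kept in the output
aug <- augment(fit, data = mtcars)

# mtcars has rownames (the car names), so they show up in .rownames
print(head(aug[, c(".rownames", "mpg", ".fitted", ".resid")]))

# Clustering: augment attaches a .cluster column to the supplied data
set.seed(1)  # k-means starts from random centers
km <- kmeans(mtcars[, c("mpg", "wt")], centers = 3)
clustered <- augment(km, mtcars)
print(table(clustered$.cluster))
```

In both cases the output has one row per row of the input data, with the new columns prefixed by a dot.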

glance functions:

  • glance: constructs a concise one-row summary of the model. This typically contains values such as R^2, adjusted R^2, and residual standard error that are computed once for the entire model.
  • glance always returns a one-row data frame.
    • The only exception is that glance(NULL) returns an empty data frame.
  • We avoid including arguments that were given to the modeling function. For example, a glm glance output does not need to contain a field for family, since that is decided by the user calling glm rather than the modeling function itself.
  • Common column names include:
    • r.squared: the fraction of variance explained by the model.
    • adj.r.squared: R^2 adjusted based on the degrees of freedom.
    • sigma: the square root of the estimated variance of the residuals.
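
A short sketch of glance on a linear model, again assuming the mtcars data and an arbitrary formula purely for illustration:

```r
library(broom)

fit <- lm(mpg ~ wt + qsec, data = mtcars)

# A single row of model-level statistics
model_summary <- glance(fit)
print(model_summary[, c("r.squared", "adj.r.squared", "sigma")])
```

Since the result is always one row, glance outputs from several candidate models can be stacked to compare them side by side.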

 
