Fix, document and export label functions
This PR:
- Make
get_variable_labels_modelswork also if a single model, rather than a list of models, is passed - Fix a bug were
get_variable_labels_datawould drop variables that do not have a label defined - Document those functions
- Export those functions. I believe they are useful to build a starting point for a vector to be passed to
coef_renameorcoef_map
Thanks!
Could you give me an example of failure that this fixes?
For now, I don't think I want to export these functions, because it is very important to me that we keep the number of exported functions in the modelsummary namespace to an absolute minimum.
Hello!
Could you give me an example of failure that this fixes?
The one in the example section:
dat <- mtcars
dat %<>%
labelled::set_variable_labels(
mpg = "Miles/(US) gallon",
cyl = "Number of cylinders",
)
dat$gear %<>% factor()
mod <- lm(mpg ~ cyl + disp, dat)
modelsummary::get_variable_labels_models(mod)
With the current code disp is dropped
My objective would be to re-code the modelsummary::coef_rename function to support labels
For the moment I use my own function because I think that the function called in modelsummary::modelsummary(coef_rename=myfunction) takes a character vector?
So noway I can have labels there
#' Build a named character vector mapping variable names to variable labels, including factor levels
#' @export
modelsummary_build_labelled_coef_map <- function(df, debug=FALSE) {
# Get named character vector of variable names 2 variable labels
coefmap <- modelsummary::get_variable_labels_models(df)
# For variables with missing labels, use variable name as label.
# Otherwise modelsummary will not print the variable at all.
for(n in names(coefmap)) {
if(is.null(coefmap[[n]])) {
coefmap[[n]] <- n
if(debug) {
print(paste0("Missing label for variable ", n, ", using variable name."))
}
}
}
# Convert factor variables from `nameLevel` into `label [level]`
for (var in colnames(df)) {
cc <- class(df[[var]])
if(length(cc) != 1) {
stop("ERROR: Variable ", var, " has multiple classes: ", paste(cc, collapse = ", "))
}
if (cc == "factor") {
if(debug) {
print(paste0("Converting factor variable: ", var))
}
for (lvl in levels(df[[var]])) {
varinst_name <- paste0(var, lvl)
var_label <- coefmap[[var]]
varinst_label <- paste0(var_label, " [", lvl, "]")
coefmap[[varinst_name]] <- varinst_label
if(debug) {
print(paste0(" - Added: ", varinst_name, " -> ", varinst_label))
}
}
}
}
return(coefmap)
}
That sounds like a good idea. As long as (1) we work within the currently exported function, (2) don't break any existing functionality, and (3) don't change the default behavior of anything.
That sounds like a good idea.
What? My function or the fix?
My function needs the fixes I propose here so not to drop variables that don't have an associated label.
For my function, I don't think it is currently possible to pass it to the coef_rename parameter of modelsummary because it is my understanding that the function passed there is called with a character vector of variables, rather than the model
I’m starting to think that modelsummary should take full charge of label handling instead of delegating it to parameters.
It would be pretty easy to improve coef_rename to do something like this:
library(modelsummary)
library(labelled)
coef_rename <- function(
x,
factor = TRUE,
factor_name = TRUE,
poly = TRUE,
backticks = TRUE,
titlecase = TRUE,
underscore = TRUE,
asis = TRUE,
data = NULL) {
# if `data` is supplied, it means that the user wants to use variable labels
# it also means the user already executed `coef_rename(data=dat)`, whereas
# `modelsummary` expects an unevaluated function there. So we need to return
# a function rather than a character vector
if (missing(x) && !is.null(data)) {
labels <- list()
for (nm in names(data)) {
labels[[nm]] <- attr(data[[nm]], "label")
}
fun <- function(x) {
for (l in names(labels)) {
x <- gsub(l, labels[[l]], x, fixed = TRUE)
}
return(x)
}
return(fun)
}
# if `data` is not supplied, it means that the user supplied an unevaluated
# function to `modelsummary`, which means we need to process `x` directly.
out <- x
if (isTRUE(factor_name)) {
out <- gsub("factor\\(.*\\)", "", out)
} else if (isTRUE(factor)) {
out <- gsub("factor\\((.*)\\)", "\\1 ", out)
}
if (isTRUE(poly)) {
out <- gsub("poly\\(([^,]*),.*\\)", "\\1^", out)
}
if (isTRUE(underscore)) {
out <- gsub("_", " ", out)
}
if (isTRUE(backticks)) {
out <- gsub("\\`", "", out)
}
if (isTRUE(asis)) {
out <- gsub("\\bI\\((.*)\\)", "(\\1)", out)
}
if (isTRUE(titlecase)) {
out <- tools::toTitleCase(out)
}
return(out)
}
dat <- iris
var_label(dat) <- list(
Sepal.Length = "Sepal Length (cm)",
Sepal.Width = "Sepal Width (cm)"
)
mod <- lm(Petal.Length ~ Sepal.Length + Sepal.Width, data = dat)
cr <- coef_rename(data = dat)
modelsummary(mod, coef_rename = cr) |> print("markdown")
+-------------------+----------+
| | (1) |
+===================+==========+
| (Intercept) | -2.525 |
+-------------------+----------+
| | (0.563) |
+-------------------+----------+
| Sepal Length (cm) | 1.776 |
+-------------------+----------+
| | (0.064) |
+-------------------+----------+
| Sepal Width (cm) | -1.339 |
+-------------------+----------+
| | (0.122) |
+-------------------+----------+
| Num.Obs. | 150 |
+-------------------+----------+
| R2 | 0.868 |
+-------------------+----------+
| R2 Adj. | 0.866 |
+-------------------+----------+
| AIC | 299.8 |
+-------------------+----------+
| BIC | 311.8 |
+-------------------+----------+
| Log.Lik. | -145.894 |
+-------------------+----------+
| F | 481.997 |
+-------------------+----------+
| RMSE | 0.64 |
+-------------------+----------+
I re-submitted the patch without exporting the function, just the fix, but I'm not sure at this point what actual bug with exported functions it would fix.
I thought it would be dvnames but in reality it has code to handle missing variable in the map.
I’m starting to think that modelsummary should take full charge of label handling instead of delegating it to parameters.
I think this would re-invent the wheel
I think this would re-invent the wheel
OK, fair. Let's not do that.
So what are we trying to achieve with this PR if we don't know what bug we're fixing?
Also, some checks are failing, apparently.