modelsummary icon indicating copy to clipboard operation
modelsummary copied to clipboard

Fix, document and export label functions

Open raffaem opened this issue 3 months ago • 9 comments

This PR:

  • Make get_variable_labels_models work also if a single model, rather than a list of models, is passed
  • Fix a bug were get_variable_labels_data would drop variables that do not have a label defined
  • Document those functions
  • Export those functions. I believe they are useful to build a starting point for a vector to be passed to coef_rename or coef_map

raffaem avatar Aug 29 '25 10:08 raffaem

Thanks!

Could you give me an example of failure that this fixes?

For now, I don't think I want to export these functions, because it is very important to me that we keep the number of exported functions in the modelsummary namespace to an absolute minimum.

vincentarelbundock avatar Aug 29 '25 11:08 vincentarelbundock

Hello!

Could you give me an example of failure that this fixes?

The one in the example section:

dat <- mtcars
dat %<>%
  labelled::set_variable_labels(
    mpg = "Miles/(US) gallon",
    cyl = "Number of cylinders",
  )
dat$gear %<>% factor()
mod <- lm(mpg ~ cyl + disp, dat)
modelsummary::get_variable_labels_models(mod)

With the current code disp is dropped

raffaem avatar Aug 29 '25 11:08 raffaem

My objective would be to re-code the modelsummary::coef_rename function to support labels

For the moment I use my own function because I think that the function called in modelsummary::modelsummary(coef_rename=myfunction) takes a character vector? So noway I can have labels there

#' Build a named character vector mapping variable names to variable labels, including factor levels
#' @export
modelsummary_build_labelled_coef_map <- function(df, debug=FALSE) {
  # Get named character vector of variable names 2 variable labels
  coefmap <- modelsummary::get_variable_labels_models(df)
  # For variables with missing labels, use variable name as label.
  # Otherwise modelsummary will not print the variable at all.
  for(n in names(coefmap)) {
    if(is.null(coefmap[[n]])) {
      coefmap[[n]] <- n
      if(debug) {
        print(paste0("Missing label for variable ", n, ", using variable name."))
      }
    }
  }
  # Convert factor variables from `nameLevel` into `label [level]`
  for (var in colnames(df)) {
    cc <- class(df[[var]])
    if(length(cc) != 1) {
      stop("ERROR: Variable ", var, " has multiple classes: ", paste(cc, collapse = ", "))
    }
    if (cc == "factor") {
      if(debug) {
        print(paste0("Converting factor variable: ", var))
      }
      for (lvl in levels(df[[var]])) {
        varinst_name <- paste0(var, lvl)
        var_label <- coefmap[[var]]
        varinst_label <- paste0(var_label, " [", lvl, "]")
        coefmap[[varinst_name]] <- varinst_label
        if(debug) {
          print(paste0(" - Added: ", varinst_name, " -> ", varinst_label))
        }
      }
    }
  }
  return(coefmap)
}

raffaem avatar Aug 29 '25 11:08 raffaem

That sounds like a good idea. As long as (1) we work within the currently exported function, (2) don't break any existing functionality, and (3) don't change the default behavior of anything.

vincentarelbundock avatar Aug 29 '25 11:08 vincentarelbundock

That sounds like a good idea.

What? My function or the fix?

My function needs the fixes I propose here so not to drop variables that don't have an associated label.

For my function, I don't think it is currently possible to pass it to the coef_rename parameter of modelsummary because it is my understanding that the function passed there is called with a character vector of variables, rather than the model

raffaem avatar Aug 29 '25 12:08 raffaem

I’m starting to think that modelsummary should take full charge of label handling instead of delegating it to parameters.

It would be pretty easy to improve coef_rename to do something like this:

library(modelsummary)
library(labelled)

coef_rename <- function(
    x,
    factor = TRUE,
    factor_name = TRUE,
    poly = TRUE,
    backticks = TRUE,
    titlecase = TRUE,
    underscore = TRUE,
    asis = TRUE,
    data = NULL) {
    # if `data` is supplied, it means that the user wants to use variable labels
    # it also means the user already executed `coef_rename(data=dat)`, whereas
    # `modelsummary` expects an unevaluated function there. So we need to return
    # a function rather than a character vector
    if (missing(x) && !is.null(data)) {
        labels <- list()
        for (nm in names(data)) {
            labels[[nm]] <- attr(data[[nm]], "label")
        }
        fun <- function(x) {
            for (l in names(labels)) {
                x <- gsub(l, labels[[l]], x, fixed = TRUE)
            }
            return(x)
        }
        return(fun)
    }

    # if `data` is not supplied, it means that the user supplied an unevaluated
    # function to `modelsummary`, which means we need to process `x` directly.
    out <- x

    if (isTRUE(factor_name)) {
        out <- gsub("factor\\(.*\\)", "", out)
    } else if (isTRUE(factor)) {
        out <- gsub("factor\\((.*)\\)", "\\1 ", out)
    }

    if (isTRUE(poly)) {
        out <- gsub("poly\\(([^,]*),.*\\)", "\\1^", out)
    }

    if (isTRUE(underscore)) {
        out <- gsub("_", " ", out)
    }

    if (isTRUE(backticks)) {
        out <- gsub("\\`", "", out)
    }

    if (isTRUE(asis)) {
        out <- gsub("\\bI\\((.*)\\)", "(\\1)", out)
    }

    if (isTRUE(titlecase)) {
        out <- tools::toTitleCase(out)
    }

    return(out)
}


dat <- iris
var_label(dat) <- list(
    Sepal.Length = "Sepal Length (cm)",
    Sepal.Width = "Sepal Width (cm)"
)

mod <- lm(Petal.Length ~ Sepal.Length + Sepal.Width, data = dat)

cr <- coef_rename(data = dat)

modelsummary(mod, coef_rename = cr) |> print("markdown")
+-------------------+----------+
|                   | (1)      |
+===================+==========+
| (Intercept)       | -2.525   |
+-------------------+----------+
|                   | (0.563)  |
+-------------------+----------+
| Sepal Length (cm) | 1.776    |
+-------------------+----------+
|                   | (0.064)  |
+-------------------+----------+
| Sepal Width (cm)  | -1.339   |
+-------------------+----------+
|                   | (0.122)  |
+-------------------+----------+
| Num.Obs.          | 150      |
+-------------------+----------+
| R2                | 0.868    |
+-------------------+----------+
| R2 Adj.           | 0.866    |
+-------------------+----------+
| AIC               | 299.8    |
+-------------------+----------+
| BIC               | 311.8    |
+-------------------+----------+
| Log.Lik.          | -145.894 |
+-------------------+----------+
| F                 | 481.997  |
+-------------------+----------+
| RMSE              | 0.64     |
+-------------------+----------+ 

vincentarelbundock avatar Aug 29 '25 13:08 vincentarelbundock

I re-submitted the patch without exporting the function, just the fix, but I'm not sure at this point what actual bug with exported functions it would fix.

I thought it would be dvnames but in reality it has code to handle missing variable in the map.

raffaem avatar Aug 30 '25 16:08 raffaem

I’m starting to think that modelsummary should take full charge of label handling instead of delegating it to parameters.

I think this would re-invent the wheel

raffaem avatar Aug 30 '25 16:08 raffaem

I think this would re-invent the wheel

OK, fair. Let's not do that.

So what are we trying to achieve with this PR if we don't know what bug we're fixing?

Also, some checks are failing, apparently.

vincentarelbundock avatar Aug 30 '25 16:08 vincentarelbundock