forestmodel
Using variable labels instead of variable names when available
Variable labels, stored as a label attribute and easily accessible with labelled::var_label(), are becoming quite common. Many packages that produce graphs or tables (like gtsummary) are now adopting the following rule: if defined, use variable labels instead of variable names.
Adding this to forestmodel would make it easy to customize the names of variables displayed on forest plots.
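As a rough illustration of the rule described above, here is a minimal base R sketch. The helper name label_or_name() is hypothetical and is not part of forestmodel or labelled; it only assumes that labels live in the "label" attribute of each column.

```r
# Hypothetical helper (not part of forestmodel): return the variable label
# when one is defined, otherwise fall back to the column name.
label_or_name <- function(data) {
  vapply(names(data), function(nm) {
    lbl <- attr(data[[nm]], "label", exact = TRUE)
    if (is.null(lbl)) nm else as.character(lbl)
  }, character(1))
}

df <- data.frame(age = c(25, 40), residency = factor(c("urban", "rural")))

# Store a variable label the same way labelled::var_label<-() does:
# as a "label" attribute on the column.
attr(df$age, "label") <- "Age at last anniversary (in years)"

# "age" has a label, "residency" does not and keeps its name.
label_or_name(df)
```

A plotting function following this rule would use the returned strings as row headings in place of the raw variable names.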
This package is not in active development. If you are interested in this feature, please implement it, then keep a fork or create a pull request to https://github.com/ShixiangWang/forestmodel
@ShixiangWang is it an official fork?
@NikNakk could you clarify if you still plan to maintain and develop forestmodel?
@larmarange No, I'm not saying that. The author is nice, but from what I can see he may not be active on GitHub.
Hi @larmarange, @ShixiangWang,
I've not been very active in maintaining this package for a while because of being busy with other things, but I'm still aiming to get to the outstanding queries that have been raised including yours. There's also now a more pressing reason to attend to the package because it's erroring on CRAN so will be delisted if I don't fix that. I'll at least fix the current issue that would lead to delisting in the next few days, but if I can I'll try to fix any other outstanding issues and improvements.
Thanks @NikNakk for your feedback.
Regarding the proposed improvement, it should not be very difficult to implement once the places where variable names are used have been identified.
I haven't had time to look at your code in detail, so I don't know yet how it is organized. But as you are familiar with your package, you should have an idea of where to look.
Best regards
@larmarange I've made a new branch that has a simple implementation of this at https://github.com/NikNakk/forestmodel/tree/labels. You can test it using remotes::install_github("NikNakk/forestmodel@labels")
Thanks a lot
@larmarange please let me know when you've had a chance to test this out.
@NikNakk I have done some quick tests. It works well with simple models. Thanks.
When I add interaction terms, labels are not applied to them, but that was already the case before (it seems that forestmodel was not treating them in any particular way).
library(questionr)
library(forestmodel)
library(labelled)
data(fertility)
women <- unlabelled(women)
mod <- glm(employed ~ age + residency * instruction, data = women, family = binomial())
forest_model(mod, exponentiate = TRUE)
Here is a quick example with gtsummary to show how that package handles interaction terms.
library(gtsummary)
tbl_regression(mod)
Characteristic | log(OR) | 95% CI | p-value |
---|---|---|---|
Age at last anniversary (in years) | 0.06 | 0.05, 0.07 | <0.001 |
Urban / rural residency | | | |
urban | | | |
rural | 0.28 | 0.00, 0.55 | 0.052 |
Level of instruction | | | |
none | | | |
primary | 0.35 | -0.02, 0.74 | 0.067 |
secondary | -0.83 | -1.2, -0.50 | <0.001 |
higher | -0.71 | -1.3, -0.10 | 0.022 |
Urban / rural residency * Level of instruction | | | |
rural * primary | -0.16 | -0.67, 0.35 | 0.5 |
rural * secondary | 0.19 | -0.41, 0.80 | 0.5 |
rural * higher | -1.5 | -4.5, 0.56 | 0.2 |
But I know that managing interaction terms could be tricky and beyond the current issue.
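For what it's worth, the heading that gtsummary shows for the interaction row can be sketched in base R by splitting each model term on ":" and substituting labels where they exist. The helper label_term() and the var_labels vector below are hypothetical and only illustrate the idea; this is not how forestmodel or gtsummary actually implement it.

```r
# Hypothetical helper: build a display name for a model term by replacing
# each variable in the term with its label (when one is known) and joining
# the pieces of an interaction with " * ".
label_term <- function(term, var_labels) {
  parts <- strsplit(term, ":", fixed = TRUE)[[1]]
  found <- parts %in% names(var_labels)
  parts[found] <- unname(var_labels[parts[found]])
  paste(parts, collapse = " * ")
}

# Labels copied from the table above; "age" is deliberately left unlabelled
# here to show the fallback to the variable name.
var_labels <- c(
  residency   = "Urban / rural residency",
  instruction = "Level of instruction"
)

# terms() expands residency * instruction into main effects plus
# the "residency:instruction" interaction term.
model_terms <- attr(terms(employed ~ age + residency * instruction), "term.labels")

vapply(model_terms, label_term, character(1), var_labels = var_labels)
```

This covers only the term heading. Labelling the individual rows (such as "rural * primary") would additionally need the factor levels of each variable.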
Otherwise, it's perfect. Thanks a lot
I’ll have a look at interaction terms when I get a chance. gtsummary looks like a good starting point. For now I’ve merged the labels branch into master and need to get the latest version on CRAN because otherwise it will be delisted.
Thanks
FYI, this version is now on CRAN.
Variable labels still not showing up
Same here, it works fine with gtsummary::tbl_regression but not with forest_model from the forestmodel package that I just downloaded from GitHub.
Sorry for the delayed response, @proshano and @corneliushennch. Could you please give me some example code that doesn't work as expected? I'm still planning to work on interaction terms since they're not currently properly supported with or without labels.
In case it is useful to you, gtsummary::tbl_regression() now relies on the broom.helpers package: https://larmarange.github.io/broom.helpers/
EDIT:
The problem occurs with factor and character variables when using coxph(). Only the label of the numeric variable gets printed, as you can see in the reprex. All variable types work fine with other models (I just checked glm). So converting factors back to character (which would already be tedious, as factors are pretty standard in this kind of data analysis) doesn't solve it, as I first thought. I'd very much appreciate it if you could implement proper use of labels for coxph objects as well, since so far there is no convenient function that displays hazard ratios in clear forest plots with labels. I formerly used survminer::ggforest, but switched to forestmodel in order to be able to use labels...
library(survival)
library(dplyr)
library(forestmodel)
surv_data <- tibble(
  time = abs(rnorm(300, 50, 30)),
  event = sample(c(0, 1), 300, prob = c(0.8, 0.2), replace = TRUE),
  gender = sample(c(0, 1), 300, prob = c(0.6, 0.4), replace = TRUE),
  rx = sample(c("no", "yes"), 300, prob = c(0.5, 0.5), replace = TRUE),
  gene = sample(c(0, 1), 300, prob = c(0.9, 0.1), replace = TRUE)
)
surv_data <- surv_data %>%
  mutate(gender = factor(gender, levels = c(0, 1), labels = c("female", "male")))
labelled::var_label(surv_data) <- list(
  gender = "Gender (f/m)", # this variable is a factor -> doesn't work!
  rx = "Irradiation", # character -> label doesn't work!
  gene = "Gene of Interest" # numeric -> label works...
)
labelled::var_label(surv_data) # checking that labels are assigned
#> $time
#> NULL
#>
#> $event
#> NULL
#>
#> $gender
#> [1] "Gender (f/m)"
#>
#> $rx
#> [1] "Irradiation"
#>
#> $gene
#> [1] "Gene of Interest"
lapply(surv_data, class) # showing variable classes
#> $time
#> [1] "numeric"
#>
#> $event
#> [1] "numeric"
#>
#> $gender
#> [1] "factor"
#>
#> $rx
#> [1] "character"
#>
#> $gene
#> [1] "numeric"
# printing the coxph model -> only label of numeric variable works
print(forest_model(coxph(formula = Surv(time, event) ~ gender + rx +
gene, data = surv_data)))
# ok it seems to be a specific problem of the coxph object -> labels get printed correctly
# with glm...
mod <- glm(gender ~ gene + rx, data = surv_data, family = binomial())
forest_model(mod, exponentiate = TRUE)
Created on 2021-04-23 by the reprex package (v0.3.0)
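One way to narrow this down might be to check which label attributes are still present in the model frame of a fitted model. The sketch below assumes (this is not confirmed anywhere in this thread) that forestmodel reads labels from the model frame. The helper labels_in_model_frame() is hypothetical, and the example uses glm with made-up data so that it only needs base R.

```r
# Hypothetical diagnostic helper (not part of forestmodel): list the "label"
# attribute of every column in a fitted model's model frame.
# NULL means no label was found on that column.
labels_in_model_frame <- function(model) {
  mf <- stats::model.frame(model)
  lapply(mf, function(x) attr(x, "label", exact = TRUE))
}

set.seed(1)
d <- data.frame(
  gender = factor(sample(c("female", "male"), 100, replace = TRUE)),
  rx = sample(c("no", "yes"), 100, replace = TRUE),
  gene = sample(c(0, 1), 100, replace = TRUE)
)
attr(d$gender, "label") <- "Gender (f/m)"
attr(d$rx, "label") <- "Irradiation"
attr(d$gene, "label") <- "Gene of Interest"

mod <- glm(gender ~ gene + rx, data = d, family = binomial())
found <- labels_in_model_frame(mod)
str(found)
```

Running the same helper on the coxph() fit from the reprex and comparing the two results could show whether the factor and character labels are already missing from the model frame, or are lost later when the plot is built.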
I would also love to have the coxph factor label bug fixed as it would save a lot of time in my work.
Thanks a lot for all your efforts. Please let us know if the label bug gets fixed for coxph, as it would save us a lot of time. I tried it with coxph and the labels didn't show up; it only works with glm. I'd appreciate your help and advice.