Identifying non-data columns
In two recent bayestestR PRs (https://github.com/easystats/bayestestR/pull/673 & https://github.com/easystats/bayestestR/pull/672), some of the printing methods were changed to allow arbitrary columns in the resulting data frame objects. This was done by setting a new attribute called idvars that contains the names of the columns that don't hold the "statistical" information but instead hold information used to identify the rows.
I've found this to be better and more stable than keeping track of all possible column names, and also very flexible.
I wonder if this should be used across easystats? We can have several "classes" of such attributes - idvars, grouping-vars, etc.
These could be useful if "detected" by the various formatting and printing methods in parameters (which is why I've opened this issue here, but feel free to move it elsewhere) and in insight (maybe also datawizard, correlation and modelbased?).
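To make the idea of several attribute "classes" concrete, here is a minimal sketch in base R. The helper name column_roles() is hypothetical and not part of any easystats package, and the attribute names idvars and groupvars are only the ones floated in this issue, not a settled convention.

```r
# Hypothetical helper (not part of any easystats package): split the columns
# of a data frame into roles based on the "idvars" and "groupvars" attributes.
column_roles <- function(x) {
  # exact = TRUE avoids attr()'s default partial matching of attribute names
  idvars <- attr(x, "idvars", exact = TRUE)
  groupvars <- attr(x, "groupvars", exact = TRUE)
  cols <- colnames(x)
  list(
    idvars = cols[cols %in% idvars],
    groupvars = cols[cols %in% groupvars],
    # every column that is not declared is treated as "statistical" here
    stats = cols[!cols %in% c(idvars, groupvars)]
  )
}

# Small made-up data frame, for illustration only
ci <- data.frame(
  A = c("a1", "a2"),
  Xval = c(1, 3),
  CI = 0.95,
  CI_low = c(-1.88, 0.04),
  CI_high = c(2.16, 0.99)
)
attr(ci, "idvars") <- c("A", "Xval")
attr(ci, "groupvars") <- "A"

roles <- column_roles(ci)
```

A formatting method could then decide per role what to do (keep, group by, format as statistics) without knowing any column names in advance.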
WDYT? @easystats/core-team
Do you have an example that makes clear how this affects printing or how we have to change methods?
Example data frame of CI results:
results <- data.frame(Parameter = c("q", "w"),
                      CI = c(0.95, 0.95),
                      CI_low = c(-1.87971309451912, 0.0409341466147453),
                      CI_high = c(2.15779289064407, 0.992114187916741),
                      method = "🦆")
results
#> Parameter CI CI_low CI_high method
#> 1 q 0.95 -1.87971309 2.1577929 🦆
#> 2 w 0.95 0.04093415 0.9921142 🦆
The old code for printing was something like this - it only kept relevant columns (e.g., the "method" column will be dropped):
OLD_format_ci <- function(x, ...) {
  # Keep only columns we want to show:
  i_keep <- colnames(x) %in% c("Parameter", "CI", "CI_low", "CI_high")
  insight::format_table(x[, i_keep, drop = FALSE])
}
OLD_format_ci(results)
#> Parameter 95% CI
#> 1 q [-1.88, 2.16]
#> 2 w [ 0.04, 0.99]
However, if you need more than one column to identify a row, this breaks, because you can't store all of this information nicely in a single Parameter column:
results <- data.frame(Xval = c(1, 3),
                      Zlev = c("q", "w"),
                      CI = c(0.95, 0.95),
                      CI_low = c(-1.87971309451912, 0.0409341466147453),
                      CI_high = c(2.15779289064407, 0.992114187916741),
                      method = "🦆")
results
#> Xval Zlev CI CI_low CI_high method
#> 1 1 q 0.95 -1.87971309 2.1577929 🦆
#> 2 3 w 0.95 0.04093415 0.9921142 🦆
OLD_format_ci(results)
#> 95% CI
#> 1 [-1.88, 2.16]
#> 2 [ 0.04, 0.99]
Instead you need a more flexible method:
NEW_format_ci <- function(x, ...) {
  # Keep the identifying columns (from the idvars attribute) plus the CI columns:
  i_keep <- colnames(x) %in% c(attr(x, "idvars"), "CI", "CI_low", "CI_high")
  insight::format_table(x[, i_keep, drop = FALSE])
}
# Set the idvars attribute:
attr(results, "idvars") <- c("Xval", "Zlev")
NEW_format_ci(results)
#> Xval Zlev 95% CI
#> 1 1.00 q [-1.88, 2.16]
#> 2 3.00 w [ 0.04, 0.99]
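One detail that matters if this is rolled out more widely: objects created by older code have no idvars attribute, so a method may want to fall back to the old default. The following is a minimal base R sketch of just the column-selection step. The function name select_ci_columns() and the default_id argument are assumptions for illustration, not existing easystats API.

```r
# Hypothetical helper: choose the columns to print, falling back to the
# classic "Parameter" column when no "idvars" attribute has been set.
select_ci_columns <- function(x, default_id = "Parameter") {
  # exact = TRUE avoids attr()'s default partial matching of attribute names
  idvars <- attr(x, "idvars", exact = TRUE)
  if (is.null(idvars)) {
    idvars <- default_id
  }
  keep <- colnames(x) %in% c(idvars, "CI", "CI_low", "CI_high")
  x[, keep, drop = FALSE]
}

# Made-up inputs: one "old style" object, one with idvars set
old_style <- data.frame(Parameter = c("q", "w"), CI = 0.95,
                        CI_low = c(-1.88, 0.04), CI_high = c(2.16, 0.99),
                        method = "x")
new_style <- data.frame(Xval = c(1, 3), Zlev = c("q", "w"), CI = 0.95,
                        CI_low = c(-1.88, 0.04), CI_high = c(2.16, 0.99),
                        method = "x")
attr(new_style, "idvars") <- c("Xval", "Zlev")

old_cols <- colnames(select_ci_columns(old_style))
new_cols <- colnames(select_ci_columns(new_style))
```

In both cases the "method" column is dropped, while the identifying columns come either from the attribute or from the default.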
We can extend this to also include columns that "group" rows:
NEW_print_ci_html <- function(x, ...) {
  # Keep the identifying columns (from the idvars attribute) plus the CI columns:
  i_keep <- colnames(x) %in% c(attr(x, "idvars"), "CI", "CI_low", "CI_high")
  x_fmt <- insight::format_table(x[, i_keep, drop = FALSE])
  # Group the printed rows by the columns named in the groupvars attribute:
  insight::print_html(x_fmt, by = attr(x, "groupvars"))
}
results_grouped <- cbind(A = rep(c("a1", "a2"), each = 2), rbind(results, results))
results_grouped
#> A Xval Zlev CI CI_low CI_high method
#> 1 a1 1 q 0.95 -1.87971309 2.1577929 🦆
#> 2 a1 3 w 0.95 0.04093415 0.9921142 🦆
#> 3 a2 1 q 0.95 -1.87971309 2.1577929 🦆
#> 4 a2 3 w 0.95 0.04093415 0.9921142 🦆
attr(results_grouped, "idvars") <- c("A", "Xval", "Zlev")
attr(results_grouped, "groupvars") <- c("A")
NEW_print_ci_html(results_grouped)
| Xval | Zlev | 95% CI |
|---|---|---|
| a1 | ||
| 1.00 | q | [-1.88, 2.16] |
| 3.00 | w | [ 0.04, 0.99] |
| a2 | ||
| 1.00 | q | [-1.88, 2.16] |
| 3.00 | w | [ 0.04, 0.99] |
Created on 2024-10-07 with reprex v2.1.1
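In the grouped example the idvars have to include A, because Xval and Zlev alone no longer identify a row. A small sanity check along these lines could catch a mis-specified attribute early. This is only a sketch in base R, and identifies_rows() is a hypothetical name, not an existing function.

```r
# Hypothetical check: do the columns named in "idvars" uniquely identify rows?
identifies_rows <- function(x, idvars = attr(x, "idvars", exact = TRUE)) {
  # All declared identifier columns must exist in the data frame
  stopifnot(all(idvars %in% colnames(x)))
  if (length(idvars) == 0) {
    # Without identifiers, only a table with at most one row is unambiguous
    return(nrow(x) <= 1)
  }
  # anyDuplicated() returns 0 when no combination of values is repeated
  anyDuplicated(x[, idvars, drop = FALSE]) == 0
}

# Made-up data mirroring the structure of the grouped example
d <- data.frame(A = rep(c("a1", "a2"), each = 2),
                Xval = c(1, 3, 1, 3),
                Zlev = c("q", "w", "q", "w"),
                CI = 0.95)

identifies_rows(d, c("A", "Xval", "Zlev"))
#> [1] TRUE
identifies_rows(d, c("Xval", "Zlev"))
#> [1] FALSE
```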
ok, I see. I think this is something that needs to be handled in the packages' format() methods - in insight, only the "final" data frame is processed, no filtering/column-selection is usually done there.
We should then decide on the attributes' names. Looking at your code changes, you would suggest the attribute idvars for those columns that should also be included in the output, in addition to the default columns, right?
> I think this is something that needs to be handled in the packages' format() methods
Yes, the way things are set up now. But perhaps this could be built directly into insight::format_table() or insight::export_table() at some point.