parameters Identifying non-data columns

In recent bayestestR PRs (https://github.com/easystats/bayestestR/pull/673 & https://github.com/easystats/bayestestR/pull/672), some of the printing methods were changed to allow arbitrary columns in the resulting data frame objects. This was done by setting a new attribute called idvars that contains the names of the columns that don't hold the "statistical" information, but rather information used to identify the rows.

I've found this to be better and more stable than keeping track of all possible column names, and also very flexible.

I wonder if this should be used across easystats? We could have several "classes" of such attributes: idvars, grouping-vars, etc.
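A rough sketch of how such attributes might be set and read in one place. The helper names set_column_roles() and get_column_roles() are hypothetical and do not exist in any easystats package; only the attribute names idvars and groupvars are taken from this thread.

```r
# Hypothetical helpers, not part of any easystats package.
# Attach "role" attributes to a data frame, checking that the names exist.
set_column_roles <- function(x, idvars = NULL, groupvars = NULL) {
  unknown <- setdiff(c(idvars, groupvars), colnames(x))
  if (length(unknown) > 0) {
    stop("Columns not found: ", paste(unknown, collapse = ", "))
  }
  attr(x, "idvars") <- idvars
  attr(x, "groupvars") <- groupvars
  x
}

# Read the roles back; attr() returns NULL for roles that were never set.
get_column_roles <- function(x) {
  list(idvars = attr(x, "idvars"), groupvars = attr(x, "groupvars"))
}

res <- data.frame(Xval = c(1, 3), Zlev = c("q", "w"), CI_low = c(-1.88, 0.04))
res <- set_column_roles(res, idvars = c("Xval", "Zlev"))
get_column_roles(res)$idvars
```

Checking the names when the attribute is set means a typo in a column name fails early instead of silently dropping a column at print time.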

These could be useful if "detected" by the various formatting and printing methods in parameters (which is why I've opened this issue here, but feel free to move it elsewhere) and in insight (maybe also datawizard, correlation and modelbased?).

WDYT? @easystats/core-team

Oct 06 '24 10:10 mattansb

Do you have an example that makes clear how this affects printing or how we have to change methods?

Oct 06 '24 11:10 strengejacke

Example data frame of CI results:

results <- data.frame(Parameter = c("q", "w"), 
                      CI = c(0.95, 0.95), 
                      CI_low = c(-1.87971309451912, 0.0409341466147453),
                      CI_high = c(2.15779289064407, 0.992114187916741),
                      method = "🦆")

results
#>   Parameter   CI      CI_low   CI_high method
#> 1         q 0.95 -1.87971309 2.1577929      🦆
#> 2         w 0.95  0.04093415 0.9921142      🦆

The old code for printing was something like this - it only kept relevant columns (e.g., the "method" column will be dropped):

OLD_format_ci <- function(x, ...) {
  # Keep only columns we want to show:
  i_keep <- colnames(x) %in% c("Parameter", "CI", "CI_low", "CI_high")
  
  insight::format_table(x[, i_keep, drop = FALSE])
}

OLD_format_ci(results)
#>   Parameter        95% CI
#> 1         q [-1.88, 2.16]
#> 2         w [ 0.04, 0.99]

However, if you need more than one column to identify a row, this breaks, because you can't store all of this information nicely in a single Parameter column:

results <- data.frame(Xval = c(1, 3),
                      Zlev = c("q", "w"), 
                      CI = c(0.95, 0.95), 
                      CI_low = c(-1.87971309451912, 0.0409341466147453),
                      CI_high = c(2.15779289064407, 0.992114187916741),
                      method = "🦆")

results
#>   Xval Zlev   CI      CI_low   CI_high method
#> 1    1    q 0.95 -1.87971309 2.1577929      🦆
#> 2    3    w 0.95  0.04093415 0.9921142      🦆

OLD_format_ci(results)
#>          95% CI
#> 1 [-1.88, 2.16]
#> 2 [ 0.04, 0.99]

Instead, you need a more flexible method:

NEW_format_ci <- function(x, ...) {
  # Keep only columns we want to show:
  i_keep <- colnames(x) %in% c(attr(x, "idvars"), "CI", "CI_low", "CI_high")
  
  insight::format_table(x[, i_keep, drop = FALSE])
}

# Set the idvars attribute:
attr(results, "idvars") <- c("Xval", "Zlev")

NEW_format_ci(results)
#>   Xval Zlev        95% CI
#> 1 1.00    q [-1.88, 2.16]
#> 2 3.00    w [ 0.04, 0.99]
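One thing to watch for: attr(x, "idvars") is NULL when the attribute was never set, so NEW_format_ci() would then drop the Parameter column as well. A minimal sketch of a fallback, using base R only. The helper name select_ci_columns() and the choice of "Parameter" as the default are assumptions for illustration.

```r
# Hypothetical helper: pick the columns to display before formatting.
select_ci_columns <- function(x, default_id = "Parameter") {
  idvars <- attr(x, "idvars")
  # Fall back to the conventional identifier column when no idvars are set
  if (is.null(idvars)) idvars <- default_id
  i_keep <- colnames(x) %in% c(idvars, "CI", "CI_low", "CI_high")
  x[, i_keep, drop = FALSE]
}

# Old-style result: no idvars attribute, rows identified by Parameter
old_style <- data.frame(Parameter = c("q", "w"), CI = 0.95,
                        CI_low = c(-1.88, 0.04), CI_high = c(2.16, 0.99),
                        method = "x")
colnames(select_ci_columns(old_style))

# New-style result: rows identified by the columns named in idvars
new_style <- data.frame(Xval = c(1, 3), Zlev = c("q", "w"), CI = 0.95,
                        CI_low = c(-1.88, 0.04), CI_high = c(2.16, 0.99),
                        method = "x")
attr(new_style, "idvars") <- c("Xval", "Zlev")
colnames(select_ci_columns(new_style))
```

The selected data frame could then be passed to insight::format_table() in the same way as in NEW_format_ci().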

We can extend this to also include columns that "group" rows:

NEW_print_ci_html <- function(x, ...) {
  # Keep only columns we want to show:
  i_keep <- colnames(x) %in% c(attr(x, "idvars"), "CI", "CI_low", "CI_high")
  
  x_fmt <- insight::format_table(x[, i_keep, drop = FALSE])
  
  insight::print_html(x_fmt, by = attr(x, "groupvars"))
}

results_grouped <- cbind(A = rep(c("a1", "a2"), each = 2), rbind(results, results))
results_grouped
#>    A Xval Zlev   CI      CI_low   CI_high method
#> 1 a1    1    q 0.95 -1.87971309 2.1577929      🦆
#> 2 a1    3    w 0.95  0.04093415 0.9921142      🦆
#> 3 a2    1    q 0.95 -1.87971309 2.1577929      🦆
#> 4 a2    3    w 0.95  0.04093415 0.9921142      🦆

attr(results_grouped, "idvars") <- c("A", "Xval", "Zlev")
attr(results_grouped, "groupvars") <- c("A")

NEW_print_ci_html(results_grouped)

Xval	Zlev	95% CI
a1
1.00	q	[-1.88, 2.16]
3.00	w	[ 0.04, 0.99]
a2
1.00	q	[-1.88, 2.16]
3.00	w	[ 0.04, 0.99]

Created on 2024-10-07 with reprex v2.1.1
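To make the role of a groupvars attribute concrete, here is a conceptual base R stand-in for grouped printing. This is an illustration only and not how insight implements grouping; print_by_group() is a hypothetical name.

```r
# Illustration only: base R stand-in for grouped printing, not insight's code.
print_by_group <- function(x, groupvars) {
  # One data frame per combination of the grouping columns
  pieces <- split(x, x[groupvars], drop = TRUE)
  for (g in names(pieces)) {
    # Print the group label as a header line
    cat(g, "\n", sep = "")
    # The label is already shown, so drop the grouping columns from the body
    body <- pieces[[g]][, setdiff(colnames(x), groupvars), drop = FALSE]
    print(body, row.names = FALSE)
  }
  invisible(pieces)
}

grouped <- data.frame(A = rep(c("a1", "a2"), each = 2),
                      Xval = c(1, 3, 1, 3),
                      CI_low = c(-1.88, 0.04, -1.88, 0.04))
pieces <- print_by_group(grouped, "A")
```

The point is that a grouping column plays a different role from the other identifier columns: it labels a block of rows rather than appearing as a regular column.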

Oct 07 '24 12:10 mattansb

ok, I see. I think this is something that needs to be handled in the packages' format() methods - in insight, only the "final" data frame is processed, no filtering/column-selection is usually done there.

We should then decide on the attributes' names. Looking at your code changes, you would suggest the attribute idvars for those columns that should be included in the output in addition to the default columns, right?

Oct 07 '24 12:10 strengejacke

I think this is something that needs to be handled in the packages' format() methods

Yes, given the way things are set up now. But perhaps this could be built directly into insight::format_table() or insight::export_table() at some point.

Oct 27 '24 20:10 mattansb