posterior icon indicating copy to clipboard operation
posterior copied to clipboard

.args in summarise draws incompatible with summary functions without ellipses

Open n-kall opened this issue 1 year ago • 10 comments
trafficstars

The .args argument in summarise draws is passed to all summary functions as additional arguments. This fails if the summary function does not handle ... or the keyword arguments specified in .args. Common functions that don't handle ... include stats::sd and stats::var.

I'd like to be able to do something like the following:

x <- example_draws()
summarise_draws(x, mean, sd, quantile2, mcse_quantile) #q5 and q95 by default
#> [Output as expected]

# change the quantiles of interest to q1 and q99
additional_args <- list(probs = c(0.01, 0.99))
summarise_draws(x, mean, sd, quantile2, mcse_quantile, .args = additional_args)

#> Error in stats::sd(x, ...) : unused argument (probs = c(0.01, 0.99))

One option would be to make aliases of these common summary functions that handle ..., but perhaps there is a more sensible way to only pass the keywords to functions that handle them?

n-kall avatar Jan 29 '24 10:01 n-kall

I think since those args don't apply to all the functions anyway I'd be inclined to wrap the relevant functions in a lambda, like define p = c(.01,. 99) then pass \(...) quantile2(..., probs = p). Feels more precise than using a blanket .args.

If we want to get fancy we could probably inspect the formals of functions to find functions that don't take dots and then not pass .args down. Not sure if there would be unintended side effects...

mjskay avatar Jan 29 '24 16:01 mjskay

Good points @mjskay, thanks! Another idea I just had is to extend .args to be a named nested list, by which one could supply additional arguments to specific summary functions, rather than having them be passed to all. I think this would greatly improve the utility of the .args argument, but would either be a breaking change or we would somehow need to detect whether .args should be passed to all functions or split and passed to only corresponding ones.

I managed to get this working with a small change to create_summary_list and it would work like this:

summarise_draws(
  example_draws(),
  quantile2,
  median,
  mcse_quantile,
  .args = list(
    quantile2 = list(probs = 0.1), # args for quantile2 function
    mcse_quantile = list(probs = c(0.05, 0.7)) # args for mcse_quantile function
  )
)

## # A tibble: 10 × 5
##    variable     q10 median mcse_q5 mcse_q70
##    <chr>      <dbl>  <dbl>   <dbl>    <dbl>
##  1 mu       -0.116    4.16   0.551    0.196
##  2 tau       0.575    3.07   0.114    0.426
##  3 theta[1]  0.0380   5.97   0.820    0.400
##  4 theta[2] -0.255    5.13   0.676    0.392
##  5 theta[3] -3.94     3.99   2.18     0.349
##  6 theta[4] -1.10     4.99   0.956    0.187
##  7 theta[5] -2.25     3.72   1.62     0.245
##  8 theta[6] -2.15     4.14   1.16     0.349
##  9 theta[7]  0.558    5.90   0.458    0.362
## 10 theta[8] -1.74     4.64   0.997    0.368

n-kall avatar Jan 30 '24 08:01 n-kall

Personally I think anonymous functions make the code clearer as they move the args closer to the function they apply to:

summarise_draws(
    example_draws(),
    \(x) quantile2(x, probs = 0.1),
    median,
    \(x) mcse_quantile(x, probs = c(0.05, 0.7))
)
#> # A tibble: 10 × 5
#>    variable     q10 median mcse_q5 mcse_q70
#>    <chr>      <dbl>  <dbl>   <dbl>    <dbl>
#>  1 mu       -0.116    4.16   0.551    0.196
#>  2 tau       0.575    3.07   0.114    0.426
#>  3 theta[1]  0.0380   5.97   0.820    0.400
#>  4 theta[2] -0.255    5.13   0.676    0.392
#>  5 theta[3] -3.94     3.99   2.18     0.349
#>  6 theta[4] -1.10     4.99   0.956    0.187
#>  7 theta[5] -2.25     3.72   1.62     0.245
#>  8 theta[6] -2.15     4.14   1.16     0.349
#>  9 theta[7]  0.558    5.90   0.458    0.362
#> 10 theta[8] -1.74     4.64   0.997    0.368

But I'll admit I have a bit of a bias against the .args argument as it feels like a bit of a hack in the first place. But maybe I'm missing a use case where .args makes things easier?

mjskay avatar Jan 30 '24 08:01 mjskay

I don't like .args either to be honest. if we can get rid of it, I would be happy I think.

Matthew Kay @.***> schrieb am Di., 30. Jan. 2024, 09:45:

Personally I think anonymous functions make the code clearer as they move the args closer to the function they apply to:

summarise_draws( example_draws(), (x) quantile2(x, probs = 0.1), median, (x) mcse_quantile(x, probs = c(0.05, 0.7)) )#> # A tibble: 10 × 5#> variable q10 median mcse_q5 mcse_q70#> #> 1 mu -0.116 4.16 0.551 0.196#> 2 tau 0.575 3.07 0.114 0.426#> 3 theta[1] 0.0380 5.97 0.820 0.400#> 4 theta[2] -0.255 5.13 0.676 0.392#> 5 theta[3] -3.94 3.99 2.18 0.349#> 6 theta[4] -1.10 4.99 0.956 0.187#> 7 theta[5] -2.25 3.72 1.62 0.245#> 8 theta[6] -2.15 4.14 1.16 0.349#> 9 theta[7] 0.558 5.90 0.458 0.362#> 10 theta[8] -1.74 4.64 0.997 0.368

But I'll admit I have a bit of a bias against the .args argument as it feels like a bit of a hack in the first place. But maybe I'm missing a use case where .args makes things easier?

— Reply to this email directly, view it on GitHub https://github.com/stan-dev/posterior/issues/341#issuecomment-1916338955, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADCW2AEHBOFZMEKHVKTHZHTYRCXKBAVCNFSM6AAAAABCPGS7RKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMJWGMZTQOJVGU . You are receiving this because you are subscribed to this thread.Message ID: @.***>

paul-buerkner avatar Jan 30 '24 08:01 paul-buerkner

I agree, the anonymous functions end up with much neater looking code. My current use-case is not using summarise_draws directly, but inside priorsense, and I currently only support summary functions given as strings, as they are actually first converted to *_weighted versions of the functions. (This is all a bit messy and perhaps will be fixed by #331 and using the corresponding weight-aware rvar methods)

If I remember correctly, .args was added in the first place to handle passing weights to all summary functions, so perhaps when the summary functions directly support weights, we won't really need it

n-kall avatar Jan 30 '24 08:01 n-kall

is there anything we need to add to make .args obsolete? if everything works already now using anonymous functions, I suggest we go ahead and deprecate .args with the intention of removing it soonish. would anybody have time to make the necessary changes?

n-kall @.***> schrieb am Di., 30. Jan. 2024, 09:59:

I agree, the anonymous functions end up with much neater looking code. My current use-case is not using summarise_draws directly, but inside priorsense https://github.com/n-kall/priorsense, and I currently only support summary functions given as strings, as they are actually first converted to *_weighted versions of the functions. (This is all a bit messy and perhaps will be fixed by #331 https://github.com/stan-dev/posterior/pull/331 and using the corresponding weight-aware rvar methods)

If I remember correctly, .args was added in the first place to handle passing weights to all summary functions, so perhaps when the summary functions directly support weights, we won't really need it

— Reply to this email directly, view it on GitHub https://github.com/stan-dev/posterior/issues/341#issuecomment-1916361893, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADCW2AFWLIU3VBY5IXD3B3DYRCY6LAVCNFSM6AAAAABCPGS7RKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMJWGM3DCOBZGM . You are receiving this because you commented.Message ID: @.***>

paul-buerkner avatar Jan 30 '24 09:01 paul-buerkner

I'm fine with deprecation of .args but I would prefer it not to be removed until the posterior-provided summary functions support using the weights that are in the draws (or rvar) objects. For now summarise_draws(x, mean_weighted, sd_weighted, .args = list(weights = weights(x))) seems simpler (to me) than using anonymous functions for all summary functions to add the weights.

n-kall avatar Jan 30 '24 11:01 n-kall

Yes, agreed. We still need a not so ugly solution to passing weights.

paul-buerkner avatar Jan 30 '24 13:01 paul-buerkner

I'm trying to remove any usage of .args in my code. Currently, I have been using it to pass weights (as discussed above), but also to pass the probs vector for quantile2 and mcse_quantile.

I have something like this (actual function does other things, but this is the relevant structure):

my_summary <- function(x, quantity, quantity_args) {
   posterior::summarise_draws(x, quantity, .args = quantity_args)
}

Using this code makes it possible for the user to specify quantiles in a vector, and have it apply to both mcse_quantile and quantile2, for example:

my_summary(x, quantity = c("quantile2", "mcse_quantile"), quantity_args = list(probs = c(0.2, 0.8))

@mjskay Do you know of a way to do something similar without .args, without the user needing to write each anonymous function directly? Perhaps there is a way to programmatically create the anonymous functions that I am not aware of. i.e. result in

posterior::summarise_draws(x, 
    \(x) quantile2(.x, probs = c(0.2, 0.8)),
    \(x) mcse_quantile(x, probs = c(0.2, 0.8))
)

n-kall avatar Apr 30 '24 06:04 n-kall

Because summarise_draws() can take a named list of functions, you can construct partially-applied versions of your desired functions and pass them as lists. This allows you to apply the arguments to just the functions they apply to; e.g. something like this:

fs = lapply(c(q = quantile2, mcse = mcse_quantile), \(f) \(...) f(..., probs = c(0.2, 0.8)))
summarise_draws(x, fs)

Or using a library that provide partial function application, like {purrr}, you could do:

fs = lapply(c(q = quantile2, mcse = mcse_quantile), purrr::partial, probs = c(0.2, 0.8))
summarise_draws(x, fs)

Then if you need different arguments to go to different functions, you could pass multiple function lists.

mjskay avatar Apr 30 '24 18:04 mjskay