posterior icon indicating copy to clipboard operation
posterior copied to clipboard

modify summarise_draws internals not to coerce values to same type

Open n-kall opened this issue 1 year ago • 14 comments

Summary

This would fix #355.

As it is affecting summarise_draws, which is one of the main functions in posterior, I think this change needs to be considered carefully. It might be less efficient than currently implemented, so I am opening this PR now to check how the benchmarks for summarise_draws perform.

Copyright and Licensing

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:

  • Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
  • Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

n-kall avatar Mar 15 '24 11:03 n-kall

This is how benchmark results would change (along with a 95% confidence interval in relative change) if eca0fae786e0769202281c83044ba6cbb3372662 is merged into master:

  •   :ballot_box_with_check:as_draws_array: 136ms -> 136ms [-0.51%, +0.83%]
  •   :rocket:as_draws_df: 64.6ms -> 61ms [-6.83%, -4.58%]
  •   :ballot_box_with_check:as_draws_list: 157ms -> 156ms [-1.13%, +0.17%]
  •   :ballot_box_with_check:as_draws_matrix: 58.8ms -> 58.7ms [-0.97%, +0.57%]
  •   :ballot_box_with_check:as_draws_rvars: 80ms -> 80.4ms [-0.49%, +1.55%]
  • :exclamation::snail:summarise_draws_100_variables: 706ms -> 715ms [+0.92%, +1.51%]
  • :exclamation::snail:summarise_draws_10_variables: 105ms -> 107ms [+0.93%, +2.1%] Further explanation regarding interpretation and methodology can be found in the documentation.

github-actions[bot] avatar Mar 15 '24 11:03 github-actions[bot]

Looks like a >1% increase to summarise_draws times, which I don't think is acceptable

n-kall avatar Mar 15 '24 12:03 n-kall

How much increase exactly? 1% would barely matter I think

paul-buerkner avatar Mar 15 '24 12:03 paul-buerkner

Codecov Report

Attention: Patch coverage is 93.75000% with 1 lines in your changes are missing coverage. Please review.

Project coverage is 95.33%. Comparing base (9de9857) to head (a28483f).

:exclamation: Current head a28483f differs from pull request most recent head ded8b04. Consider uploading reports for the commit ded8b04 to get more accurate results

Files Patch % Lines
R/misc.R 90.90% 1 Missing :warning:
Additional details and impacted files
@@           Coverage Diff           @@
##           master     #356   +/-   ##
=======================================
  Coverage   95.33%   95.33%           
=======================================
  Files          50       50           
  Lines        3855     3860    +5     
=======================================
+ Hits         3675     3680    +5     
  Misses        180      180           

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter avatar Mar 15 '24 12:03 codecov-commenter

  • ❗🐌summarise_draws_100_variables: 706ms -> 715ms [+0.92%, +1.51%]

  • ❗🐌summarise_draws_10_variables: 105ms -> 107ms [+0.93%, +2.1%]

Based on these ^

n-kall avatar Mar 15 '24 12:03 n-kall

This is how benchmark results would change (along with a 95% confidence interval in relative change) if a28483f84ffd276cba402929ff3a5e976a12046a is merged into master:

  •   :ballot_box_with_check:as_draws_array: 137ms -> 138ms [-0.57%, +1.33%]
  • :exclamation::snail:as_draws_df: 64.7ms -> 66.8ms [+1.45%, +5.05%]
  •   :ballot_box_with_check:as_draws_list: 163ms -> 164ms [-0.39%, +2%]
  • :exclamation::snail:as_draws_matrix: 60.5ms -> 63.6ms [+3.63%, +6.84%]
  •   :ballot_box_with_check:as_draws_rvars: 81.6ms -> 81.1ms [-1.61%, +0.46%]
  • :exclamation::snail:summarise_draws_100_variables: 709ms -> 719ms [+0.98%, +1.77%]
  • :exclamation::snail:summarise_draws_10_variables: 108ms -> 110ms [+0.75%, +2.83%] Further explanation regarding interpretation and methodology can be found in the documentation.

github-actions[bot] avatar Mar 15 '24 12:03 github-actions[bot]

I've now added tests checking that the column types are as expected, and if the speed is ok, this is ready for review

n-kall avatar Mar 15 '24 12:03 n-kall

FWIW, I think the unnesting / flattening might be doable as a cbind on row vectors (or maybe vec_cbind / vec_rbind for safety) to create a 1-row data frame. Dunno if that helps at all.

mjskay avatar Mar 15 '24 12:03 mjskay

This is how benchmark results would change (along with a 95% confidence interval in relative change) if e175ba88de29e2cbaa1ce74dad77763411003118 is merged into master:

  •   :ballot_box_with_check:as_draws_array: 156ms -> 155ms [-1.35%, +0.57%]
  •   :rocket:as_draws_df: 71.3ms -> 68.4ms [-5.59%, -2.54%]
  •   :ballot_box_with_check:as_draws_list: 176ms -> 175ms [-1.7%, +0.47%]
  •   :ballot_box_with_check:as_draws_matrix: 71ms -> 70.5ms [-1.68%, +0.53%]
  •   :ballot_box_with_check:as_draws_rvars: 84ms -> 84.3ms [-0.89%, +1.65%]
  • :exclamation::snail:summarise_draws_100_variables: 724ms -> 733ms [+0.35%, +1.98%]
  • :exclamation::snail:summarise_draws_10_variables: 112ms -> 114ms [+1.05%, +2.91%] Further explanation regarding interpretation and methodology can be found in the documentation.

github-actions[bot] avatar Mar 15 '24 12:03 github-actions[bot]

Speed is okay. Still, is this already the best way to implement the flattening?

paul-buerkner avatar Mar 15 '24 13:03 paul-buerkner

unnesting / flattening might be doable as a cbind on row vectors (or maybe vec_cbind / vec_rbind for safety) to create a 1-row data frame.

Thanks for the tip! Simplifying this unnesting/flattening would, I think, be beneficial. However, I can't quite get the right structure with rbind or vec_rbind as the quantile2 column ends up with vectors of length 2 (as opposed to separate columns as in unlist). Looks like it would still need some kind of column splitting (e.g. tidyr::unnest_wider)

create_summary_list(example_draws(), 3, list(rhat = rhat_basic, quantile2 = quantile2), NULL) |>
   vec_rbind()

##       rhat            quantile2
## 1 1.014967 -1.231353, 18.874003

create_summary_list(example_draws(), 3, list(rhat = rhat_basic, quantile2 = quantile2), NULL) |>
   unlist()

##         rhat  quantile2.q5 quantile2.q95
##     1.014967     -1.231353     18.874003

n-kall avatar Mar 15 '24 13:03 n-kall

Ah right... you might be able to use cbind if you conditionally transpose (or rbind) vectors of length > 1 to turn them into row vectors (not on a computer atm so I can't try it)

mjskay avatar Mar 15 '24 13:03 mjskay

is this already the best way to implement the flattening

I don't think the issue is so critical that the fix is needed right away, so perhaps it's better to leave this open in case some other ideas come up (e.g. if @mjskay's method works out). However, I won't spend more time on this now

n-kall avatar Mar 15 '24 13:03 n-kall