posterior icon indicating copy to clipboard operation
posterior copied to clipboard

Roll up summaries of nonscalar variables

Open jsocolar opened this issue 4 years ago • 5 comments
trafficstars

See https://github.com/stan-dev/posterior/issues/43

This is not ready for primetime, but I want the authors to be able to commit here if they want, so here it is.

Now has:

  • A dimensions column in the rolled-up output
  • Some minimal documentation
  • a dedicated parse_variable_indices function that does the work of parsing the dimensions/indices

Still needs:

  • [ ] Proper error handling
  • [ ] Tests
  • [ ] Review of parse_variable_indices.R and, if it looks good, rewriting as_draws_rvars.draws_matrix() to rely on parse_variable_indices (see also #61)
  • [ ] Proper handling when summaries are NA for some elements (e.g. Cholesky-factors will have rhat NA for half their elements)

jsocolar avatar May 21 '21 20:05 jsocolar

Codecov Report

Merging #152 (e639cd1) into master (6cf8d24) will decrease coverage by 3.31%. The diff coverage is 0.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #152      +/-   ##
==========================================
- Coverage   93.89%   90.58%   -3.32%     
==========================================
  Files          41       43       +2     
  Lines        2654     2751      +97     
==========================================
  Hits         2492     2492              
- Misses        162      259      +97     
Impacted Files Coverage Δ
R/parse_variable_indices.R 0.00% <0.00%> (ø)
R/rollup_summary.R 0.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 6cf8d24...e639cd1. Read the comment docs.

codecov-commenter avatar May 21 '21 21:05 codecov-commenter

Having a think about how to handle NA summaries, and I'm not sure what the best approach is. Some variables will have NAs that can be safely ignored. For example, rhat for Cholesky-factors and the like. But in other cases, the NAs can be crucial diagnostic tools. For example, if there's a problem in the model definition and a parameter that is supposed to vary is fixed, or if numerical issues cause the sampler to freeze for a single element of a vector. Since there's no good way to tell whether an NA is expected or problematic, we need to decide what to do. Some options:

  • Always display NA for the maximum or minimum value if any elements are NA. This is the current behavior.
  • Provide an argument to ignore NAs.
  • Ignore NAs by default, but if the summary contains any NAs for any of the summary functions, append an additional column giving the number of elements that are NA for at least one of the summaries.
  • Something else...

jsocolar avatar May 24 '21 20:05 jsocolar

Review of parse_variable_indices.R and, if it looks good, rewriting as_draws_rvars.draws_matrix() to rely on parse_variable_indices

Thanks @jsocolar! I added (hopefully not too many :) ) comments with some suggestions for simplification. It's very exciting to see this getting done, it's gonna be a very useful function (I am probably going to rewrite some tidybayes stuff on top of it :)).

A minor point of discussion about the format: currently you have dimensionality and dimensions elements. Normally I'm all for fully spelling things out, but in this case I might suggest naming these ndim and dim respectively. That would make people's code that uses this function slightly more self-documenting, as those names are used frequently throughout R code to correspond to the two concepts these variables represent.

mjskay avatar May 25 '21 01:05 mjskay

@mjskay just mentioning that I have some time to jump back on this now. Once I remember where we're at, I'll keep it moving along.

jsocolar avatar Aug 13 '21 18:08 jsocolar

Wonderful to hear! I think that the variable index parsing aspects of this PR may also end up being related to some of the conversation in #187 about exposing internal functions for flattening / unflattening arrays. I think that work can be done after this PR though, since it's already doing a couple of different things at once.

mjskay avatar Aug 14 '21 21:08 mjskay