posterior
posterior copied to clipboard
Roll up summaries of nonscalar variables
See https://github.com/stan-dev/posterior/issues/43
This is not ready for primetime, but I want the authors to be able to commit here if they want, so here it is.
Now has:
- A dimensions column in the rolled-up output
- Some minimal documentation
- a dedicated
parse_variable_indicesfunction that does the work of parsing the dimensions/indices
Still needs:
- [ ] Proper error handling
- [ ] Tests
- [ ] Review of
parse_variable_indices.Rand, if it looks good, rewritingas_draws_rvars.draws_matrix()to rely onparse_variable_indices(see also #61) - [ ] Proper handling when summaries are NA for some elements (e.g. Cholesky-factors will have rhat NA for half their elements)
Codecov Report
Merging #152 (e639cd1) into master (6cf8d24) will decrease coverage by
3.31%. The diff coverage is0.00%.
@@ Coverage Diff @@
## master #152 +/- ##
==========================================
- Coverage 93.89% 90.58% -3.32%
==========================================
Files 41 43 +2
Lines 2654 2751 +97
==========================================
Hits 2492 2492
- Misses 162 259 +97
| Impacted Files | Coverage Δ | |
|---|---|---|
| R/parse_variable_indices.R | 0.00% <0.00%> (ø) |
|
| R/rollup_summary.R | 0.00% <0.00%> (ø) |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact),ø = not affected,? = missing dataPowered by Codecov. Last update 6cf8d24...e639cd1. Read the comment docs.
Having a think about how to handle NA summaries, and I'm not sure what the best approach is. Some variables will have NAs that can be safely ignored. For example, rhat for Cholesky-factors and the like. But in other cases, the NAs can be crucial diagnostic tools. For example, if there's a problem in the model definition and a parameter that is supposed to vary is fixed, or if numerical issues cause the sampler to freeze for a single element of a vector. Since there's no good way to tell whether an NA is expected or problematic, we need to decide what to do. Some options:
- Always display
NAfor the maximum or minimum value if any elements areNA. This is the current behavior. - Provide an argument to ignore
NAs. - Ignore
NAs by default, but if the summary contains anyNAs for any of the summary functions, append an additional column giving the number of elements that areNAfor at least one of the summaries. - Something else...
Review of parse_variable_indices.R and, if it looks good, rewriting as_draws_rvars.draws_matrix() to rely on parse_variable_indices
Thanks @jsocolar! I added (hopefully not too many :) ) comments with some suggestions for simplification. It's very exciting to see this getting done, it's gonna be a very useful function (I am probably going to rewrite some tidybayes stuff on top of it :)).
A minor point of discussion about the format: currently you have dimensionality and dimensions elements. Normally I'm all for fully spelling things out, but in this case I might suggest naming these ndim and dim respectively. That would make people's code that uses this function slightly more self-documenting, as those names are used frequently throughout R code to correspond to the two concepts these variables represent.
@mjskay just mentioning that I have some time to jump back on this now. Once I remember where we're at, I'll keep it moving along.
Wonderful to hear! I think that the variable index parsing aspects of this PR may also end up being related to some of the conversation in #187 about exposing internal functions for flattening / unflattening arrays. I think that work can be done after this PR though, since it's already doing a couple of different things at once.