posterior
posterior copied to clipboard
New helper function repair_variable_names
closes #317
Summary
repair_variable_names converts names using periods (theta.1.1) to square brackets (`theta[1,1])
Copyright and Licensing
By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:
- Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
- Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Comparison is base (
c6ab463) 95.60% compared to head (0ebe8e9) 95.61%.
:exclamation: Current head 0ebe8e9 differs from pull request most recent head 00d32c6. Consider uploading reports for the commit 00d32c6 to get more accurate results
Additional details and impacted files
@@ Coverage Diff @@
## master #318 +/- ##
=======================================
Coverage 95.60% 95.61%
=======================================
Files 47 47
Lines 3686 3694 +8
=======================================
+ Hits 3524 3532 +8
Misses 162 162
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Maybe repair isn't the best name actually? @avehtari suggested convert in https://github.com/stan-dev/posterior/issues/317#issuecomment-1816897197. I'm open to other suggestions too.
I like "repair". Having the dots instead of [] is kind of a thing we don't really want. So we are repairing them to look reasonable again. But I also don't have strong opinions. So completely fine by me if we call it differently.
Re: function name: is the intention to create a function that specifically converts from Stan names, or is the intention that this function might (in the future) support other non-posterior-friendly naming conventions? I think that should influence the naming; e.g. is this "convert_from_stan_names" (though perhaps less verbose) or the more generic "repair_variable_names" or "convert_variable_names"? Or is it "repair_dotted_indices" or "repair_dotted_index_names" or something like that?
Good point @mjskay. If this will only ever be used to convert Stan names we could include that in the function name. What do others think?
I think it would be best to make the function specific for Stan naming conventions, and then add other functions if we add support for other packages. The name "convert_from_stan_names" would be then very descriptive, and I don't know how to make it shorter
Furthermore, in #317 @WardBrian wrote
Complex variables use .real and .imag as their CSV names in Stan. So something like a complex_vector[2] foo has names foo.1.real,foo.1.imag,foo.2.real,foo.2.imag
Tuples use : to separate "slots" in a tuple. So a tuple(real, int) a is printed as a:1,a:2. This stacks naturally with the . for arrays of tuples/tuples of arrays.
If we want to support these, too, I assume that we would like to have something like "foo.1.real" -> "foo_real[1]" and "a:1" -> "a_1"
And the natural complex type support is a separate issue #319 and PR #321
I agree with Jonah and Aki.
If you did want to support tuple names, this essentially just needs to split on : and loop over the resulting parts. See the code that does something similar in Stan: https://github.com/stan-dev/stan/blob/667d5b2d7c5ea1072011919a1acd76a9d3061d01/src/stan/io/stan_csv_reader.hpp#L16