feature: bin by strata
Add an option "bin_by_strata" to allow optimized bins per stratum, for both the automatic and manual binning approaches.
Yes, please!
Looking at the code, it seems this could be done by making the output of add_stratification a grouped_df (the output of group_by) and then applying auto_bin or the manual binning to each grouping level. I'm sure that I'm over-simplifying, but hopefully this can move to implementation.
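A rough sketch of that idea, just to make it concrete. The helper equal_width_bins() and the column names strat and idv are made up for illustration; they are not the package's auto_bin() or its actual column names.

```r
library(dplyr)

# Hypothetical stand-in for a binning routine that works on one stratum.
# It is NOT the package's auto_bin(); it only assigns equal-width bin indices.
equal_width_bins <- function(d, n_bins = 4) {
  d$bin <- cut(d$idv, breaks = n_bins, labels = FALSE)
  d
}

# Toy data: two strata with very different ranges of the independent variable.
obs <- data.frame(
  strat = rep(c("A", "B"), each = 5),
  idv   = c(1, 2, 3, 4, 5, 10, 20, 30, 40, 50)
)

# group_by() returns a grouped_df; group_modify() then applies the binning
# function to each grouping level separately.
grouped <- group_by(obs, strat)
binned <- grouped %>%
  group_modify(~ equal_width_bins(.x)) %>%
  ungroup()
```

Because the binning runs once per group, the bin boundaries follow each stratum's own range instead of the pooled range.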
I'm happy to dive a bit deeper and help with some of the coding for it.
Sure, happy to get this moving. Let me look at the code again in the next few days to see what approach I would take. We can then compare notes and decide on the best way forward and who will take the lead.
I started looking at this again today after a long hiatus, and I think the simplest way to implement auto-binning by stratum would be to:
- first nest the data (dplyr::group_by_at() on the strata, then nest() on that),
- then operate on each nested dataset as though it were not stratified (since the stratification is outside of that),
- when operating on the nested dataset, assign the stratum to it as a numeric value (or alternatively as the upper and lower bound where the bin applies),
- generate summary stats on the nested dataset strata, and finally
- expand back to an un-nested dataset for plotting.
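The steps above could look roughly like the following. This is only a sketch under assumptions: bin_one() is a hypothetical equal-count binning function standing in for auto_bin(), and the column names strat, idv and dv are invented for the example.

```r
library(dplyr)
library(tidyr)

# Hypothetical binning of a single, unstratified dataset (not the package's
# auto_bin()): equal-count bins, stored as an index plus lower/upper bounds.
bin_one <- function(d, n_bins = 3) {
  probs  <- seq(0, 1, length.out = n_bins + 1)
  breaks <- unique(quantile(d$idv, probs = probs, names = FALSE))
  d$bin       <- cut(d$idv, breaks = breaks, include.lowest = TRUE, labels = FALSE)
  d$bin_lower <- breaks[d$bin]
  d$bin_upper <- breaks[d$bin + 1]
  d
}

set.seed(1)
obs <- data.frame(
  strat = rep(c("A", "B"), each = 30),
  idv   = c(runif(30, 0, 10), runif(30, 0, 100)),
  dv    = rnorm(60)
)

binned <- obs %>%
  group_by_at("strat") %>%               # group on the strata
  nest() %>%                             # one nested dataset per stratum
  mutate(data = lapply(data, bin_one)) %>%  # bin each as if unstratified
  unnest(cols = data) %>%                # expand back for plotting
  ungroup()

# Summary statistics per stratum and bin.
bin_summary <- binned %>%
  group_by(strat, bin, bin_lower, bin_upper) %>%
  summarise(median_dv = median(dv), n = n(), .groups = "drop")
```

Storing the lower and upper bounds alongside the bin index keeps the per-stratum boundaries available after un-nesting.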
The first step for that, to me, is to make auto_bin() specific to the underlying data class that is being stratified rather than having it operate on a data.frame. I did that in #54.
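To illustrate what class-specific binning can mean in general, here is a minimal S3 sketch. It is not the code from #54; the generic name auto_bin_sketch(), its arguments and the idv column are all assumptions made for the example.

```r
# Hypothetical generic: dispatches on the class of the object being binned.
auto_bin_sketch <- function(x, ...) UseMethod("auto_bin_sketch")

# Method for a plain numeric vector: returns equal-count bin boundaries.
auto_bin_sketch.numeric <- function(x, n_bins = 3, ...) {
  probs <- seq(0, 1, length.out = n_bins + 1)
  unique(quantile(x, probs = probs, names = FALSE))
}

# Method for a data.frame: extracts the (assumed) independent variable column
# and delegates to the numeric method.
auto_bin_sketch.data.frame <- function(x, idv = "idv", ...) {
  auto_bin_sketch(x[[idv]], ...)
}

breaks_vec <- auto_bin_sketch(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
breaks_df  <- auto_bin_sketch(data.frame(idv = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)))
```

With this kind of dispatch, the same call can be applied to each nested dataset regardless of its class.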
Thanks Bill. I will have a look in the next few days; I did receive a few more requests for this feature :)