
feature: bin by strata

Open ronkeizer opened this issue 11 years ago • 4 comments

Add an option, "bin_by_strata", to allow optimized bins per stratum, for both the automatic and manual binning approaches.

ronkeizer avatar Oct 30 '14 22:10 ronkeizer

Yes, please!

Looking at the code, it seems this could be done by making the output of add_stratification a grouped_df (the output of group_by) and then applying auto_bin or the manual binning to each grouping level. I'm sure that I'm over-simplifying, but hopefully this can move to implementation.

I'm happy to dive a bit deeper and help with some of the coding for it.

billdenney avatar Jun 14 '18 00:06 billdenney

Sure, happy to get this moving. Let me look at the code again in the next few days to see what approach I would take. We can then compare notes and decide on the best way forward and who will take the lead.

ronkeizer avatar Jun 14 '18 05:06 ronkeizer

I started looking at this again today after a long hiatus, and I think the simplest way to implement auto-binning by stratum would be to:

  • first nest the data (dplyr::group_by_at() on the strata, then nest() on that),
  • then operate on each nested dataset as though it were not stratified (since the stratification is outside of that),
  • when operating on the nested dataset, assign the stratum to it as a numeric value (or alternatively as the upper and lower bound where the bin applies),
  • generate summary stats on the nested dataset strata, and finally
  • expand back to an un-nested dataset for plotting.

The first step for that, to me, is to make auto_bin() specific to the underlying data class that is being stratified rather than having it operate on a data.frame. I did that in #54.
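
A minimal R sketch of the nest / bin / summarise / un-nest steps listed above, assuming dplyr and tidyr are available. It is only an illustration: `quantile_edges()` and `summarise_stratum()` are hypothetical helpers standing in for the package's binning and summary code (they are not vpc's `auto_bin()`), and the column names `strat`, `idv`, and `dv` are made up for the example.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for a binning routine: quantile-based bin edges
# computed from the data of a single stratum.
quantile_edges <- function(x, n_bins = 3) {
  unique(quantile(x, probs = seq(0, 1, length.out = n_bins + 1), names = FALSE))
}

# Treat one nested dataset as if it were not stratified: bin it, record the
# bin bounds, and compute summary statistics per bin.
summarise_stratum <- function(d, n_bins = 3) {
  edges <- quantile_edges(d$idv, n_bins)
  d$bin <- cut(d$idv, breaks = edges, include.lowest = TRUE, labels = FALSE)
  d %>%
    group_by(bin) %>%
    summarise(
      bin_lower = edges[bin[1]],
      bin_upper = edges[bin[1] + 1],
      q05 = quantile(dv, 0.05, names = FALSE),
      q50 = median(dv),
      q95 = quantile(dv, 0.95, names = FALSE),
      .groups = "drop"
    )
}

# Made-up example data: stratum B covers a much wider range of idv than A.
set.seed(1)
obs <- data.frame(
  strat = rep(c("A", "B"), each = 30),
  idv   = c(runif(30, 0, 12), runif(30, 0, 48)),
  dv    = rnorm(60, mean = 10)
)
strata <- "strat"

vpc_summary <- obs %>%
  group_by_at(strata) %>%                        # group on the strata
  nest() %>%                                     # one nested dataset per stratum
  mutate(data = lapply(data, summarise_stratum)) %>%
  unnest(data) %>%                               # back to a flat table for plotting
  ungroup()
```

Because the edges are computed inside each nested dataset, each stratum gets its own bin bounds, and `bin_lower`/`bin_upper` carry those bounds through to the flat table.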

billdenney avatar Mar 14 '19 21:03 billdenney

Thanks, Bill. I will have a look in the next few days. I did receive a few more requests for this feature :)

ronkeizer avatar Mar 15 '19 13:03 ronkeizer