States division
The default procedure does not return states with an equal number of observations. The screenshot (tested in the dashboard) and the data are attached.
case1_data.csv
I actually know what is happening: NaN...
If I load the dataset, do the decomposition, and then fill the NaNs in the bins, I get an equal count for all scenarios.
I need to dig more to understand why we have NaNs. I don't remember the details there.
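For reference, one documented way `binned_statistic_dd` produces NaNs is an empty bin under `statistic="mean"`. Whether that is what happens with `case1_data.csv` is not confirmed here. This is only a minimal sketch on made-up points, and the fill value is a hypothetical choice:

```python
import numpy as np
from scipy.stats import binned_statistic_dd

# Made-up data: two inputs clustered in opposite corners,
# so the two off-diagonal bins receive no samples.
x = np.array([[0.1, 0.1], [0.2, 0.2], [0.8, 0.8], [0.9, 0.9]])
y = np.array([1.0, 2.0, 3.0, 4.0])

res = binned_statistic_dd(x, y, statistic="mean", bins=2)
means = res.statistic  # shape (2, 2); empty bins are reported as NaN

# Hypothetical fill value, only to show where the NaNs sit.
filled = np.nan_to_num(means, nan=0.0)
```

If the NaNs in the decomposition come from somewhere else (for example NaNs already present in the inputs), this sketch does not apply.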
I have the feeling binned_statistic_dd is not doing exactly what I think it is. I know, for a SciPy maintainer...
Maybe I need to calculate the bins for each axis beforehand instead. That way I can be sure the binning is based on the number of samples and not on the values. I need to check that hypothesis.
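A rough sketch of that hypothesis, assuming synthetic skewed data and a hypothetical `n_bins = 4`. An integer `bins` argument gives equal-width bins over the value range, while passing one array of quantile edges per axis gives bins that each hold about the same number of samples along that axis:

```python
import numpy as np
from scipy.stats import binned_statistic_dd

rng = np.random.default_rng(0)
x = rng.lognormal(size=(1000, 2))  # synthetic, skewed inputs
dummy = np.zeros(len(x))           # values are not used by "count"
n_bins = 4                         # hypothetical number of states per input

# Integer bins: equal-width edges between min and max of each axis.
width = binned_statistic_dd(x, dummy, statistic="count", bins=n_bins)

# Precomputed edges: sample quantiles, one edge array per axis.
probs = np.linspace(0.0, 1.0, n_bins + 1)
edges = [np.quantile(x[:, j], probs) for j in range(x.shape[1])]
count = binned_statistic_dd(x, dummy, statistic="count", bins=edges)

# Number of samples per bin along the first axis, for both variants.
width_axis0 = width.statistic.sum(axis=1)
count_axis0 = count.statistic.sum(axis=1)
```

Note that quantile edges only balance the counts along each axis separately. The joint cells of the grid can still hold different numbers of samples when the inputs are correlated.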
NaNs in bins - what do you mean, like this?
This is the way to communicate that we want particular boundaries between states (== bins), in this case just for the second and third input variables out of four.
If the whole thing is not supplied (at least in the MATLAB package), the state boundaries are defined automatically:
- either by categories if 5 or less unique values, or
- equal amount of observations (highlighted)
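The automatic rule above could be sketched in Python roughly as follows. This is not the MATLAB package's code: `default_edges`, `n_states=3`, and the midpoint edges for the categorical case are assumptions made for illustration.

```python
import numpy as np

def default_edges(column, n_states=3, max_categories=5):
    """Hypothetical helper: choose state boundaries for one input variable."""
    column = np.asarray(column, dtype=float)
    unique = np.unique(column)
    if unique.size <= max_categories:
        # Few unique values: treat as categories, one state per value,
        # with edges placed halfway between neighbouring values.
        mids = (unique[:-1] + unique[1:]) / 2
        return np.concatenate(([unique[0]], mids, [unique[-1]]))
    # Otherwise: quantile edges, so each state holds about the same
    # number of observations.
    return np.quantile(column, np.linspace(0.0, 1.0, n_states + 1))

categorical = default_edges([1, 1, 2, 3, 3])
continuous = default_edges(np.arange(12))
counts, _ = np.histogram(np.arange(12), bins=continuous)
```

Heavily tied data can still give unequal counts in the quantile branch, since tied values cannot be split across states.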
Yep, we can provide bounds for the bins. I just thought that was the normal behavior. I have to check that in SciPy's code and do some poking around.
So worst case I can do as you do and construct my own bounds, it's not hard.
For the NaNs I don't remember why we have them, need to check as well.
Should be fixed in a81bf18b9d2e756eb46bbc807f9159e679803c2a
> For the NaNs I don't remember why we have them, need to check as well.
Easier to discuss over a call.