simdec-python States division

The default procedure does not return states with an equal number of observations. A screenshot (tested in the dashboard) and the data are attached: case1_data.csv

Feb 16 '24 08:02 gnopik

I actually know what is happening: NaN...

If I load the dataset, run the decomposition, and then fill the NaNs in the bins, I get an equal count for all scenarios.

I need to dig more to understand why we have NaNs. I don't remember the details there.

I have the feeling binned_statistic_dd is not doing exactly what I think it is 🤔 I know, for a SciPy maintainer... 😅

Maybe I need to calculate the bins for each axis beforehand instead. This way I am sure that the binning is done on the number of samples and not on the values. Need to check that hypothesis 😮‍💨
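To illustrate the hypothesis above, here is a minimal sketch (not the simdec code) comparing equal-width edges with edges placed at quantiles of a single axis. The helper name `equal_count_edges` and the toy lognormal data are assumptions made for the example only.

```python
import numpy as np


def equal_count_edges(x, n_bins):
    """Hypothetical helper: edges at quantiles, so bins hold about the same number of samples."""
    probs = np.linspace(0.0, 1.0, n_bins + 1)
    return np.quantile(x, probs)


rng = np.random.default_rng(0)
x = rng.lognormal(size=1200)  # skewed toy data, not the attached dataset

# Binning on the values: three equal-width intervals between min and max.
width_edges = np.linspace(x.min(), x.max(), 4)
# Binning on the number of samples: edges at the 0, 1/3, 2/3 and 1 quantiles.
count_edges = equal_count_edges(x, 3)

width_counts, _ = np.histogram(x, bins=width_edges)
count_counts, _ = np.histogram(x, bins=count_edges)

print("equal-width counts:", width_counts)
print("equal-count counts:", count_counts)
```

On skewed data the equal-width bins end up very unevenly populated, while the quantile-based edges split the samples almost evenly. Edges computed this way for each axis could then be passed as explicit bin bounds.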

Mar 12 '24 16:03 tupui

NaNs in bins: what do you mean, like this? This is the way to communicate that we want particular boundaries between states (== bins), in this case just for the second and third input variables out of four. If nothing is supplied (at least in the MATLAB package), the state boundaries are defined automatically:

either by categories, if there are 5 or fewer unique values, or
by an equal number of observations (highlighted)
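The automatic rule described above could be sketched roughly as follows. This is only an illustration of the two cases, not the MATLAB or Python package code; the function name `state_edges` and the parameter `max_categories` are hypothetical.

```python
from statistics import quantiles


def state_edges(values, n_states, max_categories=5):
    """Hypothetical sketch of the automatic state-boundary rule.

    Returns ("categorical", unique_values) when there are few unique values,
    otherwise ("equal_count", edges) with edges placed at quantiles.
    """
    unique = sorted(set(values))
    if len(unique) <= max_categories:
        # Few distinct values: treat each one as its own state.
        return ("categorical", unique)
    # Otherwise place the inner boundaries at quantiles so that each state
    # receives roughly the same number of observations.
    inner = quantiles(values, n=n_states, method="inclusive")
    return ("equal_count", [unique[0], *inner, unique[-1]])


kind_a, states_a = state_edges([1, 2, 2, 3, 1], n_states=3)
kind_b, edges_b = state_edges(list(range(1, 10)), n_states=3)
print(kind_a, states_a)
print(kind_b, edges_b)
```

With the nine values 1 to 9 and three states, the inner boundaries fall between 3 and 4 and between 6 and 7, so each state gets three observations.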

Mar 13 '24 10:03 gnopik

Yep, we can provide bounds for the bins. I just thought that equal-count binning was the normal behavior. I have to check that in SciPy's code and do some poking around.

So, worst case, I can do as you do and construct my own bounds; it's not hard 👍

Mar 13 '24 11:03 tupui

For the NaNs, I don't remember why we have them; I need to check that as well.

Mar 13 '24 11:03 tupui

Should be fixed in a81bf18b9d2e756eb46bbc807f9159e679803c2a

May 18 '24 14:05 tupui

> For the NaNs I don't remember why we have them, need to check as well.

Easier to discuss over a call.

May 20 '24 08:05 gnopik