Numpy_arraysetops_EP
Mode is returning unexpected results
Hi, I really like this package and it really optimizes some of my workflow.
However, I have experienced unexpected results from the npi.group_by(b).mode(a) function.
This is your test code, which works fine (probably because the zones are sorted):
a = [4, 5, 6, 4, 0, 9, 8, 5, 4, 9] # Values
b = [0, 0, 0, 0, 1, 2, 2, 2, 2, 2] # Zones
k, u = npi.group_by(b).mode(a)
--> out u : array([4, 0, 9])
However, as soon as the zones are not sorted, the result of the mode is incorrect. I am not sure whether sorted zones are a requirement, since all the other statistical operations (mean, min, max, etc.) seem to work fine with unsorted zones.
a = [2,2,2,3,1,3,4,9,6,5] # Values
b = [1,1,1,2,1,2,2,3,4,5] # Zones
k, u = npi.group_by(b).mode(a)
--> out u : array([2, 1, 9, 6, 5])
where I am expecting the output to be [2,3,9,6,5]
The above example works correctly for the other statistic types. I hope this makes sense; perhaps I have misunderstood something. Thanks.
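For reference, the expected per-zone modes can be cross-checked without npi. This is only a minimal sketch using the standard library; the helper name `grouped_mode` is hypothetical, and breaking ties by first occurrence is an assumption rather than anything the package documents.

```python
from collections import Counter, defaultdict


def grouped_mode(zones, values):
    """Return (sorted unique zones, most common value in each zone).

    Hypothetical reference helper, not part of numpy_indexed.
    """
    groups = defaultdict(list)
    for zone, value in zip(zones, values):
        groups[zone].append(value)
    keys = sorted(groups)
    # most_common(1) returns the first-encountered value when counts tie
    modes = [Counter(groups[key]).most_common(1)[0][0] for key in keys]
    return keys, modes


a = [2, 2, 2, 3, 1, 3, 4, 9, 6, 5]  # Values
b = [1, 1, 1, 2, 1, 2, 2, 3, 4, 5]  # Zones (unsorted)
keys, modes = grouped_mode(b, a)
print(keys, modes)  # [1, 2, 3, 4, 5] [2, 3, 9, 6, 5]
```

Zone 1 holds the values 2, 2, 2, 1 and zone 2 holds 3, 3, 4, which is why I expect 2 and 3 for the first two entries.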
Thanks for the input. Haven't done maintenance on this repo in ages, but I might find some time for it in the coming week. Looks indeed like a legit bug. I'll need to dig in to find the optimal solution. Indeed, permuting the arrays with the argsort of b gives the correct result, so that could be one fix, but I'm not sure it's the best one. Also, the returned k does not match what the docstring says, or what you'd expect from the rest of the API. I think I threw in this functionality as a bonus without ever really being invested in its correctness, but hopefully this can all be fixed properly.
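A rough sketch of the argsort permutation, using plain NumPy so it runs without the package. The helper names `sort_by_zone` and `mode_of_sorted_groups` are made up for illustration, the NumPy-only mode is a stand-in rather than what npi does internally, and ties are broken toward the smallest value, which is an assumption.

```python
import numpy as np


def sort_by_zone(zones, values):
    """Permute both arrays by the argsort of the zones (hypothetical helper)."""
    zones = np.asarray(zones)
    values = np.asarray(values)
    order = np.argsort(zones, kind="stable")  # stable keeps in-zone order
    return zones[order], values[order]


def mode_of_sorted_groups(zones_sorted, values_sorted):
    """Mode per zone, assuming equal zones are already contiguous."""
    keys, starts = np.unique(zones_sorted, return_index=True)
    modes = []
    for chunk in np.split(values_sorted, starts[1:]):
        vals, counts = np.unique(chunk, return_counts=True)
        modes.append(vals[np.argmax(counts)])  # ties: smallest value wins
    return keys, np.array(modes)


a = [2, 2, 2, 3, 1, 3, 4, 9, 6, 5]  # Values
b = [1, 1, 1, 2, 1, 2, 2, 3, 4, 5]  # Zones (unsorted)
b_sorted, a_sorted = sort_by_zone(b, a)
keys, modes = mode_of_sorted_groups(b_sorted, a_sorted)
print(keys, modes)

# With the package installed, the permuted arrays would go to npi instead:
#   k, u = npi.group_by(b_sorted).mode(a_sorted)
```

For the arrays above this yields the keys 1 to 5 with modes 2, 3, 9, 6, 5, matching the expected output.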
Thanks a lot, I will meanwhile try the work-around that you suggested.