Numpy_arraysetops_EP

Mode is returning unexpected results

Open JaschaMuller opened this issue 4 years ago • 2 comments

Hi, I really like this package; it really optimizes some of my workflow. However, I have experienced unexpected results from the npi.group_by(b).mode(a) function.

This is your test code, which works fine (probably because the zones are already sorted):

import numpy_indexed as npi

a = [4, 5, 6, 4, 0, 9, 8, 5, 4, 9] # Values
b = [0, 0, 0, 0, 1, 2, 2, 2, 2, 2] # Zones
k, u = npi.group_by(b).mode(a)
--> out u : array([4, 0, 9])

However, as soon as the zones are not sorted, the result of mode is incorrect. I am not sure whether sorted zones are a requirement, since all the other statistical operations (mean, min, max, etc.) seem to work fine with unsorted zones.

a = [2, 2, 2, 3, 1, 3, 4, 9, 6, 5] # Values
b = [1, 1, 1, 2, 1, 2, 2, 3, 4, 5] # Zones
k, u = npi.group_by(b).mode(a)
--> out u : array([2, 1, 9, 6, 5])

where I am expecting the output to be [2,3,9,6,5]
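
The expected values can be checked independently of the library. The following is a minimal plain-Python sketch, not part of numpy_indexed; the helper name reference_group_mode is hypothetical, and breaking ties by first occurrence is an assumption of this sketch:

```python
from collections import Counter, defaultdict


def reference_group_mode(values, zones):
    """Return (sorted zone keys, most frequent value per zone).

    Hypothetical reference helper; ties go to the value seen first.
    """
    groups = defaultdict(list)
    for value, zone in zip(values, zones):
        groups[zone].append(value)
    keys = sorted(groups)
    # most_common(1) yields [(value, count)] for the most frequent value
    modes = [Counter(groups[key]).most_common(1)[0][0] for key in keys]
    return keys, modes


a = [2, 2, 2, 3, 1, 3, 4, 9, 6, 5]  # Values
b = [1, 1, 1, 2, 1, 2, 2, 3, 4, 5]  # Zones (unsorted)
keys, modes = reference_group_mode(a, b)
print(keys)   # [1, 2, 3, 4, 5]
print(modes)  # [2, 3, 9, 6, 5]
```

Zone 2 holds the values 3, 3 and 4, so its mode is 3 rather than the 1 reported above.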

The above example works correctly for the other statistic types. I hope this makes sense, perhaps I have a wrong understanding or interpretation. Thanks

JaschaMuller, Dec 01 '20 10:12

Thanks for the input. Haven't done maintenance on this repo in ages, but I might find some time for it in the coming week. It does indeed look like a legit bug. I'll need to dig in to find the optimal solution. Indeed, permuting the arrays with the argsort of b gives the correct result, so that could be one fix, but I'm not sure it's the best one. Also, the returned k is not what the docstring says, nor what you'd expect from the rest of the API. I think I threw in this functionality as a bonus without ever really being invested in its correctness, but hopefully this can all be fixed properly.
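
The argsort workaround mentioned here might look roughly like the sketch below. The helper names sort_by_zone and mode_by_zone are hypothetical and not part of the library; the only numpy_indexed call used is npi.group_by(...).mode(...), as it appears in the report. A stable sort is an assumption of this sketch, chosen so that values keep their original order within each zone.

```python
import numpy as np


def sort_by_zone(values, zones):
    """Permute both arrays so that the zones are in ascending order."""
    values = np.asarray(values)
    zones = np.asarray(zones)
    # Stable sort keeps the original order of values within each zone
    order = np.argsort(zones, kind="stable")
    return values[order], zones[order]


def mode_by_zone(values, zones):
    """Hypothetical wrapper: sort by zone first, then call group_by(...).mode(...)."""
    import numpy_indexed as npi  # imported here so the sorting step runs without it

    values_sorted, zones_sorted = sort_by_zone(values, zones)
    return npi.group_by(zones_sorted).mode(values_sorted)


a = [2, 2, 2, 3, 1, 3, 4, 9, 6, 5]  # Values
b = [1, 1, 1, 2, 1, 2, 2, 3, 4, 5]  # Zones (unsorted)
a_sorted, b_sorted = sort_by_zone(a, b)
print(b_sorted)  # [1 1 1 1 2 2 2 3 4 5]
print(a_sorted)  # [2 2 2 1 3 3 4 9 6 5]
```

With numpy_indexed installed, k, u = mode_by_zone(a, b) then runs the mode on the sorted arrays, which is the case reported to work.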

EelcoHoogendoorn, Dec 01 '20 11:12

Thanks a lot. In the meantime, I will try the workaround you suggested.

JaschaMuller, Dec 01 '20 12:12