esutil
esutil copied to clipboard
Reverse indices are not properly sorted (and can change from version to version)
The following code snippet (a) doesn't return sorted indices and (b) results in different ordering from numpy 1.24 to numpy 1.26 on x86 processors.
import numpy as np
import esutil
arr = np.zeros(1000, dtype=np.int64)
arr[500: ] = 1
h, rev = esutil.stat.histogram(arr, rev=True)
print(rev[rev[0]: rev[1]])
print(rev[rev[1]: rev[2]])
The fix is that the histogram call to argsort should use kind="stable" which properly preserves sort order.
I don't want to fix this yet to allow my code to properly prepare for the different (but correct!) ordering.
The docs say the underlying implementation will depend on data type, so we should make sure we get good coverage in the unit tests
@erykoff are you ready for a fix for this?
I haven't done anything about this because I didn't want to deal with writing the tests. 😞
Reading the docs https://numpy.org/doc/stable/reference/generated/numpy.sort.html, it seems that the mention of datatype for stable is whether timsort is used (I think for floats, etc) or radix sort (for ints, strings, etc) is used. But the key is that it is guaranteed to maintain the input order (that is the stability). Not to say testing isn't good! But that's an implementation detail. I think it's important that this actually return stable outputs.
I'm happy to have the change when you are ready
I think this was addressed by a recent PR