esutil icon indicating copy to clipboard operation
esutil copied to clipboard

Reverse indices are not properly sorted (and can change from version to version)

Open erykoff opened this issue 1 year ago • 1 comments

The following code snippet (a) doesn't return sorted indices and (b) results in different ordering from numpy 1.24 to numpy 1.26 on x86 processors.

import numpy as np
import esutil


arr = np.zeros(1000, dtype=np.int64)
arr[500: ] = 1

h, rev = esutil.stat.histogram(arr, rev=True)

print(rev[rev[0]: rev[1]])
print(rev[rev[1]: rev[2]])

The fix is that the histogram call to argsort should use kind="stable" which properly preserves sort order.

I don't want to fix this yet to allow my code to properly prepare for the different (but correct!) ordering.

erykoff avatar Dec 20 '23 19:12 erykoff

The docs say the underlying implementation will depend on data type, so we should make sure we get good coverage in the unit tests

esheldon avatar Dec 20 '23 19:12 esheldon

@erykoff are you ready for a fix for this?

esheldon avatar May 29 '24 14:05 esheldon

I haven't done anything about this because I didn't want to deal with writing the tests. 😞

erykoff avatar May 29 '24 15:05 erykoff

Reading the docs https://numpy.org/doc/stable/reference/generated/numpy.sort.html, it seems that the mention of datatype for stable is whether timsort is used (I think for floats, etc) or radix sort (for ints, strings, etc) is used. But the key is that it is guaranteed to maintain the input order (that is the stability). Not to say testing isn't good! But that's an implementation detail. I think it's important that this actually return stable outputs.

erykoff avatar May 29 '24 15:05 erykoff

I'm happy to have the change when you are ready

esheldon avatar May 29 '24 15:05 esheldon

I think this was addressed by a recent PR

esheldon avatar Jun 18 '24 12:06 esheldon