arkouda
arkouda copied to clipboard
Categorical.sort bug
Categorical.sort has a bug and no unit tests:
In [67]: c1 = Categorical(ak.array(["a", "a", "b"]))
...: c2 = Categorical(ak.array(["a", "b", "a"]))
In [68]: c2.sort()
Out[68]: array(['a', 'b', 'a'])
I couldn't help but dig into this a bit. It seems bill wrote this over four years ago, so it's kinda crazy it hasn't popped up until now (maybe one of the underlying functions has changed since then)
https://github.com/Bears-R-Us/arkouda/blob/bc641635c9e01d1a66abd8a7a199e1087645c622/arkouda/categorical.py#L810-L816
EDIT: The below approach doesn't work and just happened to work on the example. This is pretty obvious when you think about the respective lengths of the categories and codes
Incorrect approach
I messed around with this just a tiny bit and I don't quite understand why we need the inverse and why we can't just use the argsort to reorder the codes. I tried this out on master
def sort(self):
# __doc__ = sort.__doc__
idxperm = argsort(self.categories)
return Categorical.from_codes(self.codes[idxperm], self.categories)
and it does seem to do what we want at least for this case (with comm=none):
In [3]: s = ak.array(["a", "b", "a"])
In [4]: c = ak.Categorical(s)
In [5]: c.sort()
Out[5]: array(['a', 'a', 'b'])
I'd need to think about it more to see if there's any issues with this approach