deephaven-core icon indicating copy to clipboard operation
deephaven-core copied to clipboard

WritableChunk sort documentation and implementation clarification

Open devinrsmith opened this issue 1 year ago • 5 comments

As noted in #5820, WritableDoubleChunk#sort uses Arrays#sort(double[], ...) which does not sort NULL_DOUBLEs first; -Infinity is sorted earlier.

devinrsmith avatar Jul 22 '24 17:07 devinrsmith

It also appears that WritableCharChunk#sort is incorrect when offset != 0

devinrsmith avatar Jul 22 '24 17:07 devinrsmith

This may not be a bug, as it is "consistent" with the javadocs:

    /**
     * Sort this chunk in-place using Java's primitive defined ordering.
     * <p>
     * Of note is that nulls or NaNs are not sorted according to Deephaven ordering rules.
     */
    void sort();

In that case though, WritableCharChunk is incorrect because it does sort nulls first.

devinrsmith avatar Jul 23 '24 15:07 devinrsmith

Of note is that nulls or NaNs are not sorted according to Deephaven ordering rules.

Afaict, Arrays.sort does sort NaNs in the same way DH does - NaNs come last.

devinrsmith avatar Jul 23 '24 15:07 devinrsmith

Removing the SortFixup region from WritableCharChunk does not cause any tests to fail (normal, nor nightly). This surprises me very much; either, code downstream of WritableCharChunk.sort()

  1. does their own fixup (not assuming that WritableCharChunk does it for them)
  2. doesn't care about specifics of char sort (only cares about like values grouped together)

or, our testing in this regard is lacking (and we do have operations that depend on the current implementation of WritableCharChunk.sort).

devinrsmith avatar Jul 23 '24 17:07 devinrsmith

https://iceberg.apache.org/spec/#sorting is relevant info on how iceberg handles double. We may want to create an issue more generally about floating point (in)consistencies in DH.

devinrsmith avatar Nov 07 '24 17:11 devinrsmith