arrow icon indicating copy to clipboard operation
arrow copied to clipboard

ARROW-17290: [C++] Add order-comparisons for numeric scalars

Open rtpsw opened this issue 3 years ago • 9 comments

See https://issues.apache.org/jira/browse/ARROW-17290

rtpsw avatar Aug 02 '22 21:08 rtpsw

https://issues.apache.org/jira/browse/ARROW-17290

github-actions[bot] avatar Aug 02 '22 21:08 github-actions[bot]

:warning: Ticket has not been started in JIRA, please click 'Start Progress'.

github-actions[bot] avatar Aug 02 '22 21:08 github-actions[bot]

@pitrou, could you review?

cc @icexelloss

rtpsw avatar Aug 07 '22 15:08 rtpsw

@bkietz Also curious about your opinion on this.

pitrou avatar Aug 08 '22 13:08 pitrou

@rtpsw Do we need this for the AsOfJoin improvement?

icexelloss avatar Aug 08 '22 13:08 icexelloss

I developed this PR's code while experimenting with designs for AsOfJoin; however, the design I chose to proceed with does not need this PR's code. So, I'm fine if this doesn't go through. I just figured I'd share the code to see what people think. I noticed that Arrow does not implement ordering for numeric Scalar types, but I wasn't aware of the considerations that @bkietz describes - perhaps this means the corresponding Jira, and not just this PR, is being questioned. The bit twiddling idea for ordering floats crossed my mind, but I opted to start with something simpler.

If this is intended for use in AsofJoin then we should definitely not be using Scalars at all since extracting them from a join key column, boxing them in the type-erased Scalar, then unboxing them again to do type check and comparison would be wasteful compared to doing the comparisons in-place against elements of each column.

I experimented with adding support for updating a Scalar in-place, which potentially could have avoided the above costs in my code, but ran into implementation problems (changes needed in too many places across Arrow) and decided to give up on it.

Tangent: I don't know how applicable this will be, but at least for testing purposes it might be handy to use something like Rust's float_ord to cast floating point numbers into something less subtle to sort. For example: https://gist.github.com/bkietz/8e2ef182883b886e532ffde8e537f7a3

These resources are interesting - thanks for sharing! I also found http://stereopsis.com/radix.html (linked from https://docs.rs/crate/float-ord/latest) interesting.

rtpsw avatar Aug 09 '22 20:08 rtpsw

@rtpsw Do we need this for the AsOfJoin improvement?

No.

rtpsw avatar Aug 09 '22 20:08 rtpsw

Just to clarify, I'm holding on looking into specific comments on this PR until and if we decide to move forward with it.

rtpsw avatar Aug 09 '22 20:08 rtpsw

If we don't currently or in the near future require support for comparison of Scalars in the way this PR provides, then I'd say we should close this PR and mark the JIRA wontfix.

bkietz avatar Aug 10 '22 17:08 bkietz