[C++] Grouper produces a wrong num_groups when are_cols_in_encoding_order is set to false
Describe the bug, including details regarding any error messages, version, and platform.
This bug was found from the comment below: https://github.com/apache/arrow/pull/41036#discussion_r1564538418
There is a bug in the grouper when are_cols_in_encoding_order=false is set in the code below: https://github.com/apache/arrow/blob/2979d69a05cb16012da06baaa801a1849e9110ce/cpp/src/arrow/compute/row/grouper.cc#L582
It makes num_groups differ from the value produced with are_cols_in_encoding_order=true.
The encoder sorts columns by default. When only this comparison argument is set to false, the inputs to CompareColumnsToRows (impl_ptr->encoder_.batch_all_cols() and impl_ptr->rows_) are both already sorted, but an incorrect column offset is used to access the compared column:
https://github.com/apache/arrow/blob/2979d69a05cb16012da06baaa801a1849e9110ce/cpp/src/arrow/compute/row/compare_internal.cc#L366-L369
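To illustrate the mismatch described above, here is a minimal toy model. It is not Arrow's actual code: `ToyRowLayout`, `MakeLayout`, and `OffsetForColumn` are hypothetical names, and sorting columns by descending width is only an assumed stand-in for whatever ordering the encoder really applies. The point it sketches is that if the caller already passes columns in encoding order, translating the column index through the "position after encoding" mapping again selects the offset of a different column.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Toy model only; every name here is hypothetical, not Arrow's API.
struct ToyRowLayout {
  // original column index -> position in the encoded (sorted) row
  std::vector<uint32_t> pos_after_encoding;
  // encoded position -> byte offset of that field within a row
  std::vector<uint32_t> encoded_offset;
};

// Assumed ordering rule for illustration: wider columns come first.
ToyRowLayout MakeLayout(const std::vector<uint32_t>& widths) {
  const uint32_t n = static_cast<uint32_t>(widths.size());
  std::vector<uint32_t> order(n);  // encoded position -> original index
  std::iota(order.begin(), order.end(), 0u);
  std::stable_sort(order.begin(), order.end(),
                   [&](uint32_t a, uint32_t b) { return widths[a] > widths[b]; });

  ToyRowLayout layout;
  layout.pos_after_encoding.resize(n);
  layout.encoded_offset.resize(n);
  uint32_t offset = 0;
  for (uint32_t pos = 0; pos < n; ++pos) {
    layout.pos_after_encoding[order[pos]] = pos;
    layout.encoded_offset[pos] = offset;
    offset += widths[order[pos]];
  }
  return layout;
}

// Byte offset used when comparing the icol-th input column against a row.
// With the flag set to false, icol is treated as an original column index
// and translated to its encoded position first.
uint32_t OffsetForColumn(const ToyRowLayout& layout, uint32_t icol,
                         bool are_cols_in_encoding_order) {
  assert(icol < layout.encoded_offset.size());
  const uint32_t pos =
      are_cols_in_encoding_order ? icol : layout.pos_after_encoding[icol];
  return layout.encoded_offset[pos];
}
```

With original widths `{1, 8, 4}`, the encoded order in this toy is 8, 4, 1 bytes, so the correct offsets for columns that are already sorted are 0, 8, 12. Calling `OffsetForColumn` with `are_cols_in_encoding_order=false` on those same sorted columns yields 12, 0, 8 instead, so each column is compared against the bytes of another field, which is the kind of mismatch that could change the number of groups.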
Component(s)
C++