
[C++] Grouper produces a wrong num_groups when are_cols_in_encoding_order is set to false

Open ZhangHuiGui opened this issue 10 months ago • 0 comments

Describe the bug, including details regarding any error messages, version, and platform.

Bug found from the following comment: https://github.com/apache/arrow/pull/41036#discussion_r1564538418

This is a bug in the grouper when are_cols_in_encoding_order=false is set in the code below: https://github.com/apache/arrow/blob/2979d69a05cb16012da06baaa801a1849e9110ce/cpp/src/arrow/compute/row/grouper.cc#L582

It causes num_groups to differ from the value obtained with are_cols_in_encoding_order=true.

The encoder sorts columns by default. When only this comparison argument is set to false, the inputs to CompareColumnsToRows, impl_ptr->encoder_.batch_all_cols() and impl_ptr->rows_, are both already sorted, but an incorrect column_offset is used to access the compared column: https://github.com/apache/arrow/blob/2979d69a05cb16012da06baaa801a1849e9110ce/cpp/src/arrow/compute/row/compare_internal.cc#L366-L369
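To make the mismatch concrete, here is a minimal, self-contained sketch of the failure mode. It is a toy model, not Arrow's code: ToyGrouper, RowMatches, Consume, CountGroups and user_to_encoded are all hypothetical names, and stored rows are modelled as plain vectors of integers rather than encoded bytes. The only assumption carried over from the description above is that both the input columns and the stored rows are already in encoding order, while the flag tells the comparison to remap the column index anyway.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical toy model for illustration only; this is not Arrow's Grouper.
using Column = std::vector<std::int64_t>;

struct ToyGrouper {
  // user_to_encoded[i] = position of user column i inside a stored row.
  std::vector<std::size_t> user_to_encoded;
  // Stored group keys; fields are always laid out in encoding order.
  std::vector<std::vector<std::int64_t>> rows;

  // encoded_cols is assumed to be in encoding order already. When the flag is
  // false, the column index is remapped anyway, so the wrong field is read.
  bool RowMatches(const std::vector<Column>& encoded_cols, std::size_t input_row,
                  std::size_t group, bool are_cols_in_encoding_order) const {
    for (std::size_t i = 0; i < encoded_cols.size(); ++i) {
      const std::size_t field =
          are_cols_in_encoding_order ? i : user_to_encoded[i];
      if (encoded_cols[i][input_row] != rows[group][field]) return false;
    }
    return true;
  }

  // Assigns every input row to an existing group or creates a new one,
  // then returns the total number of groups.
  std::size_t Consume(const std::vector<Column>& encoded_cols,
                      bool are_cols_in_encoding_order) {
    const std::size_t num_rows =
        encoded_cols.empty() ? 0 : encoded_cols[0].size();
    for (std::size_t r = 0; r < num_rows; ++r) {
      bool found = false;
      for (std::size_t g = 0; g < rows.size() && !found; ++g) {
        found = RowMatches(encoded_cols, r, g, are_cols_in_encoding_order);
      }
      if (!found) {
        std::vector<std::int64_t> key;
        for (const Column& col : encoded_cols) key.push_back(col[r]);
        rows.push_back(std::move(key));
      }
    }
    return rows.size();
  }
};

// Three input rows with keys (10, 1), (10, 1), (20, 2): two distinct groups.
std::size_t CountGroups(bool are_cols_in_encoding_order) {
  ToyGrouper grouper;
  grouper.user_to_encoded = {1, 0};  // encoding order swaps the two columns
  const std::vector<Column> encoded_cols = {{10, 10, 20}, {1, 1, 2}};
  return grouper.Consume(encoded_cols, are_cols_in_encoding_order);
}
```

In this sketch CountGroups(true) returns 2, while CountGroups(false) returns 3: the second row compares its first key column against the wrong stored field, fails to match the existing group, and is appended as a duplicate group.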

Component(s)

C++

ZhangHuiGui, Apr 16 '24 14:04