gazelle_plugin
Query result does not match vanilla Spark
SQL: select c_last_name, c_first_name, max(ss_customer_sk) ss_customer_sk_max, min(ss_customer_sk) ss_customer_sk_min, max(cast(c_customer_sk as string)) c_customer_sk_max, min(cast(c_customer_sk as string)) c_customer_sk_min, s_store_name, max(c_birth_country) c_birth_country_max, min(c_birth_country) c_birth_country_min, max(ca_country) ca_country_max,min(ca_country) ca_country_min, max(upper(ca_country)) ca_country_u_max,min(upper(ca_country)) ca_country_u_min, s_zip, ca_zip from store_sales, store_returns, store, item, customer, customer_address where ss_ticket_number = sr_ticket_number and ss_item_sk = sr_item_sk and cast(ss_customer_sk as string) = cast(c_customer_sk as string) and ss_item_sk = i_item_sk and ss_store_sk = s_store_sk and c_birth_country = upper(ca_country) and s_zip = ca_zip and s_market_id = 8 group by c_last_name, c_first_name, c_customer_sk, s_store_name, s_zip, ca_zip
Using the sr606 Jenkins server, NativeSQL test env.
Vanilla Spark returns: 86,788. http://sr606:18080/history/application_1652807458381_0093/SQL/execution/?id=54
5/20 main branch: 6,157. http://sr606:18080/history/application_1652807458381_0099/SQL/execution/?id=3
It looks like the sort aggregate is wrong:
Gazelle:
Vanilla:
@zhouyuan @zhixingheyi-tian
This issue is caused by https://github.com/oap-project/gazelle_plugin/blob/a44b889a506cba43fa1d16e0c4b6ed3c7149cac4/native-sql-engine/cpp/src/codegen/arrow_compute/ext/array_item_index.h#L32-L33
struct ArrayItemIndexS {
  uint16_t id = 0;
  uint16_t array_id = 0;
  // ...
};
CSHJ emits 287 record batches with more than 64K rows each, and those row numbers exceed the uint16_t range in the next ColumnarSort operator.
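To illustrate the failure mode, here is a minimal standalone sketch of what happens when a row number above 65535 is stored in a 16-bit field. `ItemIndex16` and `MakeIndex` are hypothetical names that only mirror the two `uint16_t` fields quoted above; the sketch assumes `id` holds the row position within a batch and is not the plugin's actual code.

```cpp
#include <cstdint>

// Hypothetical stand-in that mirrors the two uint16_t fields quoted above.
struct ItemIndex16 {
  uint16_t id = 0;        // assumed: row position within a record batch
  uint16_t array_id = 0;  // assumed: position of the batch in the input list
};

// Stores positions the way 16-bit fields would. Any value above 65535
// wraps modulo 65536, so two different rows can end up with the same id.
inline ItemIndex16 MakeIndex(uint32_t array_id, uint32_t row) {
  ItemIndex16 idx;
  idx.id = static_cast<uint16_t>(row);
  idx.array_id = static_cast<uint16_t>(array_id);
  return idx;
}
```

With this sketch, row 65536 is stored as 0 and row 70000 as 4464, so a sort over such indices would silently point at the wrong rows instead of failing.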
This issue is temporarily resolved by patch #941, which replaces ColumnarSort + SortAggregate with ColumnarHashAggregate, so the problem is sidestepped rather than fixed.
We will implement batch_size control in the operators below to solve these problems thoroughly:
ColumnarBroadcastHashJoinExec
ColumnarShuffledHashJoinExec
ColumnarSortMergeJoinExec
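One possible shape for such batch_size control, sketched under assumptions: compute (offset, length) slices so that no output batch exceeds a configured maximum. `SplitIntoSlices` and `max_batch_rows` are hypothetical names used for illustration. In an Arrow-based operator each slice could then be passed to something like `arrow::RecordBatch::Slice(offset, length)`, but how the join operators would actually emit the slices is not specified in this thread.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Returns (offset, length) pairs that cover [0, num_rows) in order,
// with every length at most max_batch_rows. Returns an empty list
// when either argument is not positive.
inline std::vector<std::pair<int64_t, int64_t>> SplitIntoSlices(
    int64_t num_rows, int64_t max_batch_rows) {
  std::vector<std::pair<int64_t, int64_t>> slices;
  if (num_rows <= 0 || max_batch_rows <= 0) {
    return slices;
  }
  for (int64_t offset = 0; offset < num_rows; offset += max_batch_rows) {
    // The last slice may be shorter than max_batch_rows.
    slices.emplace_back(offset, std::min(max_batch_rows, num_rows - offset));
  }
  return slices;
}
```

For example, a 70000-row join output with `max_batch_rows = 32768` would be emitted as three batches of 32768, 32768 and 4464 rows, each of which stays within the uint16_t index range.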
CC @FelixYBW @zhouyuan @PHILO-HE
The same root cause as https://github.com/oap-project/gazelle_plugin/issues/906
We should add an ARROW_CHECK in every place where int16 is used as the record batch size.
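A rough sketch of the kind of guard meant here, written without Arrow so it stays standalone: the value is range-checked before it is narrowed to 16 bits. `FitsInUint16` and `CheckedNarrow` are hypothetical helpers; in the plugin itself the same condition would presumably be expressed with an ARROW_CHECK-style macro, which aborts rather than throws.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>

// True when value can be stored in a uint16_t without wrapping.
inline bool FitsInUint16(int64_t value) {
  return value >= 0 && value <= std::numeric_limits<uint16_t>::max();
}

// Narrows value to uint16_t, failing loudly instead of silently wrapping.
// A std::out_of_range exception stands in for the abort an ARROW_CHECK
// would trigger.
inline uint16_t CheckedNarrow(int64_t value) {
  if (!FitsInUint16(value)) {
    throw std::out_of_range("value " + std::to_string(value) +
                            " does not fit in uint16_t");
  }
  return static_cast<uint16_t>(value);
}
```

A guard like this turns the wrong-result case into an immediate failure, which is easier to diagnose than a row count mismatch against vanilla Spark.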