tikv
`encoding failed` when processing queries with GBK/GB18030
Bug Report
What version of TiKV are you using?
master
What operating system and CPU are you using?
Not a system-related bug
Steps to reproduce
- Start a cluster with PD, TiDB, and TiKV
- Execute the following queries
drop table t1;
CREATE TABLE t1 (c VARCHAR(4) CHARACTER SET gbk);
INSERT INTO t1 VALUES (0x8BF5819AEDC3), (0x99CC), (0x90459958), (0xAA95C0E59E509AED), (0xCCE7), (0x9068), (0x90459958);
SELECT * from t1;
SELECT ANY_VALUE(HEX(c)), COUNT(c) FROM t1 GROUP BY c COLLATE gbk_chinese_ci;
What did you expect?
All statements execute successfully.
What actually happened?
The last statement fails with (1105, 'encoding failed').
/assign
Check the query plan:
TiDB [email protected]:test> explain SELECT ANY_VALUE(HEX(c)), COUNT(c) FROM t1 GROUP BY c COLLATE gbk_chinese_ci;
+-------------------------+----------+-----------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+-------------------------+----------+-----------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------+
| Projection_4 | 8000.00 | root | | any_value(hex(to_binary(test.t1.c)))->Column#4, Column#3 |
| └─HashAgg_9 | 8000.00 | root | | group by:Column#6, funcs:count(Column#7)->Column#3, funcs:firstrow(Column#8)->test.t1.c |
| └─TableReader_10 | 8000.00 | root | | data:HashAgg_5 |
| └─HashAgg_5 | 8000.00 | cop[tikv] | | group by:cast(test.t1.c, varchar(4) CHARACTER SET gbk COLLATE gbk_chinese_ci), funcs:count(test.t1.c)->Column#7, funcs:firstrow(test.t1.c)->Column#8 |
| └─TableFullScan_8 | 10000.00 | cop[tikv] | table:t1 | keep order:false, stats:pseudo |
+-------------------------+----------+-----------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------+
5 rows in set
Time: 0.013s
The error happens at https://github.com/tikv/tikv/blob/6f389be835d5b8c4362c4989dca925af077e63f7/components/tidb_query_datatype/src/codec/collation/mod.rs#L216
The cause is that the length of `inner` is truncated to 4 bytes somewhere, which is unexpected: the 4 in varchar(4) means 4 characters, not 4 bytes.
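A minimal sketch, in Python rather than TiKV's Rust, of how a byte-based limit could produce this kind of failure. It assumes the value is held as UTF-8 at the point where the cast truncates it, which this thread does not confirm. The helper names are hypothetical and do not correspond to TiKV functions. The sample value is the first one from the INSERT statement above.

```python
# First value from the INSERT statement: three GBK characters, six GBK bytes.
GBK_BYTES = bytes.fromhex("8BF5819AEDC3")
COLUMN_LEN = 4  # the 4 in varchar(4)


def truncate_by_bytes(utf8_value: bytes, limit: int) -> bytes:
    """Incorrect: treats the column length as a byte count."""
    return utf8_value[:limit]


def truncate_by_chars(utf8_value: bytes, limit: int) -> bytes:
    """Correct: treats the column length as a character count."""
    return utf8_value.decode("utf-8")[:limit].encode("utf-8")


def to_gbk(utf8_value: bytes) -> bytes:
    """Re-encode UTF-8 bytes as GBK; raises UnicodeError on malformed input."""
    return utf8_value.decode("utf-8").encode("gbk")


# Assumption (not confirmed by the thread): the value is UTF-8 while the cast runs.
utf8_value = GBK_BYTES.decode("gbk").encode("utf-8")
print(len(GBK_BYTES.decode("gbk")), "chars,", len(utf8_value), "UTF-8 bytes")

# Cutting at 4 bytes splits the second character, so re-encoding fails.
try:
    to_gbk(truncate_by_bytes(utf8_value, COLUMN_LEN))
except UnicodeError as err:
    print("byte truncation breaks the value:", err)

# Cutting at 4 characters keeps the whole value, which has fewer than 4 characters.
print(to_gbk(truncate_by_chars(utf8_value, COLUMN_LEN)).hex().upper())
```

Under that assumption, a limit applied in bytes cuts through a multi-byte character, while a limit applied in characters leaves the value intact.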
Reproduced in 6.5.11, 7.1.6, and 7.5.4. Not reproducible in 5.4 and 6.1, because the hash aggregation is not pushed down to TiKV in those versions:
TiDB [email protected]:test> explain SELECT ANY_VALUE(HEX(c)), COUNT(c) FROM
-> t1 GROUP BY c COLLATE gbk_chinese_ci;
+--------------------------+----------+-----------+---------------+-------------------------------------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+--------------------------+----------+-----------+---------------+-------------------------------------------------------------------------------------------------------+
| Projection_4 | 8000.00 | root | | any_value(hex(to_binary(test.t1.c)))->Column#4, Column#3 |
| └─HashAgg_7 | 8000.00 | root | | group by:Column#11, funcs:count(Column#9)->Column#3, funcs:firstrow(Column#10)->test.t1.c |
| └─Projection_13 | 10000.00 | root | | test.t1.c, test.t1.c, cast(test.t1.c, varchar(4) CHARACTER SET gbk COLLATE gbk_chinese_ci)->Column#11 |
| └─TableReader_12 | 10000.00 | root | | data:TableFullScan_11 |
| └─TableFullScan_11 | 10000.00 | cop[tikv] | table:t1 | keep order:false, stats:pseudo |
+--------------------------+----------+-----------+---------------+-------------------------------------------------------------------------------------------------------+