
`encoding failed` when processing queries with GBK/GB18030

Open CbcWestwolf opened this issue 1 year ago • 2 comments

Bug Report

What version of TiKV are you using?

master

What operating system and CPU are you using?

Not a system-related bug.

Steps to reproduce

  1. Start a cluster with PD, TiDB, and TiKV.
  2. Execute the following queries:
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (c VARCHAR(4) CHARACTER SET gbk);
INSERT INTO t1 VALUES (0x8BF5819AEDC3), (0x99CC), (0x90459958), (0xAA95C0E59E509AED), (0xCCE7), (0x9068), (0x90459958);
SELECT * from t1;
SELECT ANY_VALUE(HEX(c)), COUNT(c) FROM t1 GROUP BY c COLLATE gbk_chinese_ci;
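
As a sanity check on the inserted values, the sketch below counts bytes and characters for each literal. It is only a structural walk over the GBK byte ranges (lead byte 0x81-0xFE, trail byte 0x40-0xFE except 0x7F) and does not verify that each code point is actually assigned; `gbk_char_count` is a hypothetical helper, not a TiDB or TiKV function.

```python
def gbk_char_count(data: bytes) -> int:
    """Count characters by walking GBK's single-byte / double-byte structure."""
    count = 0
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            # ASCII range: one byte per character.
            i += 1
        elif (
            0x81 <= lead <= 0xFE
            and i + 1 < len(data)
            and 0x40 <= data[i + 1] <= 0xFE
            and data[i + 1] != 0x7F
        ):
            # Double-byte character: lead byte followed by a valid trail byte.
            i += 2
        else:
            raise ValueError(f"malformed GBK sequence at byte offset {i}")
        count += 1
    return count


# Hex literals copied from the INSERT statement in the reproduction steps.
values = [
    "8BF5819AEDC3",
    "99CC",
    "90459958",
    "AA95C0E59E509AED",
    "CCE7",
    "9068",
    "90459958",
]

sizes = [(v, len(bytes.fromhex(v)), gbk_char_count(bytes.fromhex(v))) for v in values]
for hex_value, n_bytes, n_chars in sizes:
    print(f"0x{hex_value}: {n_bytes} bytes, {n_chars} chars")
```

Under this structural reading, every value is at most 4 characters long, so each fits VARCHAR(4), while two of them are longer than 4 bytes.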

What did you expect?

All statements execute successfully.

What happened?

The last statement fails with (1105, 'encoding failed').

CbcWestwolf avatar Oct 09 '24 03:10 CbcWestwolf

/assign

CbcWestwolf avatar Oct 09 '24 03:10 CbcWestwolf

Check the query plan:

TiDB [email protected]:test> explain SELECT ANY_VALUE(HEX(c)), COUNT(c) FROM t1 GROUP BY c COLLATE gbk_chinese_ci;
+-------------------------+----------+-----------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------+
| id                      | estRows  | task      | access object | operator info                                                                                                                                        |
+-------------------------+----------+-----------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------+
| Projection_4            | 8000.00  | root      |               | any_value(hex(to_binary(test.t1.c)))->Column#4, Column#3                                                                                             |
| └─HashAgg_9             | 8000.00  | root      |               | group by:Column#6, funcs:count(Column#7)->Column#3, funcs:firstrow(Column#8)->test.t1.c                                                              |
|   └─TableReader_10      | 8000.00  | root      |               | data:HashAgg_5                                                                                                                                       |
|     └─HashAgg_5         | 8000.00  | cop[tikv] |               | group by:cast(test.t1.c, varchar(4) CHARACTER SET gbk COLLATE gbk_chinese_ci), funcs:count(test.t1.c)->Column#7, funcs:firstrow(test.t1.c)->Column#8 |
|       └─TableFullScan_8 | 10000.00 | cop[tikv] | table:t1      | keep order:false, stats:pseudo                                                                                                                       |
+-------------------------+----------+-----------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------+
5 rows in set
Time: 0.013s

The error happens at https://github.com/tikv/tikv/blob/6f389be835d5b8c4362c4989dca925af077e63f7/components/tidb_query_datatype/src/codec/collation/mod.rs#L216

The cause is that `inner` is truncated to 4 bytes somewhere, which is unexpected: the 4 in varchar(4) means 4 characters, not 4 bytes.
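
A minimal sketch of why a byte-based cut can produce an encoding error while a character-based cut does not. It assumes the value is held as UTF-8 at the point where it is truncated, which this issue does not confirm; the sample string and the helper names are hypothetical and are not taken from TiKV.

```python
# Hypothetical sample: four CJK characters, which fit a VARCHAR(4) column.
text = "你好世界"


def truncate_bytes(data: bytes, limit: int) -> bytes:
    """Cut at a byte count, ignoring character boundaries (the suspected bug)."""
    return data[:limit]


def truncate_chars(s: str, limit: int) -> str:
    """Cut at a character count, which is what VARCHAR(n) specifies."""
    return s[:limit]


def is_valid(data: bytes, encoding: str) -> bool:
    """Return True if the bytes decode cleanly in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


utf8 = text.encode("utf-8")  # 3 bytes per character here
gbk = text.encode("gbk")     # 2 bytes per character here

byte_cut = truncate_bytes(utf8, 4)                  # stops inside the second character
char_cut = truncate_chars(text, 4).encode("utf-8")  # keeps all four characters

print(len(utf8), len(gbk))
print(is_valid(byte_cut, "utf-8"), is_valid(char_cut, "utf-8"))
```

The byte-based cut leaves a dangling lead byte, so any later decode or re-encode step rejects it. The character-based cut keeps the value intact.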

CbcWestwolf avatar Oct 09 '24 03:10 CbcWestwolf

Reproduced in 6.5.11, 7.1.6, and 7.5.4. Not reproducible in 5.4 and 6.1, since the hash agg is not pushed down to TiKV in those versions:

TiDB [email protected]:test> explain SELECT ANY_VALUE(HEX(c)), COUNT(c) FROM
                       -> t1 GROUP BY c COLLATE gbk_chinese_ci;
+--------------------------+----------+-----------+---------------+-------------------------------------------------------------------------------------------------------+
| id                       | estRows  | task      | access object | operator info                                                                                         |
+--------------------------+----------+-----------+---------------+-------------------------------------------------------------------------------------------------------+
| Projection_4             | 8000.00  | root      |               | any_value(hex(to_binary(test.t1.c)))->Column#4, Column#3                                              |
| └─HashAgg_7              | 8000.00  | root      |               | group by:Column#11, funcs:count(Column#9)->Column#3, funcs:firstrow(Column#10)->test.t1.c             |
|   └─Projection_13        | 10000.00 | root      |               | test.t1.c, test.t1.c, cast(test.t1.c, varchar(4) CHARACTER SET gbk COLLATE gbk_chinese_ci)->Column#11 |
|     └─TableReader_12     | 10000.00 | root      |               | data:TableFullScan_11                                                                                 |
|       └─TableFullScan_11 | 10000.00 | cop[tikv] | table:t1      | keep order:false, stats:pseudo                                                                        |
+--------------------------+----------+-----------+---------------+-------------------------------------------------------------------------------------------------------+

CbcWestwolf avatar Nov 25 '24 09:11 CbcWestwolf