tidb icon indicating copy to clipboard operation
tidb copied to clipboard

In the GROUP BY clause, TiDB incorrectly handled ' ' and NULL

Open sjyango opened this issue 10 months ago • 4 comments

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

CREATE TABLE t0 (
c0 TEXT NOT NULL
);
INSERT INTO t0 VALUES (' '), ('dadfad'), ('2342dfad'), ('2dfad');

CREATE TABLE t1 (
c0 TEXT NOT NULL
);
INSERT INTO t1 VALUES ('xxx'), ('3gf'), (''), ('dddd');

SELECT count(t1.c0) FROM t1 LEFT OUTER JOIN t0 ON t0.c0 = t1.c0 GROUP BY t0.c0;

2. What did you expect to see? (Required)

+--------------+
| count(t1.c0) |
+--------------+
|            4 |
+--------------+

3. What did you see instead (Required)

MySQL> SELECT count(t1.c0) FROM t1 LEFT OUTER JOIN t0 ON t0.c0 = t1.c0 GROUP BY t0.c0;
+--------------+
| count(t1.c0) |
+--------------+
|            3 |
|            1 |
+--------------+
2 rows in set (0.004 sec)

4. What is your TiDB version? (Required)

MySQL> select tidb_version();
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tidb_version()                                                                                                                                                                                                                               |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Release Version: v8.0.0
Edition: Community
Git Commit Hash: 8ba1fa452b1ccdbfb85879ea94b9254aabba2916
Git Branch: HEAD
UTC Build Time: 2024-03-28 14:22:04
GoVersion: go1.21.6
Race Enabled: false
Check Table Before Drop: false
Store: tikv |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.002 sec)

sjyango avatar Apr 27 '24 06:04 sjyango

The comparison with text type column seems ignoring white spaces. Not common cases. Lower priority to major.

mysql> select * from t1 where t1.c0 = '                 ';
+----+
| c0 |
+----+
|    |
+----+
1 row in set (0.00 sec)

yibin87 avatar Apr 29 '24 06:04 yibin87

/remove-severity critical

yibin87 avatar Apr 29 '24 06:04 yibin87

/severity major

yibin87 avatar Apr 29 '24 06:04 yibin87

/assign

fanrenhoo avatar Apr 29 '24 13:04 fanrenhoo

select * from t1 left outer join t0 on t0.c0 = t1.c0; Result in tidb:

+------+------+
| c0   | c0   |
+------+------+
| xxx  | NULL |
| 3gf  | NULL |
|      |      |
| dddd | NULL |
+------+------+

Result in mysql:

+------+------+
| c0   | c0   |
+------+------+
| xxx  | NULL |
| 3gf  | NULL |
|      | NULL |
| dddd | NULL |
+------+------+

xzhangxian1008 avatar May 08 '24 02:05 xzhangxian1008

This issue do not need fix, cause create table with collate utf8mb4_unicode_ci in mysql, you will get the same result with tidb.

fanrenhoo avatar May 08 '24 13:05 fanrenhoo

/close

xzhangxian1008 avatar May 09 '24 03:05 xzhangxian1008

@xzhangxian1008: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

ti-chi-bot[bot] avatar May 09 '24 03:05 ti-chi-bot[bot]

We will get same results between tidb and mysql with the following statements with all collations in tidb. So the behaviour in tidb is correct. Not a bug.

drop table t0;
CREATE TABLE t0 (
c0 TEXT NOT NULL COLLATE xxx
) collate=xxx;
INSERT INTO t0 VALUES (' '), ('dadfad'), ('2342dfad'), ('2dfad');
select * from t0 where c0='   ';

xzhangxian1008 avatar May 09 '24 03:05 xzhangxian1008

We will get same results between tidb and mysql with the following statements with all collations in tidb. So the behaviour in tidb is correct. Not a bug.

drop table t0;
CREATE TABLE t0 (
c0 TEXT NOT NULL COLLATE xxx
) collate=xxx;
INSERT INTO t0 VALUES (' '), ('dadfad'), ('2342dfad'), ('2dfad');
select * from t0 where c0='   ';

correct. strictly, we could try test with hashjoin case like the issue states also, cause it goes different func when running

fanrenhoo avatar May 09 '24 03:05 fanrenhoo