tidb
tidb copied to clipboard
In the GROUP BY clause, TiDB incorrectly handled ' ' and NULL
Bug Report
Please answer these questions before submitting your issue. Thanks!
1. Minimal reproduce step (Required)
CREATE TABLE t0 (
c0 TEXT NOT NULL
);
INSERT INTO t0 VALUES (' '), ('dadfad'), ('2342dfad'), ('2dfad');
CREATE TABLE t1 (
c0 TEXT NOT NULL
);
INSERT INTO t1 VALUES ('xxx'), ('3gf'), (''), ('dddd');
SELECT count(t1.c0) FROM t1 LEFT OUTER JOIN t0 ON t0.c0 = t1.c0 GROUP BY t0.c0;
2. What did you expect to see? (Required)
+--------------+
| count(t1.c0) |
+--------------+
| 4 |
+--------------+
3. What did you see instead (Required)
MySQL> SELECT count(t1.c0) FROM t1 LEFT OUTER JOIN t0 ON t0.c0 = t1.c0 GROUP BY t0.c0;
+--------------+
| count(t1.c0) |
+--------------+
| 3 |
| 1 |
+--------------+
2 rows in set (0.004 sec)
4. What is your TiDB version? (Required)
MySQL> select tidb_version();
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tidb_version() |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Release Version: v8.0.0
Edition: Community
Git Commit Hash: 8ba1fa452b1ccdbfb85879ea94b9254aabba2916
Git Branch: HEAD
UTC Build Time: 2024-03-28 14:22:04
GoVersion: go1.21.6
Race Enabled: false
Check Table Before Drop: false
Store: tikv |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.002 sec)
The comparison with text type column seems ignoring white spaces. Not common cases. Lower priority to major.
mysql> select * from t1 where t1.c0 = ' ';
+----+
| c0 |
+----+
| |
+----+
1 row in set (0.00 sec)
/remove-severity critical
/severity major
/assign
select * from t1 left outer join t0 on t0.c0 = t1.c0; Result in tidb:
+------+------+
| c0 | c0 |
+------+------+
| xxx | NULL |
| 3gf | NULL |
| | |
| dddd | NULL |
+------+------+
Result in mysql:
+------+------+
| c0 | c0 |
+------+------+
| xxx | NULL |
| 3gf | NULL |
| | NULL |
| dddd | NULL |
+------+------+
This issue do not need fix, cause create table with collate utf8mb4_unicode_ci in mysql, you will get the same result with tidb.
/close
@xzhangxian1008: Closing this issue.
In response to this:
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
We will get same results between tidb and mysql with the following statements with all collations in tidb. So the behaviour in tidb is correct. Not a bug.
drop table t0;
CREATE TABLE t0 (
c0 TEXT NOT NULL COLLATE xxx
) collate=xxx;
INSERT INTO t0 VALUES (' '), ('dadfad'), ('2342dfad'), ('2dfad');
select * from t0 where c0=' ';
We will get same results between tidb and mysql with the following statements with all collations in tidb. So the behaviour in tidb is correct. Not a bug.
drop table t0; CREATE TABLE t0 ( c0 TEXT NOT NULL COLLATE xxx ) collate=xxx; INSERT INTO t0 VALUES (' '), ('dadfad'), ('2342dfad'), ('2dfad'); select * from t0 where c0=' ';
correct. strictly, we could try test with hashjoin case like the issue states also, cause it goes different func when running