dolt
dolt copied to clipboard
`IN (SUBQUERY)` expressions ignore collation
Steps to reproduce:
-- create
CREATE TABLE test (
pk INTEGER PRIMARY KEY,
name TEXT NOT NULL
) COLLATE=utf8mb4_0900_ai_ci;
-- insert
INSERT INTO test VALUES (0001, 'Clark');
INSERT INTO test VALUES (0002, 'Dave');
INSERT INTO test VALUES (0003, 'Ava');
-- fetch
SELECT 'clark' IN (SELECT name FROM test);
Expected output: true
Actual output: false
Because the table uses a case-insensitive collation, the values 'Clark' and 'clark' should compare equal, and the expression should return true.
Here's the plan for the query:
+---------------------------------+
| plan |
+---------------------------------+
| Project |
| ├─ columns: [InSubquery |
| │ ├─ left: 'clark' |
| │ └─ right: Subquery |
| │ ├─ cacheable: true |
| │ └─ Table |
| │ ├─ name: test |
| │ └─ columns: [name] |
| │ ] |
| └─ Project |
| ├─ columns: [dual.] |
| └─ Table |
| └─ name: |
+---------------------------------+
The implementation for the InSubquery expression uses a hash table to check for membership, but this requires normalizing the keys based on the table collation. Since we don't appear to do that normalization, we get different hashes for the two strings.