greptimedb query performance degraded when using "where <int> in (<string>)"

What type of bug is this?

Performance issue

What subsystems are affected?

Standalone mode

Minimal reproduce step

Seen in some private deployment, the query time can be drastically different: if select * from foo where id in ('1') and ts > '2024-03-09 00:00:00' and ts < '2024-03-09 16:00:00', it costs 2.7 seconds; if select * from foo where id in (1) and ts > '2024-03-09 00:00:00' and ts < '2024-03-09 16:00:00', it costs 58 millis. Something unexpected must happen, we should digging deeper.

What did you expect to see?

Same query time, or return some error saying int can't "in" string.

What did you see instead?

Very different time costs.

What operating system did you use?

macos

What version of GreptimeDB did you use?

main

Relevant log output and stack trace

No response

Mar 14 '24 11:03 MichaelScofield

I test similar cases in MySQL. It truncates it by default and yields a warning, which is a little confusing. I think we could convert the string literal in the planner phase and return an error if the string can't be converted to the target type.

mysql> select * from test_check where a > 'abc';
+---+
| a |
+---+
| 1 |
+---+
1 row in set, 1 warning (0.00 sec)

mysql> show warnings;
+---------+------+-----------------------------------------+
| Level   | Code | Message                                 |
+---------+------+-----------------------------------------+
| Warning | 1292 | Truncated incorrect DOUBLE value: 'abc' |
+---------+------+-----------------------------------------+
1 row in set (0.00 sec)

mysql> explain analyze select * from test_check where a > 'abc';
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                             |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Filter: (test_check.a > 0)  (cost=0.35 rows=1) (actual time=0.0323..0.0366 rows=1 loops=1)
    -> Covering index scan on test_check using PRIMARY  (cost=0.35 rows=1) (actual time=0.03..0.0341 rows=1 loops=1)
 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set, 1 warning (0.00 sec)

Mar 15 '24 04:03 evenyag

PostgreSQL returns an error if it can't convert the string literal to an integer. We can follow this behavior.

postgres=# select * from test_int where a > 'abc';
ERROR:  invalid input syntax for type integer: "abc"
LINE 1: select * from test_int where a > 'abc';
                                         ^
postgres=# select * from test_int where a > '1';
 a
---
 3
 5
(2 rows)

postgres=# select * from test_int where a > '1ab';
ERROR:  invalid input syntax for type integer: "1ab"
LINE 1: select * from test_int where a > '1ab';
                                         ^
postgres=# explain select * from test_int where a > '1';
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Index Only Scan using test_int_pkey on test_int  (cost=0.15..27.13 rows=850 width=4)
   Index Cond: (a > 1)
(2 rows)

postgres=# explain select * from test_int where a > '1ab';
ERROR:  invalid input syntax for type integer: "1ab"
LINE 1: explain select * from test_int where a > '1ab';

Mar 15 '24 04:03 evenyag

@waynexia @evenyag

Are you working on this issue? If not, I'm interested in this issue.

PostgreSQL returns an error if it can't convert the string literal to an integer. We can follow this behavior.

My plan is to add the logic for checking whether the string literal can be converted to the column type in TypeConverter::convert_type.

https://github.com/GreptimeTeam/greptimedb/blob/69d9a2845f6328ca2c2d4396c11a79243a956506/src/query/src/optimizer/type_conversion.rs#L168-L203

Jan 04 '25 13:01 NiwakaDev

@NiwakaDev No, I'm not working on this. Free free to have a try!

Jan 06 '25 04:01 evenyag