Fulltext case-sensitive index behavior

Open gar1t opened this issue 1 year ago • 1 comments

In Quick Start, the query:

SELECT 
  ts,
  api_path,
  log
FROM
  app_logs
WHERE
  matches(log, 'timeout');

shows results that are case-sensitive:

+---------------------+------------------+--------------------+
| ts                  | api_path         | log                |
+---------------------+------------------+--------------------+
| 2024-07-11 20:00:10 | /api/v1/billings | Connection timeout |
| 2024-07-11 20:00:10 | /api/v1/resource | Connection timeout |
+---------------------+------------------+--------------------+
2 rows in set (0.01 sec)

However, the table def is this:

Create Table: CREATE TABLE IF NOT EXISTS `app_logs` (
...
`log` STRING NULL FULLTEXT WITH(analyzer = 'English', case_sensitive = 'false'),
...)

The docs for CREATE indicate that the default value of case_sensitive for FULLTEXT is true. Following Quick Start, however, the table ends up with case_sensitive = 'false', so the effective default appears to be false.

In any event, the query behavior is case sensitive.

Issues as I see them:

  • Possible error in either docs or implementation for default value of case_sensitive for fulltext index
  • Case-sensitive match behavior when schema shows case_sensitive to be false

gar1t • Aug 21 '24 14:08

Thank you for your thorough review; the issue does indeed exist.

The specific reason is that matches is evaluated separately in the frontend and in the datanode. The datanode does respect the case-sensitive configuration, but the frontend does not yet (see the TODO): https://github.com/GreptimeTeam/greptimedb/blob/9c1704d4cbbfab8af07a77da598a1cfe2a5e7b22/src/common/function/src/scalars/matches.rs#L75-L95. As it stands, the implementation is case-sensitive.

Therefore, until this work is completed, I think we can keep things consistent in one of two ways: hardcode this configuration to true and make it unchangeable, or hardcode it to false and change https://github.com/GreptimeTeam/greptimedb/blob/9c1704d4cbbfab8af07a77da598a1cfe2a5e7b22/src/common/function/src/scalars/matches.rs#L205 to use ilike, which would be more practical.
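
To make the difference between the two options concrete, here is a minimal sketch of a term check that honors a case_sensitive flag. It is not the code in matches.rs: `FulltextOptions` and `term_matches` are hypothetical names, and plain substring search stands in for the real query evaluation. Lowercasing both sides plays the role that switching to ilike would play.

```rust
/// Hypothetical stand-in for the per-column FULLTEXT options.
/// This is an assumption for illustration, not GreptimeDB's actual type.
#[derive(Clone, Copy)]
struct FulltextOptions {
    case_sensitive: bool,
}

/// Returns true when `text` contains `term`, honoring `opts.case_sensitive`.
/// Substring search is a simplification of what `matches` really evaluates.
fn term_matches(text: &str, term: &str, opts: FulltextOptions) -> bool {
    if opts.case_sensitive {
        // Exact-case comparison: "TIMEOUT" does not match "timeout".
        text.contains(term)
    } else {
        // Lowercase both sides so the comparison ignores case,
        // similar in spirit to using `ilike` instead of `like`.
        text.to_lowercase().contains(term.to_lowercase().as_str())
    }
}
```

With `case_sensitive: false`, searching "Connection timeout" for "TIMEOUT" matches; with `case_sensitive: true`, it does not, which is the behavior reported in this issue.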

In any case, it was indeed an oversight, and I will arrange for a prompt fix.

cc @waynexia

zhongzc • Aug 21 '24 15:08

The fulltext query statements have been updated; see https://docs.greptime.com/user-guide/logs/query-logs/#query-statements

nicecui • Sep 03 '25 07:09