arcadedb icon indicating copy to clipboard operation
arcadedb copied to clipboard

SQL ILIKE will fails on string with \n inside

Open ExtReMLapin opened this issue 9 months ago • 10 comments

Hello, it's me again. TL;DR at the bottom

ArcadeDB Version:

ArcadeDB Server v24.5.1-SNAPSHOT (build 71472c4c05ae9278d0d5d71a919b31883e1ea767/1713899543605/main) is starting up... creating nodes and reading nodes

OS and JDK Version:

Running on Linux 5.15.0-100-generic - OpenJDK 64-Bit Server VM 11.0.22 for creating nodes and reading nodes

Expected behavior

ILIKE should match in any case not based on existence of \n inside the string

Actual behavior

ILIKE fails when there is any \n inside the string.

Steps to reproduce

Pick up this database :

test_db.zip

Try both :

SELECT *  FROM CHUNK"
SELECT *  FROM CHUNK where text ILIKE "%e%"

In theory, both request should output same result as all the CHUNK nodes have at least one e in the text field, however, here it fails to catch one of them.

TLDR :

All 4 have e, ILIKE should match all of them, but it doesn't.

firefox_xuuEyHHDPs firefox_GfRXRz83d5

This is the correct query to execute as a temp fix :

SELECT *  FROM CHUNK where text.replace('\n', '') ILIKE "%e%"

Thanks for the help.

ExtReMLapin avatar Apr 29 '24 14:04 ExtReMLapin

What's weird is that when created on another database, isolated, there is no issue.

image

ExtReMLapin avatar Apr 29 '24 14:04 ExtReMLapin

Hi, does the same happen with LIKE + toLowerCase(), ie:

SELECT *  FROM CHUNK where text.toLowerCase() LIKE "%e%"

gramian avatar Apr 29 '24 14:04 gramian

Yep, only 3 records, still not 4

ExtReMLapin avatar Apr 29 '24 14:04 ExtReMLapin

It also could be something than the text len, it could be a special character in the string as spamming e didn't cause this issue

ExtReMLapin avatar Apr 29 '24 14:04 ExtReMLapin

I was also thinking of special chars, could you try normalize() in the filter, ie:

SELECT *  FROM CHUNK where text.normalize() ILIKE "%e%"

gramian avatar Apr 29 '24 14:04 gramian

It doesn't help, We were already using normalization, "lowercase mode" in the original request my coworkers was working on. We on purpose moved for this specific request from Cypher to SQL to have the normalization features.

ExtReMLapin avatar Apr 29 '24 14:04 ExtReMLapin

It's the \n that breaks it.

ExtReMLapin avatar Apr 29 '24 14:04 ExtReMLapin

Sorry for the spam, but SELECT * FROM CHUNK where text.replace('\n', '') ILIKE "%e%" fixes it

ExtReMLapin avatar Apr 29 '24 15:04 ExtReMLapin

Good find. This may still be an issue or at least needs documentation.

gramian avatar Apr 29 '24 15:04 gramian

Maybe Pattern.DOTALL needs to be added here?

gramian avatar Apr 29 '24 18:04 gramian

@ExtReMLapin this should be fixed in the current repo head. Feel free to reopen if you find a problem.

gramian avatar May 08 '24 18:05 gramian