RediSearch

can't search text with '-'

Open cjdxhjj opened this issue 2 years ago • 15 comments

I have some data stored in MongoDB. Here are some example documents:

` /* 1 */ { "_id" : "94cfcc6a-a5ea-4e34-a368-dc9111356aa4", "uid" : NumberLong(86590), "app" : "znt", "device_id" : "94f435c7-e36f-42de-8cc5-fb44b2fa3eb8", "create_time" : ISODate("2021-11-26T05:41:13.145Z"), "update_time" : ISODate("2022-03-07T07:50:59.572Z") }

/* 2 */ { "_id" : "2a8e1609-7631-420c-84d7-fd44c9e6342e", "uid" : NumberLong(1801978), "app" : "znt", "device_id" : "1bb4a0b7-ae38-4688-ab8b-d3a70c071f03", "create_time" : ISODate("2022-01-14T06:41:12.373Z"), "update_time" : ISODate("2022-02-10T01:55:07.317Z") }

/* 3 */ { "_id" : "fe3b2dfe-bc6a-4da2-afd7-9a539d81a99a", "uid" : NumberLong(1805564), "app" : "znt", "device_id" : "be7a3ecf-1f7a-49b6-af78-686af6199bd4", "create_time" : ISODate("2022-01-17T08:23:39.382Z"), "update_time" : ISODate("2022-02-09T08:53:58.685Z") } `

I need to search the data by uid, app, and device_id. These fields should not be tokenized; their values need to match exactly.

My index is here:

ft.create idx_login_token on json prefix 1 login_token: schema $.uid as uid numeric $.app as app text nostem $.device_id as device_id text nostem

At first I tried the TAG type, but it didn't work. According to the docs, a TAG field also splits its content.

127.0.0.1:6379> json.set login_token:94cfcc6a-a5ea-4e34-a368-dc9111356aa4 $ '{"uid":2202221832500000077,"app":"znt","device_id":"94f435c7-e36f-42de-8cc5-fb44b2fa3eb8","create_time":"2021-11-26T05:41:13.145Z","update_time":"2022-02-21T02:40:20.241Z"}'
OK
127.0.0.1:6379> json.set login_token:2a8e1609-7631-420c-84d7-fd44c9e6342e $ '{"uid":2202221832500000077,"app":"znt","device_id":"1bb4a0b7-ae38-4688-ab8b-d3a70c071f03","create_time":"2021-11-26T05:41:13.145Z","update_time":"2022-02-21T02:40:20.241Z"}'
OK
127.0.0.1:6379> json.set login_token:fe3b2dfe-bc6a-4da2-afd7-9a539d81a99a $ '{"uid":1805564,"app":"znt","device_id":"be7a3ecf-1f7a-49b6-af78-686af6199bd4","create_time":"2022-01-17T08:23:39.382Z","update_time":"2022-02-21T02:40:20.241Z"}'
OK

I wrote 3 docs to test.

The index seems OK. [image]

First I searched by uid, which is a long number. I tested:
ft.explain idx_login_token '@uid:[2202221832500000077]'
ft.explain idx_login_token '@uid:2202221832500000077'
ft.explain idx_login_token '@uid:[2202221832500000077 2202221832500000077]'

The last one works. [image] The output shows that the number lost precision, but the result is correct.

[image]

But the exact-match rule does not feel right.
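The precision loss mentioned above can be reproduced outside Redis. The following is a minimal sketch, assuming the uid passes through a 64-bit IEEE 754 float somewhere between the query and the index (an assumption based on the explain output, not on the RediSearch source). It only shows what such a conversion does to this particular uid.

```python
# Sketch: what happens to the uid from the example if it is handled as a double.
# Assumption: the NUMERIC value is converted to a 64-bit float at some point.

uid = 2202221832500000077

as_double = float(uid)        # nearest representable double
round_trip = int(as_double)   # the integer value that survives the conversion

# A double has a 53-bit mantissa, so integers above 2**53 are not all representable.
above_exact_range = uid > 2**53
exactly_representable = round_trip == uid

# Neighboring integers collapse onto the same double at this magnitude.
collides_with_neighbor = float(uid) == float(uid + 1)

print(above_exact_range, exactly_representable, collides_with_neighbor)
print(round_trip)
```

If distinct uids of this size must never be confused with each other, one option is to store them as strings and match them exactly rather than as numbers.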

The app query works. [image]

But I can't search by device_id. At first, I wrote the query directly. [image]

That indicates that the query treats - as the exclusion operator. So I tested the following, escaping the '-'. [image]

But the result is also empty. Then I quoted the query. [image]

The server reported a query syntax error. I then quoted the value and escaped the '-' character, but I still can't find anything. [image]

cjdxhjj avatar Mar 07 '22 10:03 cjdxhjj

I found that replacing - with a space works. So searching with @device_id:1bb4a0b7 ae38 4688 ab8b d3a70c071f03 returns the right results.

michaelbukachi avatar Mar 07 '22 22:03 michaelbukachi

@michaelbukachi That falls under the multi-term match rule (every term must be present), which is not an exact match. Any other value that contains all of those terms will also match.
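The difference between "all terms present" and "exact match" can be illustrated with a small simulation. This is a simplified model, not RediSearch's actual tokenizer or query engine: `tokenize` and `matches_all_terms` are hypothetical helpers, and `shuffled` is a made-up value used only to show a false positive.

```python
import re

def tokenize(text):
    """Simplified stand-in for TEXT tokenization: split on anything that is not a letter or digit."""
    return {t.lower() for t in re.split(r"[^0-9A-Za-z]+", text) if t}

def matches_all_terms(query, stored):
    """True when every query term appears among the stored value's tokens (AND semantics)."""
    return tokenize(query) <= tokenize(stored)

target = "1bb4a0b7-ae38-4688-ab8b-d3a70c071f03"
shuffled = "d3a70c071f03-ab8b-4688-ae38-1bb4a0b7"  # hypothetical: same segments, different order
other = "be7a3ecf-1f7a-49b6-af78-686af6199bd4"

# Replacing "-" with a space turns the UUID into five independent terms.
query = target.replace("-", " ")

print(matches_all_terms(query, target))    # the intended document matches
print(matches_all_terms(query, shuffled))  # a different value with the same segments also matches
print(matches_all_terms(query, other))     # an unrelated UUID does not match
```

For randomly generated UUIDs a collision like `shuffled` is very unlikely, but for ordinary strings the same effect can easily produce unintended matches.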

cjdxhjj avatar Mar 08 '22 01:03 cjdxhjj

@cjdxhjj Since you are dealing with UUIDs, that is highly unlikely to happen. For normal strings, though, this might not work.

michaelbukachi avatar Mar 08 '22 13:03 michaelbukachi

You need to escape the "-", see https://oss.redis.com/redisearch/Escaping/

kkmuffme avatar Mar 08 '22 15:03 kkmuffme

Thanks very much, I will give it a try. But I wonder whether the platform provides a data type like keyword in Elasticsearch, where the string is not tokenized and can be matched exactly.
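The closest built-in counterpart to a keyword type is probably the TAG field, which is split only on its separator character and queried with the `@field:{value}` syntax. The sketch below builds such a query in Python. It assumes that punctuation inside a tag query must be backslash-escaped, and the helper names `escape_tag_value` and `tag_query` are hypothetical. The earlier report that TAG "didn't work" may have been caused by an unescaped hyphen in the query, but that is a guess and has not been verified against the original setup.

```python
import re

def escape_tag_value(value):
    """Backslash-escape every character that is not an ASCII letter, digit, or underscore."""
    return re.sub(r"([^0-9A-Za-z_])", r"\\\1", value)

def tag_query(field, value):
    """Build an exact-match query of the form @field:{value} for a TAG field."""
    return "@" + field + ":{" + escape_tag_value(value) + "}"

# The schema from the original post, with app and device_id declared as TAG instead of TEXT.
create_cmd = (
    "FT.CREATE idx_login_token ON JSON PREFIX 1 login_token: SCHEMA "
    "$.uid AS uid NUMERIC $.app AS app TAG $.device_id AS device_id TAG"
)

query = tag_query("device_id", "94f435c7-e36f-42de-8cc5-fb44b2fa3eb8")
print(create_cmd)
print(query)
```

With this approach the stored JSON stays unmodified; only the query string is escaped.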

cjdxhjj avatar Mar 09 '22 02:03 cjdxhjj

I'm facing the same issue, I got a JSON document with a UUID but I can't search it by UUID. Maybe a UUID type could avoid this behavior.

goldyfruit avatar Apr 14 '22 23:04 goldyfruit

You need to escape the - when searching as I wrote above. Not that hard guys...

kkmuffme avatar Apr 15 '22 06:04 kkmuffme

> You need to escape the - when searching as I wrote above. Not that hard guys...

I never said that it was hard, just that having a dedicated field type like UUID could keep the indexing process from splitting the text.

PS: The link from above doesn't work anymore.

goldyfruit avatar Apr 15 '22 12:04 goldyfruit

No, you misunderstand how escaping works. You need to escape BOTH when you index AND when you query. Then it will not split the text and search it as if it were 1 string.
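A minimal sketch of escaping on both sides, for a TEXT field, could look like this. The separator list is an assumption copied from the RediSearch tokenization documentation and should be checked against the version in use; `escape_separators` is a hypothetical helper, not part of any Redis client.

```python
import json

# Assumption: punctuation treated as token separators, per the RediSearch docs.
SEPARATORS = set(",.<>{}[]\"':;!@#$%^&*()-+=~ ")

def escape_separators(value):
    """Put a backslash in front of every separator character."""
    return "".join("\\" + ch if ch in SEPARATORS else ch for ch in value)

device_id = "1bb4a0b7-ae38-4688-ab8b-d3a70c071f03"
escaped = escape_separators(device_id)

# Index side: the stored string itself contains the backslashes.
doc = json.dumps({"uid": 1801978, "app": "znt", "device_id": escaped})

# Query side: the same escaped form, so both sides resolve to a single term.
query = "@device_id:" + escaped

print(doc)
print(query)
```

When these strings are typed into redis-cli or a shell instead of being sent through a client library, the quoting rules of that layer may require doubling the backslashes.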

kkmuffme avatar Apr 15 '22 12:04 kkmuffme

> No, you misunderstand how escaping works. You need to escape BOTH when you index AND when you query. Then it will not split the text and search it as if it were 1 string.

Ok

goldyfruit avatar Apr 15 '22 13:04 goldyfruit

You need to escape it both before storing it and before searching for it.

cjdxhjj avatar Apr 16 '22 02:04 cjdxhjj

> You need to escape the "-", see https://oss.redis.com/redisearch/Escaping/

Updated link: https://redis.io/docs/stack/search/reference/escaping/

oshadmi avatar Apr 24 '22 10:04 oshadmi

Would it make sense to add a BINARY field type to the index field types, one that omits stemming and tokenization?

Sometimes you are looking for exact string matches. In my case I have a (user-supplied) username in my JSON which I would like to index, but I don't want to modify it inside the JSON to escape a (hopefully correct) list of separators just because that is the only way to make searches work.

domoran avatar Jul 31 '22 19:07 domoran

I'm wondering how I can specify a custom analyzer, something like a "KeywordAnalyzer", to avoid tokenizing the text field.

lduffy69 avatar Jul 31 '23 11:07 lduffy69

I had the same issue. What worked for me was removing the "-" both when indexing and when searching.

aibarra11 avatar Sep 12 '23 17:09 aibarra11