RediSearch
can't search text with '-'
I have some data stored in MongoDB. Here is a sample:
` /* 1 */ { "_id" : "94cfcc6a-a5ea-4e34-a368-dc9111356aa4", "uid" : NumberLong(86590), "app" : "znt", "device_id" : "94f435c7-e36f-42de-8cc5-fb44b2fa3eb8", "create_time" : ISODate("2021-11-26T05:41:13.145Z"), "update_time" : ISODate("2022-03-07T07:50:59.572Z") }
/* 2 */ { "_id" : "2a8e1609-7631-420c-84d7-fd44c9e6342e", "uid" : NumberLong(1801978), "app" : "znt", "device_id" : "1bb4a0b7-ae38-4688-ab8b-d3a70c071f03", "create_time" : ISODate("2022-01-14T06:41:12.373Z"), "update_time" : ISODate("2022-02-10T01:55:07.317Z") }
/* 3 */ { "_id" : "fe3b2dfe-bc6a-4da2-afd7-9a539d81a99a", "uid" : NumberLong(1805564), "app" : "znt", "device_id" : "be7a3ecf-1f7a-49b6-af78-686af6199bd4", "create_time" : ISODate("2022-01-17T08:23:39.382Z"), "update_time" : ISODate("2022-02-09T08:53:58.685Z") } `
I need to search the data by uid, app, and device_id. These fields do not need to be tokenized; the values must match exactly.
My index is defined as follows:
ft.create idx_login_token on json prefix 1 login_token: schema $.uid as uid numeric $.app as app text nostem $.device_id as device_id text nostem
At first I tried the tag type, but it didn't work. According to the docs, a tag field also tokenizes its content.
127.0.0.1:6379> json.set login_token:94cfcc6a-a5ea-4e34-a368-dc9111356aa4 $ '{"uid":2202221832500000077,"app":"znt","device_id":"94f435c7-e36f-42de-8cc5-fb44b2fa3eb8","create_time":"2021-11-26T05:41:13.145Z","update_time":"2022-02-21T02:40:20.241Z"}'
OK
127.0.0.1:6379> json.set login_token:2a8e1609-7631-420c-84d7-fd44c9e6342e $ '{"uid":2202221832500000077,"app":"znt","device_id":"1bb4a0b7-ae38-4688-ab8b-d3a70c071f03","create_time":"2021-11-26T05:41:13.145Z","update_time":"2022-02-21T02:40:20.241Z"}'
OK
127.0.0.1:6379> json.set login_token:fe3b2dfe-bc6a-4da2-afd7-9a539d81a99a $ '{"uid":1805564,"app":"znt","device_id":"be7a3ecf-1f7a-49b6-af78-686af6199bd4","create_time":"2022-01-17T08:23:39.382Z","update_time":"2022-02-21T02:40:20.241Z"}'
OK
I wrote three documents to test.
The index seems OK.
First, I searched by uid, which is a long number. I tested the following with ft.explain:
ft.explain idx_login_token '@uid:[2202221832500000077]'
ft.explain idx_login_token '@uid:2202221832500000077'
ft.explain idx_login_token '@uid:[2202221832500000077 2202221832500000077]'
The last one works.
The output shows that the number lost precision, but the result is correct.
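The precision loss is easy to reproduce outside Redis. The sketch below assumes that numeric values are handled as 64-bit floating-point doubles, which the thread does not state explicitly. Under that assumption the uid above is far beyond the range where every integer is exactly representable, so both the stored value and the query bounds would round to the same nearby double. That would explain why the range query still matches, and it also suggests that neighbouring uids could match as well.

```python
# Assumption: the value is stored as an IEEE 754 double (64-bit float).
uid = 2202221832500000077

as_double = float(uid)          # nearest value a double can hold
round_tripped = int(as_double)  # back to an exact integer for comparison

# Integers up to 2**53 are all exactly representable as doubles.
MAX_SAFE = 2**53

print("original:     ", uid)
print("as double:    ", round_tripped)
print("exact?        ", round_tripped == uid)
print("beyond 2**53? ", uid > MAX_SAFE)
```

If exact matching on such ids matters, storing them as strings rather than numbers avoids the rounding entirely.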
Also, the syntax for an exact match on a number does not feel right.
The app query works,
but I can't search by device_id.
At first, I wrote the query directly.
That indicated that the query treats - as an exclusion operator, so I tested the following, escaping the '-',
but the result was still empty.
I then quoted the query.
The server reported a query syntax error. I also quoted the value and escaped the '-' characters, but I still couldn't find anything.
I found that replacing `-` with a space works. So searching with `@device_id:1bb4a0b7 ae38 4688 ab8b d3a70c071f03` returns the right results.
@michaelbukachi that matches any document containing all of those tokens; it is not an exact match. If another string contains all of those substrings, it will match too.
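To make the objection concrete, here is a toy model of the behaviour described above. It is not RediSearch's actual tokenizer; `tokenize` and `matches_all_tokens` are hypothetical helpers that only mimic "split on punctuation, then require every query token to be present". The `reordered` value is made up for illustration.

```python
import re


def tokenize(text: str) -> list[str]:
    """Toy tokenizer: lowercase, then split on anything that is not a-z or 0-9."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def matches_all_tokens(query: str, field_value: str) -> bool:
    """True when every query token appears somewhere in the field value."""
    return set(tokenize(query)) <= set(tokenize(field_value))


query = "1bb4a0b7 ae38 4688 ab8b d3a70c071f03"
stored = "1bb4a0b7-ae38-4688-ab8b-d3a70c071f03"
reordered = "d3a70c071f03-ab8b-4688-ae38-1bb4a0b7"  # hypothetical, different id

print(matches_all_tokens(query, stored))     # the intended document matches
print(matches_all_tokens(query, reordered))  # so does a different string
```

In this model both calls return true, which is the point being made: token containment is weaker than equality.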
@cjdxhjj since you are dealing with UUIDs, that is highly unlikely to happen. For normal strings, though, this might not work.
You need to escape the "-", see https://oss.redis.com/redisearch/Escaping/
Thanks very much, I will give it a try. But I wonder whether the platform provides a data type like keyword in ES, where the string is not tokenized and can be matched exactly.
I'm facing the same issue: I have a JSON document with a UUID, but I can't search it by UUID.
Maybe a `UUID` type could avoid this behavior.
You need to escape the - when searching as I wrote above. Not that hard guys...
I never said that it was hard, just that having a dedicated field type like `UUID` could keep the indexing process from splitting the text.
PS: The link from above doesn't work anymore.
No, you misunderstand how escaping works. You need to escape BOTH when you index AND when you query. Then it will not split the text and search it as if it were 1 string.
Ok
You need to escape it before storing it and before searching for it.
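A minimal sketch of "escape on both sides", assuming a text field as in the index above. `escape_token` is a hypothetical helper, and the rule it applies (backslash-escape every character that is not an ASCII letter, digit, or underscore) is a conservative assumption rather than the official list of separators; check the escaping reference for your version. The sketch only builds the strings and does not talk to a server.

```python
import json
import re


def escape_token(value: str) -> str:
    """Prefix every character outside [0-9A-Za-z_] with a backslash."""
    return re.sub(r"[^0-9A-Za-z_]", lambda m: "\\" + m.group(0), value)


device_id = "94f435c7-e36f-42de-8cc5-fb44b2fa3eb8"
escaped = escape_token(device_id)

# Index side: the stored value carries the backslashes.
doc = {"uid": 86590, "app": "znt", "device_id": escaped}
payload = json.dumps(doc)  # JSON encoding doubles each backslash on the wire

# Query side: the same escaped form goes into the query string.
query = f"@device_id:{escaped}"

print(payload)
print(query)
```

The obvious cost, raised later in the thread, is that the stored JSON no longer contains the original value, so anything reading the document has to strip the backslashes again.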
You need to escape the "-", see https://oss.redis.com/redisearch/Escaping/
Updated link: https://redis.io/docs/stack/search/reference/escaping/
Would it make sense to add a field type BINARY to the index field types, which omits stemming and tokenization?
Sometimes you are looking for exact string matches. In my example I have a (user-supplied) username in my JSON which I would like to index, but I don't want to modify it inside the JSON to escape a (hopefully correct) list of separators just because that is the only way to make searches work.
I'm wondering how I can specify a custom analyzer, something like a "KeywordAnalyzer", to avoid tokenizing the text field.
I had the same issue. What worked for me was removing the "-" at both indexing and search time.
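For UUIDs specifically, that workaround can be sketched with Python's standard `uuid` module, whose `hex` attribute gives the 32-character form without hyphens. `normalize_uuid` is a hypothetical helper name; the idea is simply to apply the same normalization before writing the document and before building the query.

```python
import uuid


def normalize_uuid(value: str) -> str:
    """Return the 32-character lowercase hex form of a UUID, without hyphens."""
    return uuid.UUID(value).hex


device_id = "1bb4a0b7-ae38-4688-ab8b-d3a70c071f03"

indexed_value = normalize_uuid(device_id)           # store this in the document
query = f"@device_id:{normalize_uuid(device_id)}"   # and search with the same form

print(indexed_value)
print(query)
```

Parsing through `uuid.UUID` also rejects malformed input with a `ValueError`, which a plain string replace would not do. This only helps for values that are really UUIDs; arbitrary strings such as usernames still need escaping.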