dragonfly
dragonfly copied to clipboard
Full-text search errors for prefix match
Describe the bug Fulltext search gives error if I want to search for words that match a given prefix: FT.SEARCH idxe:content "@text:football*" "ERR Query syntax error"
To Reproduce Steps to reproduce the behavior:
- Insert records : A json type key with a text field called "text".
- Create a full-text index: FT.CREATE indxe:content ON JSON PREFIX 1 contents: SCHEMA $.text AS text TEXT
- Query records using : FT.SEARCH idxe:content "@text:football*"
- It gives error: "ERR Query syntax error"
Expected behavior for exact search I get no error: FT.SEARCH idxe:content "@text:football" My existing app. running on Redis doesn't get any error for prefix matching as expected:
https://redis.io/docs/latest/develop/interact/search-and-query/query/full-text/
... Word prefix You can also search for words that match a given prefix.
FT.SEARCH index "prefix*" ... <<<
Screenshots
Error with dragonfly:
No error with Redis:
Environment (please complete the following information):
- OS: Rocky Linux 9.3 (Blue Onyx)
- Kernel: Linux cache-maker 5.14.0-362.24.1.el9_3.0.1.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Apr 4 22:31:43 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
- Containerized?: No
- Dragonfly Version: v1.17.1-9d0253ca8937175d2551d9b39c26f99921f46a92
Reproducible Code Snippet FT.CREATE indxe:content ON JSON PREFIX 1 contents: SCHEMA $.text AS text TEXT
FT.SEARCH idxe:content "@text:football*"
Minimal code snippet to reproduce this bug
FT.SEARCH idxe:content "@text:football*"
Additional context Add any other context about the problem here.
@dranikpg can you kindly take a look? :pray:
@okimonhgb is it full text search? it's not supported. indeed I do not see that it is clearly specified in the documentation.
@romange yes it is full text search. It works for exact word searches but seems doesn't support word prefixes. Is it planned to be supported in near future? Is there any workaround? We want to use replace our redis instance with dragonfly but this problem breaks existing codes.
Unfortunately it won't be implemented in near future.
Ok. I understand there is also no workaround? So we should plan according to this. Can I keep this issue open? Best regards,
Sure :)
@romange
Do we only wish to support prefix ab* queries or do we additionally aim to support infix *ab* and suffix queries *ab efficently. For the former we would need a patricia tree and for the latter a suffix trie. Patricia tree implementation will be very simple as compared to suffix trie which would involve Ukkonens algorithm and multiple edge cases.
https://redis.io/docs/latest/develop/interact/search-and-query/query/full-text/
https://redis.io/docs/latest/commands/ft.create/#:~:text=converted%20to%20lowercase.-,WITHSUFFIXTRIE,-for%20TEXT%20and
We just discussed this internally within the team - we need to learn the domain and devise a plan.
I personally have never developed patricia tree nor the suffix trie but based on what you write it seems that the Patricia tree will will hit the limit very soon.
Are there any 3rd party libraries that provide suffix trie implementation out of the box?
Also there are suffix arrays. Do they help with this?
All of my conclusions will be based on comparison with Redis.
As per my understanding, We don't need to choose between Patricia Tree or Suffix Trie but rather both implementations need to be present. (If the --WITHSUFFIXTRIE flag is present then make suffix trie only otherwise patricia tree only). Suffix trie consumes a lot more memory so comparatively though both have same complexity. We can support suffix and infix queries on patricia tree also but they would be brute searches only.
I need to do some more research and comparison between suffix tree and suffix array, and find a suitable implementation. I previously extended a Patricia Tree implementation and I am familiar with full text indexing domain so I would like to be included if possible in your discussions.
Currently @dranikpg is leading the effort but due to time constraints he is working on it on and off. AFAIK, @dranikpg is planning to use rax.h for that but we have not started any implementation yet. @dranikpg - lets collaborate here before starting the implementation.
- Let's not juggle around with terms. Patricia trees are radix tries. "Suffix tree" is just a way to build a trie 🙂
- I suggest to use rax from redis as we already rely on it and it's a sufficiently good implementation of what we need
- First step would be to implement just prefix matching. It requires some refactoring as we can replace our hashmap index lookup with a trie query to not store all the words twice.
- Suffix matches are just queries to the suffix trie. Suffix arrays would not be applicable due to linear update times. Infix matches are an intersection between prefix and suffix matches