dragonfly icon indicating copy to clipboard operation
dragonfly copied to clipboard

Full-text search errors for prefix match

Open okimonhgb opened this issue 1 year ago • 12 comments

Describe the bug Fulltext search gives error if I want to search for words that match a given prefix: FT.SEARCH idxe:content "@text:football*" "ERR Query syntax error"

To Reproduce Steps to reproduce the behavior:

  1. Insert records : A json type key with a text field called "text".
  2. Create a full-text index: FT.CREATE indxe:content ON JSON PREFIX 1 contents: SCHEMA $.text AS text TEXT
  3. Query records using : FT.SEARCH idxe:content "@text:football*"
  4. It gives error: "ERR Query syntax error"

Expected behavior for exact search I get no error: FT.SEARCH idxe:content "@text:football" My existing app. running on Redis doesn't get any error for prefix matching as expected:

https://redis.io/docs/latest/develop/interact/search-and-query/query/full-text/

... Word prefix You can also search for words that match a given prefix.

FT.SEARCH index "prefix*" ... <<<

Screenshots Error with dragonfly: image

No error with Redis: image

Environment (please complete the following information):

  • OS: Rocky Linux 9.3 (Blue Onyx)
  • Kernel: Linux cache-maker 5.14.0-362.24.1.el9_3.0.1.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Apr 4 22:31:43 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
  • Containerized?: No
  • Dragonfly Version: v1.17.1-9d0253ca8937175d2551d9b39c26f99921f46a92

Reproducible Code Snippet FT.CREATE indxe:content ON JSON PREFIX 1 contents: SCHEMA $.text AS text TEXT

FT.SEARCH idxe:content "@text:football*"

Minimal code snippet to reproduce this bug

FT.SEARCH idxe:content "@text:football*"

Additional context Add any other context about the problem here.

okimonhgb avatar May 06 '24 09:05 okimonhgb

@dranikpg can you kindly take a look? :pray:

chakaz avatar May 06 '24 10:05 chakaz

@okimonhgb is it full text search? it's not supported. indeed I do not see that it is clearly specified in the documentation.

romange avatar May 06 '24 10:05 romange

@romange yes it is full text search. It works for exact word searches but seems doesn't support word prefixes. Is it planned to be supported in near future? Is there any workaround? We want to use replace our redis instance with dragonfly but this problem breaks existing codes.

okimonhgb avatar May 06 '24 11:05 okimonhgb

Unfortunately it won't be implemented in near future.

romange avatar May 06 '24 11:05 romange

Ok. I understand there is also no workaround? So we should plan according to this. Can I keep this issue open? Best regards,

okimonhgb avatar May 06 '24 12:05 okimonhgb

Sure :)

romange avatar May 06 '24 12:05 romange

@romange Do we only wish to support prefix ab* queries or do we additionally aim to support infix *ab* and suffix queries *ab efficently. For the former we would need a patricia tree and for the latter a suffix trie. Patricia tree implementation will be very simple as compared to suffix trie which would involve Ukkonens algorithm and multiple edge cases. https://redis.io/docs/latest/develop/interact/search-and-query/query/full-text/ https://redis.io/docs/latest/commands/ft.create/#:~:text=converted%20to%20lowercase.-,WITHSUFFIXTRIE,-for%20TEXT%20and

Lakshyadevelops avatar Sep 25 '24 21:09 Lakshyadevelops

We just discussed this internally within the team - we need to learn the domain and devise a plan.

I personally have never developed patricia tree nor the suffix trie but based on what you write it seems that the Patricia tree will will hit the limit very soon.

Are there any 3rd party libraries that provide suffix trie implementation out of the box?

romange avatar Sep 26 '24 04:09 romange

Also there are suffix arrays. Do they help with this?

romange avatar Sep 26 '24 04:09 romange

All of my conclusions will be based on comparison with Redis.

As per my understanding, We don't need to choose between Patricia Tree or Suffix Trie but rather both implementations need to be present. (If the --WITHSUFFIXTRIE flag is present then make suffix trie only otherwise patricia tree only). Suffix trie consumes a lot more memory so comparatively though both have same complexity. We can support suffix and infix queries on patricia tree also but they would be brute searches only.

I need to do some more research and comparison between suffix tree and suffix array, and find a suitable implementation. I previously extended a Patricia Tree implementation and I am familiar with full text indexing domain so I would like to be included if possible in your discussions.

Lakshyadevelops avatar Sep 30 '24 20:09 Lakshyadevelops

Currently @dranikpg is leading the effort but due to time constraints he is working on it on and off. AFAIK, @dranikpg is planning to use rax.h for that but we have not started any implementation yet. @dranikpg - lets collaborate here before starting the implementation.

romange avatar Oct 01 '24 05:10 romange

  1. Let's not juggle around with terms. Patricia trees are radix tries. "Suffix tree" is just a way to build a trie 🙂
  2. I suggest to use rax from redis as we already rely on it and it's a sufficiently good implementation of what we need
  3. First step would be to implement just prefix matching. It requires some refactoring as we can replace our hashmap index lookup with a trie query to not store all the words twice.
  4. Suffix matches are just queries to the suffix trie. Suffix arrays would not be applicable due to linear update times. Infix matches are an intersection between prefix and suffix matches

dranikpg avatar Oct 05 '24 21:10 dranikpg