
Search Implementation - Query Time Sorting

Open claridgicus opened this issue 1 year ago • 4 comments

Is your feature request related to a problem? Please describe. When using search on a dataset where you want your documents sorted by a property, the sort key currently needs to be part of the index itself. In an e-commerce environment that's really annoying, because the index size becomes prohibitively massive.

500 products sorted into 100 categories (100 indexes of 500 products): not terrible, but quite sizeable.

This quickly becomes

10,000 products sorted into 1,000 categories (1,000 indexes of 10,000 products): absolutely massive (tens of GB).
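
To put rough numbers on that growth, here is a back-of-the-envelope sketch. It assumes, as the parentheticals above do, that every per-category index covers every product. The function name is made up for illustration, and it counts index entries, not bytes, so it says nothing about the actual memory footprint.

```python
def index_entries(num_products: int, num_categories: int) -> int:
    """Total entries across all per-category indexes.

    Assumption: each category index holds one entry per product.
    """
    return num_products * num_categories


small = index_entries(500, 100)          # 100 indexes of 500 products
large = index_entries(10_000, 1_000)     # 1,000 indexes of 10,000 products
growth = large // small                  # how many times more entries

print(small, large, growth)
```

Under that assumption, 20x more products and 10x more categories multiply into 200x more index entries.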

Describe the solution you'd like: Ideally, I would like to sort on a property of my documents at query time. After the search module has retrieved the search results, Dragonfly would sort that limited subset by my field (in my case, an integer property of an object on my document). I understand this will not be as performant as an indexed sort, but the lost performance buys a significantly reduced memory overhead.
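
A minimal client-side sketch of the behaviour being requested, in plain Python rather than anything Dragonfly-specific: filter first, then sort only the matched documents by a nested integer property. The helper names, the dotted-path convention, and the sample documents are all hypothetical.

```python
from typing import Any, Dict, Iterable, List


def get_path(doc: Dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as 'sort.summer' into nested dicts.

    Returns None if any step of the path is missing.
    """
    current: Any = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def sort_at_query_time(
    matches: Iterable[Dict[str, Any]], path: str, limit: int = 10
) -> List[Dict[str, Any]]:
    """Sort already-matched documents by an integer field, then apply the limit.

    Documents that lack the field are placed last.
    """

    def sort_key(doc: Dict[str, Any]):
        value = get_path(doc, path)
        return (value is None, value if value is not None else 0)

    return sorted(matches, key=sort_key)[:limit]


# Hypothetical search results for one category.
matches = [
    {"id": "p1", "sort": {"summer": 3}},
    {"id": "p2", "sort": {"summer": 1}},
    {"id": "p3", "sort": {}},
    {"id": "p4", "sort": {"summer": 2}},
]
ordered_ids = [doc["id"] for doc in sort_at_query_time(matches, "sort.summer")]
print(ordered_ids)
```

Nothing about the sort order is stored in an index here; the cost moves to query time and scales with the number of matches.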

Describe alternatives you've considered: Right now I create a new index per category, which blows out memory usage, but far less than a single index with hundreds of sortable fields would.

claridgicus avatar Dec 18 '23 11:12 claridgicus

Hi. Thanks for trying out our new features!

We'll try to add this feature in the near future, probably even in the next version. As you already pointed out, performance will be much worse, because the values need to be fetched from the entries. But as long as your queries don't match thousands of items, it should be acceptable.
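
To illustrate why the cost tracks the number of matches, here is a rough sketch (not Dragonfly's implementation): one value fetch per matched entry, followed by a top-k selection with Python's `heapq.nsmallest`. The `fetch_value` callback and the in-memory `store` are stand-ins for reading the field from the stored entry.

```python
import heapq
from typing import Callable, Iterable, List


def top_k_by_field(
    matched_keys: Iterable[str], fetch_value: Callable[[str], int], k: int
) -> List[str]:
    """Return the k keys with the smallest sort value.

    fetch_value is called exactly once per matched key, so the work grows
    with the size of the match set, not with the size of the dataset.
    """
    decorated = ((fetch_value(key), key) for key in matched_keys)
    return [key for _, key in heapq.nsmallest(k, decorated)]


# Stand-in for the stored entries: key -> integer sort value.
store = {"p1": 30, "p2": 10, "p3": 20, "p4": 5}
fetched: List[str] = []


def fetch(key: str) -> int:
    fetched.append(key)  # record every fetch to make the cost visible
    return store[key]


result = top_k_by_field(store.keys(), fetch, k=2)
print(result, len(fetched))
```

Even though only two results are returned, all four matched entries are fetched, which is why queries matching thousands of items would feel the difference.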

dranikpg avatar Dec 19 '23 18:12 dranikpg

@dranikpg

Not to dictate how you build this feature, but my suggestion would be to declare these sort fields as their own mini index, so that filtering happens on the main index and sorting is done on the subsidiary index.

At least then I wouldn't see such a massive hit to query performance.

Either way, you're my hero

claridgicus avatar Dec 21 '23 07:12 claridgicus

@claridgicus, we are going to implement sorting at query time, i.e. according to your original suggestion:

Ideally I would like to implement a sort on a property of my documents at query time, ideally, after the search module has actually retrieved the search results dragonfly would sort the limited subset based on my field (which in my case is a property of an object on my document which is an integer)

If we build a precomputed index (secondary or not), it will take space, which brings us back to the original issue you had with 1,000 indices taking too much space.

romange avatar Dec 21 '23 08:12 romange

@dranikpg I think you've got exactly my issue.

So you understand my use case, here is my current setup.

I have a multi-tenant Redis cluster (in 4 regions).

For each tenant

I load in JSON format

  • Every Collection the tenant has
  • Every Product the tenant has
  • A view of every product (for some application logic where I implode many references into a composite)

I build

  • A "Search" index which is what I use for my Full Text Search requirements
  • Many "Collection" indexes (this is my problem) which reference their own sort orders

I have many, many, many prefixes

I want an index to partition each tenant's data at the boundary of their prefix, so I'm not indexing everyone's stuff everywhere.
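
One way to picture prefix-scoped indexes is a sketch that only builds the arguments of an `FT.CREATE ... ON JSON PREFIX ...` command per tenant, without sending anything to a server. The key layout (`tenant:<id>:product:`), the index naming, and the schema fields are assumptions for illustration; which `FT.CREATE` options Dragonfly supports should be checked against its documentation.

```python
from typing import List


def build_create_index_command(tenant_id: str, index_kind: str = "search") -> List[str]:
    """Build FT.CREATE arguments for an index limited to one tenant's keys.

    Assumptions: keys look like 'tenant:<id>:product:<sku>' and the
    documents have '$.title' and '$.category' fields.
    """
    prefix = f"tenant:{tenant_id}:product:"
    return [
        "FT.CREATE", f"tenant:{tenant_id}:{index_kind}",
        "ON", "JSON",
        "PREFIX", "1", prefix,          # only keys under this prefix are indexed
        "SCHEMA",
        "$.title", "AS", "title", "TEXT",
        "$.category", "AS", "category", "TAG",
    ]


command = build_create_index_command("acme")
print(" ".join(command))
```

With a client such as redis-py, a list like this could be passed to `execute_command(*command)`; each tenant then gets its own index over only its own keys.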

claridgicus avatar Dec 21 '23 09:12 claridgicus