
Feature request: "Multi" prefix extractor support

Open zaidoon1 opened this issue 1 year ago • 11 comments

Say my key format is <account_id>:<user_id>:<some dynamic value>.

Today, we can create a prefix extractor/bloom on <account_id>:<user_id> to help with queries that start with some known <account_id>:<user_id>. However, what we can't do today is ALSO set up a prefix extractor on <account_id>. With both, I could use bloom filters on queries that know the account id + user id combination as well as on queries that only have an account id. Effectively, in db/sql terminology, this is like being able to create multiple indexes on the "columns" to optimize queries like: select * from blah where account_id = 123, as well as: select * from blah where account_id = 345 and user_id = 678
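To make the request concrete, here is a minimal toy sketch of what having both prefix filters would allow. It is not RocksDB code: plain Python sets stand in for per-file bloom filters, and ToyFile, account_prefix, and account_user_prefix are hypothetical names invented for illustration.

```python
# Toy model only: sets stand in for per-file bloom filters, and none of
# these names are RocksDB APIs.

def account_prefix(key):
    """Hypothetical extractor for the <account_id> prefix."""
    return key.split(":", 1)[0]

def account_user_prefix(key):
    """Hypothetical extractor for the <account_id>:<user_id> prefix."""
    account_id, user_id, _rest = key.split(":", 2)
    return f"{account_id}:{user_id}"

class ToyFile:
    """One 'SST file' carrying a filter for each prefix extractor."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        self.by_account = {account_prefix(k) for k in keys}
        self.by_account_user = {account_user_prefix(k) for k in keys}

    def may_contain_account(self, account_id):
        return account_id in self.by_account

    def may_contain_account_user(self, account_id, user_id):
        return f"{account_id}:{user_id}" in self.by_account_user

files = [
    ToyFile(["123:1:a", "123:2:b"]),
    ToyFile(["345:678:c", "345:900:d"]),
]

# select * from blah where account_id = 123
account_hits = [f for f in files if f.may_contain_account("123")]

# select * from blah where account_id = 345 and user_id = 678
user_hits = [f for f in files if f.may_contain_account_user("345", "678")]
```

In this sketch each query only has to read the files its own filter cannot rule out, which is the behavior being asked for with a single copy of the data.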

As far as I know, today we can only have one prefix extractor/bloom per CF, so we have the following workarounds, which are not ideal:

  1. Create another CF that duplicates the data, so that one CF has an <account_id>:<user_id> prefix extractor and the other has an <account_id> prefix extractor. Depending on the query and what we already know, we look up the KV from the corresponding CF. The issue here is that we need more disk space to store the duplicate data.

  2. Given that <account_id> is common to both prefix extractors (in this use case) and we always have it, use it as the prefix extractor. However, we miss the opportunity to optimize queries that also have <user_id>.
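The trade-off between the two workarounds can be sketched with a toy model. This is an illustration under stated assumptions, not RocksDB code: each "file" is a plain list of keys, a set of prefixes stands in for its bloom filter, and files_to_read, by_account, and by_account_user are hypothetical helpers.

```python
# Toy model: a set of prefixes stands in for each file's bloom filter.
# Hypothetical names, not RocksDB APIs.

def files_to_read(files, extract, wanted_prefix):
    """Return the files whose prefix filter cannot rule out wanted_prefix."""
    return [keys for keys in files
            if wanted_prefix in {extract(k) for k in keys}]

def by_account(key):
    return key.split(":", 1)[0]

def by_account_user(key):
    return ":".join(key.split(":", 2)[:2])

files = [
    ["345:1:a", "345:2:b"],
    ["345:678:c", "345:900:d"],
]

# Workaround 2: only the <account_id> filter exists, so a query that also
# knows user_id = 678 still has to read every file holding account 345.
coarse = files_to_read(files, by_account, "345")

# With an <account_id>:<user_id> filter, the same query reads one file.
fine = files_to_read(files, by_account_user, "345:678")

# Workaround 1 keeps both filters by storing every key in two CFs.
stored_once = sum(len(keys) for keys in files)
stored_twice = 2 * stored_once
```

Here the coarse filter leaves two files to read where the finer one leaves a single file, while the duplicate-CF approach doubles the number of stored keys.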

zaidoon1 avatar Jul 01 '24 07:07 zaidoon1

Looks like something similar was requested: https://groups.google.com/g/rocksdb/c/bb6Db8Y3xwU

zaidoon1 avatar Jul 02 '24 04:07 zaidoon1

@ajkr What do you think about a feature like this? It seems very useful/high impact, but I'm not sure what the level of effort is.

zaidoon1 avatar Jul 02 '24 04:07 zaidoon1

Can @pdillinger's key segment filtering (https://github.com/facebook/rocksdb/blob/110ce5f4a392d02167cee3439160f83d2929a2c8/include/rocksdb/experimental.h#L64-L163, #12075) be used for this purpose?

ajkr avatar Jul 08 '24 23:07 ajkr

Oh interesting, I didn't know this existed. I'll take a closer look at how it works. Is this being used in production anywhere right now? Any gotchas?

zaidoon1 avatar Jul 09 '24 03:07 zaidoon1

So reading this:

To simplify satisfying some filtering requirements, the segments must encompass a complete key prefix (or the whole key) and segments cannot overlap.

Specifically, the "segments cannot overlap" part means this won't work for my use case (unless I'm misunderstanding). To use the terminology used here: given a key of the form <account_id>:<user_id>:<some dynamic value>, I would like to create two segments for filtering: <account_id> and <account_id>:<user_id>. Since both segments share the <account_id> part, does this mean the two segments are "overlapping" and therefore not allowed right now?

zaidoon1 avatar Jul 09 '24 08:07 zaidoon1

Or... actually, maybe the whole point is to use the category concept? So I can have one category that contains two segments:

<account_id> and <user_id>. Then I can filter by "category" to satisfy queries like: select * from blah where account_id = 345 and user_id = 678, or filter by "segment" (specifically the account_id segment) to satisfy queries like: select * from blah where account_id = 123?

zaidoon1 avatar Jul 09 '24 08:07 zaidoon1

Also, per https://github.com/facebook/rocksdb/blob/v9.3.1/include/rocksdb/experimental.h#L334-L335, how does the filter being used here compare to bloom/ribbon perf-wise? Are there any benchmarks?

I think this is exactly what I need, but I would love more examples, and I will likely wait until bloom/ribbon filters are supported.

zaidoon1 avatar Jul 09 '24 08:07 zaidoon1

The API and functionality is not yet complete for the filtering you want, but the KeySegmentsExtractor API is intended to be complete.

Specifically, the segments cannot overlap part means this won't work for my use case (unless I'm misunderstanding)

You want a segment for each field in your key. This should be stable regardless of your desired filtering strategy (except when you extend or replace your key schema). You want a Bloom/ribbon filter on SelectKeySegment(0) and a Bloom/ribbon filter on SelectKeySegmentRange(0,1). Creating Bloom/ribbon filters is not yet available in the API:

https://github.com/facebook/rocksdb/blob/9.2.fb/include/rocksdb/experimental.h#L334-L335
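As a rough illustration of the segment-per-field idea, the following Python sketch models how one non-overlapping segmentation can feed both filters. It is not the RocksDB C++ API: extract_segments, select_key_segment, and select_key_segment_range are hypothetical stand-ins named after the selectors mentioned above. Treating the range as inclusive and attaching each ":" delimiter to the following segment are both assumptions made for this example.

```python
# Hypothetical Python model of the idea, not the RocksDB C++ API.

def extract_segments(key):
    """Split a key into one segment per field.

    Assumption: each ":" delimiter belongs to the segment that follows it,
    so the segments do not overlap and together cover the whole key.
    """
    first, *rest = key.split(":", 2)
    return [first] + [":" + part for part in rest]

def select_key_segment(segments, index):
    """Filter input made of a single segment."""
    return segments[index]

def select_key_segment_range(segments, first, last):
    """Filter input made of segments first..last (assumed inclusive)."""
    return "".join(segments[first:last + 1])

key = "345:678:xyz"
segments = extract_segments(key)

# Input to a filter serving queries that only know account_id.
account_filter_input = select_key_segment(segments, 0)

# Input to a filter serving queries that know account_id and user_id.
account_user_filter_input = select_key_segment_range(segments, 0, 1)
```

The point of the sketch is that the two filter inputs share the <account_id> bytes even though the underlying segments never overlap, so the segmentation can stay fixed while the choice of filters changes.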

pdillinger avatar Jul 09 '24 19:07 pdillinger

got it, thanks for confirming! It's great that what I'm looking for is being worked on. Is there an existing issue that tracks the rest of this work that I can track or should I just keep this issue open?

zaidoon1 avatar Jul 10 '24 01:07 zaidoon1

@pdillinger sorry, just wanted to confirm, as I might have made an incorrect assumption. Based on what you said ("The API and functionality is not yet complete for the filtering you want"), is it safe to assume the filtering I want is planned, or is this not something that is planned/being prioritized?

zaidoon1 avatar Jul 18 '24 19:07 zaidoon1

I am also very much interested in this feature as I have a similar use case.

JamesA212 avatar Aug 28 '24 20:08 JamesA212