kvrocks icon indicating copy to clipboard operation
kvrocks copied to clipboard

Using prefix bloom filter in range queries

Open chrisxu333 opened this issue 1 year ago • 8 comments
trafficstars

Search before asking

  • [X] I had searched in the issues and found no similar issues.

Motivation

Prefix seek / prefix bloom filter is proven to be useful when performing range queries of a collection of keys sharing the same prefix. Refer to [https://github.com/facebook/rocksdb/wiki/Prefix-Seek] for more detailed usage. I saw there're some APIs inside storage layer that performs range query logic. They could get benefits from using aforementioned prefix bloom filter.

Solution

I haven't summarized all potential use case of prefix bloom filter, however, FindKeyRangeWithPrefix could definitely benefit from it by configuring the prefix_extractor.

Are you willing to submit a PR?

  • [X] I'm willing to submit a PR!

chrisxu333 avatar Jan 03 '24 02:01 chrisxu333

@git-hulk could you help to see if this proposal is needed / necessary? Thank you :)

chrisxu333 avatar Jan 05 '24 11:01 chrisxu333

I believe we are willing to make such improvements if there are benchmarks that can demonstrate performance enhancements.

PragmaTwice avatar Jan 05 '24 12:01 PragmaTwice

Perhaps, I don't know. IMO, Prefix filter is good for range scan, however, Range in kvrocks is not so widely used...

mapleFU avatar Jan 05 '24 12:01 mapleFU

@chrisxu333 I ever tried to add the prefix bloom filter but seems didn't bring any improvements to Kvrocks, see https://github.com/apache/kvrocks/pull/1367. But as @PragmaTwice said, don't hesitate to have a try if you feel it can improve.

git-hulk avatar Jan 05 '24 13:01 git-hulk

@chrisxu333 I ever tried to add the prefix bloom filter but seems didn't bring any improvements to Kvrocks, see #1367. But as @PragmaTwice said, don't hesitate to have a try if you feel it can improve.

Sounds good I'll give it a try.

chrisxu333 avatar Jan 05 '24 14:01 chrisxu333

Perhaps, I don't know. IMO, Prefix filter is good for range scan, however, Range in kvrocks is not so widely used...

Is that because of the lacking usage scenario or due to the missing support of such command?

chrisxu333 avatar Jan 05 '24 14:01 chrisxu333

What @mapleFU means is we only used range query in a few commands like hgetall/lrange/zrange, etc...

git-hulk avatar Jan 05 '24 15:01 git-hulk

https://github.com/tikv/tikv/blob/2472fd4d85c220b74dc889b493e70bc95dcc75c2/src/config/mod.rs#L963

TiKV uses prefix bf to remove the timestamp column. I think if we'd like to use prefix bf, maybe we need a well-defined rule for prefix. Little commands uses Scan in RocksDB, and mosts commands just uses "Put" and "Get" directly.

However, if benchmark shows it can improve our performance, any contribution is wellcome

mapleFU avatar Jan 05 '24 15:01 mapleFU