badger icon indicating copy to clipboard operation
badger copied to clipboard

Improve Iterator Performance of Seeking with Prefix

Open zzyalbert opened this issue 4 years ago • 11 comments

When I was trying to change the db engine to badgerdb in some of my projects, I found the iterator Seek with prefix was pretty slow in the following situation:

  • lots of keys we were seeking with prefix didn't exist
  • lots of keys have lots of versions

Then I use pprof to found out the iterator was still running parseItem even if the current key was not match the prefix. image

So I fix this by skipping the parseItem process when the current key is not match the prefix.


This change is Reviewable

zzyalbert avatar Jul 02 '21 10:07 zzyalbert

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Jul 02 '21 10:07 CLAassistant

@jarifibrahim Would you please review this pr?

zzyalbert avatar Jul 16 '21 12:07 zzyalbert

Thanks, @zzyalbert for raising the PR. I would need to verify though but I think that this would not work. parseItem is a complex function that does a lot of things. One of them being is calling extra Next on the MergeIterator.

It would be good if we could verify a few test cases like: Consider a table at L5 with the following keys: ax. b and a table at L6 with keys ay, b. Now if we iterate through them using iterator with prefix a, then we should be able to access both ax and ay.

NamanJain8 avatar Jul 19 '21 08:07 NamanJain8

Thanks, @zzyalbert for raising the PR. I would need to verify though but I think that this would not work. parseItem is a complex function that does a lot of things. One of them being is calling extra Next on the MergeIterator.

It would be good if we could verify a few test cases like: Consider a table at L5 with the following keys: ax. b and a table at L6 with keys ay, b. Now if we iterate through them using iterator with prefix a, then we should be able to access both ax and ay.

Theoretically, both ax and ay could be accessed, because this pr only skipped parseItem when inner iterator's current key had no prefix. If we iterated with prefix a when we reached ax, parseItem would also be called because ax still had the prefix a

zzyalbert avatar Jul 19 '21 09:07 zzyalbert

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Aug 22 '21 18:08 stale[bot]

This issue was marked as stale and no activity has occurred since then, therefore it will now be closed. Please, reopen if the issue is still relevant.

stale[bot] avatar Sep 03 '21 02:09 stale[bot]

Any update here? In my application, I need to do iterator Seek() with a prefix, and according to pprof this is responsible for >90% of the total time spent doing range operations. Any optimizations here would be greatly appreciated!

image

AlexMackowiak avatar Sep 21 '21 20:09 AlexMackowiak

Hi @AlexMackowiak , the change looks great to me as far as correctness is concerned. Also, it would be useful for the case where the iterations are made upon the prefixes that have no keys. But in the general case, where the keys actually exist for the prefix, this would be a slight overhead. I would discuss this internally with the team and report back.

Thanks for the PR. :)

NamanJain8 avatar Sep 22 '21 15:09 NamanJain8

@zzyalbert / @AlexMackowiak, can you please add benchmarks for this? If that looks promising, then we can proceed further on the basis of them. I am happy to provide any help that you need to do that.

NamanJain8 avatar Sep 22 '21 17:09 NamanJain8

@NamanJain8 Sorry for the delay, I had some pre-existing benchmark tests for the key-value store my company is building on top of badger. Running these against the two different implementations seems to look promising, especially for reading from large ranges with pagination.

I have attached the raw benchmarking data below, let me know if you need anything else! image Badger PR Benchmark.xlsx

AlexMackowiak avatar Sep 24 '21 22:09 AlexMackowiak

@NamanJain8 Any update on merging here? Or even like a config option for this behavior? The benchmarks I took show that this change would be quite beneficial at least for my company's use case.

AlexMackowiak avatar Nov 18 '21 23:11 AlexMackowiak