speedb icon indicating copy to clipboard operation
speedb copied to clipboard

Add "getSmallest()" method (and maybe "isEmpty")

Open mjsax opened this issue 1 year ago • 14 comments

In Kafka Streams, we have a few cases for which we use RocksDB as a "priority queue" -- we layout the keys in a specific way that gives us the order we want. To "peek" into the head of the "queue", we need to create an iterator, what is very expensive.

Because iterators are very expensive, we actually do this "peek" less frequently as we would like to, what leads to a few complications and degraded user experience.

If we could do a point-lookup like getSmallest and RocksDB could give us a quick answer (maybe need to return null if the "queue" is empty), it might help us to cleanup our code (and get rid if internal workaround config to "throttle" how often we create an iterator right now), and could help to improve user experience.

Without knowing the code base, it seems not to be impossible to implement a efficient LSM search with no input key, that looks for the smallest key in the LSM, and returns the corresponding key and value (because we don't pass in the key, we would need the key to be returned as well -- it might also be ok if we only get the key, as we could do a consecutive get(key) to retrieve the corresponding value).

To avoid the getSmallest call, it would also be good to have an efficient isEmpty method (of course, if isEmpty would be as expensive as getSmallest it might not help to add isEmpty -- not sure if isEmpty could be implemented more efficiently, ie, without any LSM search)

Of course, if there is an overall better way to use Speedb as a priority queue, happy to follow recommendations.

mjsax avatar Oct 09 '23 03:10 mjsax

We can improve GetSmallest. the iterator creation is expensive and there is no need for a heap. If needed we can parallelised it (same as MultiGet) so that disk requests will be async. Please note that in this case I do recommend using a separate CF for each queue (assuming you have limited number of queues) .

hilikspdb avatar Oct 09 '23 05:10 hilikspdb

@erez-speedb @ayulas - Please discuss performance testing scenarios

Guyme avatar Oct 30 '23 13:10 Guyme

@erez-speedb, @ayulas - As agreed, you should update the issue with performance testing scenarios and associated information.

udi-speedb avatar Nov 01 '23 09:11 udi-speedb

@udi-speedb The comment above your comment by @Guyme already point that

ayulas avatar Nov 01 '23 10:11 ayulas

@udi-speedb The comment above your comment by @Guyme already point that

Almost, but not quite. Anyway, please update the issue as agreed. Thanks

udi-speedb avatar Nov 01 '23 11:11 udi-speedb

Here is the performance tests we need to compare: seek with writes with next=0 compared to "get smallest" . Ayelet please add GetSmallest (with writes) tests to db_bench.

hilikspdb avatar Nov 01 '23 11:11 hilikspdb

@hilikspdb, @ayulas - Are you implementing both getSmallest() and IsEmpty()?

udi-speedb avatar Nov 01 '23 12:11 udi-speedb

is-empty is mapped into getsmallest which return an element out of range (or return empty data)

hilikspdb avatar Nov 01 '23 12:11 hilikspdb

We can also add the options "prefix_same_as_starts" which will return "not-found" if there is no object that matches the prefix (@ayulas)

hilikspdb avatar Nov 01 '23 14:11 hilikspdb

@ayulas - make sure you add this capability into db_bench as a part of this ticket so @erez-speedb can test it.

Guyme avatar Nov 05 '23 09:11 Guyme

i divided this to 2 phases. phase 1 will be a simple code which will reduce the heap needed and steps while creating iterators to get smallest key+value phase 2 will improve that by not needed the full iterator creation (in phase 1 it will be created internally - the user will not be exposed to it) as similar to the hmap implementation we will use the range of each level as written in md . this is another project but will be very useful

ayulas avatar Nov 06 '23 11:11 ayulas

@mjsax Hi, a few preliminary questions regarding your exact requirements (we may provide an API that supports additional capabilities, but would like to verify yours):

  • Column families: Is the smallest key in the context of a specific cf? All cf-s? A subset of cf-s?
  • Snapshots: Do you need support for snapshots in this service?
  • You are describing the use of iterators as expensive. Could you give some number to show what expensive is and how much cheaper the getSmallest() service needs to be so it would serve its purpose?

udi-speedb avatar Nov 23 '23 10:11 udi-speedb

here is the case as described . the user would like to use priority queue, meaning get smallest , process, delete. The queue may be identified by a prefix. The current way (new iterator , seek, delete iterator )may take a long time . iterator create and the problem with delete. we would like to see if we can optimise get smallest to do a skip to the next key with data more efficient. this is related to the iterator after massive delete that Or is working on and may also be a base to improvement there (if the result will be good enough)

hilikspdb avatar Nov 23 '23 12:11 hilikspdb

here is the case as described . the user would like to use priority queue, meaning get smallest , process, delete. The queue may be identified by a prefix. The current way (new iterator , seek, delete iterator )may take a long time . iterator create and the problem with delete. we would like to see if we can optimise get smallest to do a skip to the next key with data more efficient. this is related to the iterator after massive delete that Or is working on and may also be a base to improvement there (if the result will be good enough)

@hilikspdb - Thanks for the clarification. I would still love to get the exact requirements from @mjsax

udi-speedb avatar Nov 23 '23 18:11 udi-speedb