rocksdb

Is it possible to delete a specific key-value pair?

zaidoon1 opened this issue

I have a cf (column family) that acts like a queue, and two services that read from this cf. One service (service A) does a GET to do something with the kv, and another service (service B) iterates over each kv in the queue, does some I/O, and then deletes the kv.

The value of the key is a timestamp. The following race condition can happen:

  1. A kv is written to the queue cf; let's say key = hi, value = 123
  2. service B starts fetching the kvs and begins processing the kv from step 1
  3. service B finishes processing and is about to send a delete for the kv
  4. the kv is updated with a new timestamp, so now key = hi, value = 456
  5. service B sends the delete request for key = hi because, as far as it's concerned, it just finished processing the kv (hi, 123)
  6. RocksDB deletes key = hi, value = 456
  7. service B will never see the kv with value 456, and we've lost a job
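The seven steps above can be replayed deterministically with a plain dict standing in for the queue cf. This is only a sketch: no RocksDB calls are made, and `blind_delete_race` is a hypothetical name. It shows that a delete addressed by key alone cannot tell (hi, 123) apart from (hi, 456).

```python
def blind_delete_race() -> tuple[list[tuple[str, int]], dict[str, int]]:
    """Replay the race with a dict as a stand-in for the queue cf (no RocksDB)."""
    queue_cf: dict[str, int] = {}
    seen_by_b: list[tuple[str, int]] = []

    queue_cf["hi"] = 123                  # 1. kv written to the queue cf
    key, ts = "hi", queue_cf["hi"]        # 2. service B picks up (hi, 123)
    seen_by_b.append((key, ts))           # 3. service B finishes its I/O
    queue_cf["hi"] = 456                  # 4. the kv is updated to (hi, 456)
    queue_cf.pop(key, None)               # 5-6. delete by key only removes (hi, 456)
    seen_by_b.extend(queue_cf.items())    # 7. service B's next pass finds nothing
    return seen_by_b, queue_cf


seen, remaining = blind_delete_race()
print(seen)       # [('hi', 123)]
print(remaining)  # {}
```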

At first glance, it seems like we could include an incrementing id in the key (so (1:hi, 123) and (2:hi, 456)), as explained in https://github.com/facebook/rocksdb/wiki/Implement-Queue-Service-Using-RocksDB. However, service A does a GET for existing kvs because it happens to know the key, and that GET will fail because service A won't know the incrementing id. Another potential workaround is to put the id at the end of the key and convert the GET into a prefix lookup, so service A can look up the prefix "hi" and ignore the fact that the key is now hi:1. However, service A relies heavily on MultiGet, since it fetches multiple keys (as many as 100 in one MultiGet call), and converting this to prefix lookups sounds like it would be bad for performance.
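Here's a rough sketch of the "id at the end of the key" alternative, with a sorted Python list standing in for RocksDB's ordered key space. `bisect_left` plays the role of an iterator Seek to the prefix and the while loop plays the role of Next(). The `<name>:<zero-padded id>` format and the helper names are assumptions for illustration only. The point is that a MultiGet over N names becomes N separate seek-and-scan lookups.

```python
import bisect


def make_key(name: str, seq: int) -> str:
    """Hypothetical "<name>:<id>" key; zero-padding keeps string order == id order."""
    return f"{name}:{seq:010d}"


def prefix_get(sorted_keys: list[str], store: dict[str, int], name: str) -> list[tuple[str, int]]:
    """Return every (key, value) whose key starts with "<name>:"."""
    prefix = name + ":"
    i = bisect.bisect_left(sorted_keys, prefix)  # analogous to iterator Seek(prefix)
    out = []
    while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
        out.append((sorted_keys[i], store[sorted_keys[i]]))
        i += 1                                   # analogous to iterator Next()
    return out


def multi_prefix_get(sorted_keys: list[str], store: dict[str, int], names: list[str]) -> dict[str, list[tuple[str, int]]]:
    """What one MultiGet of N names turns into: N independent prefix lookups."""
    return {name: prefix_get(sorted_keys, store, name) for name in names}


store = {make_key("hi", 1): 123, make_key("hi", 2): 456, make_key("hj", 1): 789}
sorted_keys = sorted(store)
print(multi_prefix_get(sorted_keys, store, ["hi", "nope"]))
```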

Given this problem, is it possible to tell RocksDB to delete the kv pair (hi, 123), so that if the kv has since been updated to (hi, 456), RocksDB skips the delete altogether? If that's not possible, would it make a good feature request? If not, how bad are multiple prefix lookups from a perf perspective, and is there a way to make them less bad? Maybe some MultiSeek or something like that?

TL;DR: in an RDBMS, we can do something like: delete from x where key = blah and value = blahagain; How can this be done in RocksDB?
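One application-level way to get "delete where key = k and value = v" semantics without transactions is to run the read, the compare, and the delete under a per-key lock, which service C's PUT must take as well. The sketch below assumes every writer goes through one hypothetical `CasQueue` wrapper in a single process, and it uses a dict in place of the RocksDB cf. It is not a RocksDB API.

```python
import threading
from typing import Optional


class CasQueue:
    """Hypothetical wrapper around the queue cf; ALL writers (B and C) must use it."""

    def __init__(self, stripes: int = 64) -> None:
        self._data: dict[str, int] = {}  # in-memory stand-in for the RocksDB cf
        # Striped locks: keys that hash to the same stripe share one lock.
        self._locks = [threading.Lock() for _ in range(stripes)]

    def _lock_for(self, key: str) -> threading.Lock:
        return self._locks[hash(key) % len(self._locks)]

    def put(self, key: str, value: int) -> None:
        """Service C: overwrite the value while holding the key's lock."""
        with self._lock_for(key):
            self._data[key] = value

    def get(self, key: str) -> Optional[int]:
        """Reads take no lock; they only ever see some fully written value."""
        return self._data.get(key)

    def delete_if_equals(self, key: str, expected: int) -> bool:
        """Service B: delete only if the current value is still `expected`."""
        with self._lock_for(key):
            if self._data.get(key) != expected:
                return False  # value changed or key is gone: skip the delete
            del self._data[key]
            return True


q = CasQueue()
q.put("hi", 123)
processed = q.get("hi")                     # service B picks up (hi, 123)
q.put("hi", 456)                            # service C updates during B's I/O
print(q.delete_if_equals("hi", processed))  # False: delete skipped
print(q.get("hi"))                          # 456: the newer job survives
```

This only closes the race if every writer shares the same locks, and therefore the same process. RocksDB normally allows only one process to open a database read-write, so services B and C may already share one. If they don't, the pessimistic locking in RocksDB Transactions is the built-in way to make read-check-delete atomic.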

zaidoon1, Aug 08 '24

@ajkr, is there a way to accomplish what I need, or would this be a feature request?

zaidoon1, Aug 12 '24

Can you try to rewrite your question to make it clearer what you are trying to achieve? Can you also make the steps you listed for the "race condition" clearer?

For example, at the moment step 7 talks about 'service b', but should that actually be 'service a'?

adamretter, Sep 16 '24

So basically I have 3 services in total:

  1. service A only does GETs from the queue cf
  2. service B is responsible for iterating over each kv in the queue, "processing it" and then deleting the processed kv
  3. service C PUTs kvs into the queue cf when an external actor tells it to

Race-condition-wise, we can ignore service A since it only does GETs, but it adds a constraint: we want to fetch thousands of keys per second from the queue (most of which will not match any kv). This is currently done with MultiGet, fetching around 30 keys at a time, and that's not counting concurrent requests (i.e. multiple requests each fetching 30 at a time). In total I can be fetching 1500 kvs per second.

The race condition exists between Service C and Service B. Service B pulls the kv (hi, 123) from the queue to process it, and while that is happening, Service C PUTs (hi, 567). When Service B is done processing, it proceeds to delete the key "hi", thinking it's deleting (hi, 123), but what actually gets deleted is the updated kv (hi, 567). When Service B next pulls kvs to process, it will never see (hi, 567), since that was deleted by "accident" when it tried to delete (hi, 123). I need to make sure Service B sees every unique kv pair written by Service C.

I'm aware of https://github.com/facebook/rocksdb/wiki/Transactions, but I don't want to deal with the overhead. The system I've built already handles race conditions/concurrency without transactions, except for this specific part, so I'm trying to find a way to avoid introducing transactions to the entire system when they're not needed for about 99% of it.

zaidoon1, Sep 16 '24

A simple workaround without using transactions is to change the key schema from:

key = <hi> -> key = <hi>:<write timestamp>

This works nicely for Service B, which just iterates over all the kvs. However, it becomes an issue for Service A, since it won't know the timestamp, so instead of using MultiGet I'll need to switch it to a prefix lookup so it can look up the prefix "hi:". If each request from Service A currently pulls 30 keys, which is one MultiGet call, that becomes 30 prefix lookups. Multiplied by the number of concurrent requests from Service A (say 50), that's 1500 prefix lookups running at the same time on separate threads. Isn't that unsustainable from a perf perspective?
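One detail worth getting right with `<hi>:<write timestamp>` keys: RocksDB's default comparator orders keys byte-wise, so a decimal timestamp suffix does not sort numerically. Below is a sketch that assumes an 8-byte big-endian timestamp; `encode_key`, `decode_ts`, and `versions_of` are hypothetical helpers. Fixed-width big-endian bytes keep all versions of one name in write order. Checking the key length stops a lookup for "hi" from also matching a name like "hi:x".

```python
import struct

TS_LEN = 8  # fixed-width, big-endian unsigned 64-bit timestamp suffix


def encode_key(name: str, write_ts: int) -> bytes:
    """Build "<name>:<ts>" so byte-wise order within one name follows write order."""
    return name.encode() + b":" + struct.pack(">Q", write_ts)


def decode_ts(key: bytes) -> int:
    return struct.unpack(">Q", key[-TS_LEN:])[0]


def versions_of(sorted_keys: list[bytes], name: str) -> list[bytes]:
    """Keys for exactly `name`, oldest first (a linear scan stands in for Seek/Next)."""
    prefix = name.encode() + b":"
    return [k for k in sorted_keys
            if k.startswith(prefix) and len(k) == len(prefix) + TS_LEN]


# Decimal timestamps break ordering under byte-wise comparison:
print(sorted([b"hi:9", b"hi:10"]))  # [b'hi:10', b'hi:9']

keys = sorted([encode_key("hi", 10), encode_key("hi", 9), encode_key("hi:x", 5)])
print([decode_ts(k) for k in versions_of(keys, "hi")])  # [9, 10]
```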

zaidoon1, Sep 16 '24

@zaidoon1 This seems like a general concurrency problem to me. It is likely you will need to either lock or copy key/values, or perhaps use a different key scheme.

I think the scenario you described is basically this:

  1. Service C - PUT(hi,123)
  2. Service B - GET(hi)
  3. Service C - PUT(hi,567)
  4. Service B - DELETE(hi)
  5. Service B - GET(hi)

In step (5) service B does not get any value (i.e. it never sees hi,567), because it previously deleted it.

However, aren't there quite a few race conditions in your system? For example, wouldn't this also be a problem:

  1. Service C - PUT(hi,123)
  2. Service C - PUT(hi,567)
  3. Service B - GET(hi)
  4. Service B - DELETE(hi)
  5. Service B - GET(hi)

In the above, your Service C overwrites the key/value at step (2), so Service B will never see hi,123.

adamretter, Sep 16 '24

> In step (5) service B does not get any value (i.e. it never sees hi,567), because it previously deleted it.

That's correct.

> However, aren't there quite a few race conditions that you have in your system? For example, wouldn't this also be a problem:

For my particular use case, and without going into the implementation details, the value is basically a timestamp for doing some work. As long as I see the latest/newest timestamp, it doesn't matter if Service C does PUT(hi, 123) and then PUT(hi, 124), as long as Service B sees (hi, 124) (i.e. the latest write).

I've decided to go in a different direction: I changed the schema so that it avoids the race condition, and used something else for Service A to avoid the MultiGet issue with the new schema format.

Given that it's not possible to delete a specific kv pair, and this feature likely won't be implemented, I'll close this ticket.

Thanks @adamretter

zaidoon1, Sep 17 '24