
Adding wt option in setCommand

Open charsyam opened this issue 12 years ago • 20 comments

1] spec

  • SET has a new option WT; with it, the key is set only if the new weight is larger than the previous weight (see the sketch after this list)
    • if the given weight is larger than the current weight, it returns +OK
    • otherwise it returns -ERR
    • a plain SET without the "wt" option always updates the key's value
    • a weight must be supplied whenever the wt option is used
set [key] [value] wt [weight]
  • Every key has a new property called "weight", similar to "expire"
    • the weight can be read with the weight command
    • if the key exists but has no weight, it returns 0
    • if the key doesn't exist, it returns -1
    • otherwise it returns the key's weight
weight [key]
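
A minimal Python sketch of these rules, just to make the intended semantics concrete. This only models the proposal; it is not the actual Redis C implementation, and the store/weights dicts and function names are illustrative:

store = {}    # key -> value
weights = {}  # key -> weight, only for keys written with the wt option

def set_command(key, value, weight=None):
    if weight is None:
        store[key] = value          # plain SET always succeeds and bypasses the weight check
        return "+OK"
    if key in weights and weight <= weights[key]:
        return "-ERR"               # stale write: new weight is not larger than the stored one
    store[key] = value
    weights[key] = weight
    return "+OK"

def weight_command(key):
    if key not in store:
        return -1                   # key does not exist
    return weights.get(key, 0)      # key exists but was never written with wt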

2] weight

  • We can use the DB operation timestamp as the weight (a small sketch follows).
  • Or we can get a global timestamp from another server (such as Redis).
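
For example, a client could derive an integer weight from a row's last-modified time as below. This is only an illustration; the microsecond resolution and the helper name are arbitrary choices, not part of the proposal:

import datetime

def weight_from_db_timestamp(updated_at: datetime.datetime) -> int:
    # Any integer that grows with each DB update works as a weight;
    # microseconds since the epoch is just one convenient choice.
    return int(updated_at.timestamp() * 1_000_000)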

3] usage example

set a 123 wt 1000000
+OK
set a 123 wt 10000 <--- rejected: the key is only set when the new weight is larger
-ERR
weight a
1000000
set a 123 wt 2000000
+OK
weight b
-1
set c 123
+OK
weight c
0
set a 234  <---- plain SET without wt still succeeds
+OK

4] Why is this feature needed?

  • When someone uses Redis as a cache, they typically follow this sequence:

    1. read db
    2. write redis
  • Normal Scenario.

    1. Process A - Read key A(1) from DB at 1ms
    2. Another process - Write key A(2) to DB.
    3. Process B - Read key A(2) from DB at 3ms
    4. Process A - Write set A 1 to Redis at 4ms
    5. Process B - Write set A 2 to Redis at 5ms
  • Race Condition Scenario. Key(Value)

    1. Process A - Read key A(1) from DB at 1ms
    2. Another process - Write key A(2) to DB.
    3. Process B - Read key A(2) from DB at 3ms
    4. A context switch or sleep happens in Process A.
    5. Process B - Write set A 2 to Redis at 4ms
    6. Process A - Write set A 1 to Redis at 5ms
  • Race condition: the cache ends up holding the stale value (1) while the DB holds (2), so consistency is broken.

  • Using set wt Scenario.

    1. Process A - Read key A(1) from DB at 1ms
    2. Another process - Write key A(2) to DB.
    3. Process B - Read key A(2) from DB at 3ms
    4. A context switch or sleep happens in Process A.
    5. Process B - Write set A 2 wt 3 to Redis at 4ms
    6. Process A - Write set A 1 wt 1 to Redis at 5ms
  • but "set A 1 wt 1" failed. it can keep consistency.

5] drawback

  1. It uses extra memory to keep the per-key weights (db->weights).
  2. The RDB save/load format changes, so older versions of Redis cannot read the new RDB files.

I believe this feature is very useful when designing a cache layer. Thank you.

charsyam avatar May 16 '13 00:05 charsyam

This is a cool feature, one I've been hoping for! Thank you. I'd like to make a suggestion: why not clarify whether 'tx' can be any number (even a negative one) or only a timestamp? Considering its purpose, I don't think it would be a problem for 'tx' to hold a negative number, or to use a smaller bit width to save memory.

flutia avatar May 16 '13 06:05 flutia

@charsyam, I like this feature, but adding it could cause a lot of problems. First, since you add this tx option to the SET command, what about SADD, ZADD, etc.? Second, how is tx obtained (I think tx is a timestamp), and who provides it? For example, if there are two clients A and B, but machine A and machine B have inaccurate clocks, what will happen, and how do you solve it? That's my personal opinion. Thanks.

AllenDou avatar May 16 '13 06:05 AllenDou

Could you please explain what this does? The code lacks any documentation and the commit message is useless. What does tx stand for?

badboy avatar May 16 '13 11:05 badboy

@flutia I think it is better not to allow values less than 0: 0 means the key has no tx time, and -1 means the key does not exist. Thank you :)

@AllenDou I added a txCommand that returns a key's tx time. I didn't consider ZADD and SADD because SET's other options aren't supported for sets or sorted sets either. Yes, tx means timestamp, and it is always supplied as an extra parameter by the client. For example,

set a 123 tx 100000 <- this is the tx time

I also added the txCommand because I think it is needed.

Inaccurate clocks don't matter, because the tx time is the DB operation time and the client sends that value.

@badboy For example, there are processes that read the DB and then update the cache; please see the scenario above. This option prevents that kind of race. I think this feature will be useful for avoiding race conditions when two processes update the cache.

charsyam avatar May 16 '13 12:05 charsyam

I still don't fully get it. So the timestamp refers to the time the set is written, so that any other client SETting the same key with a timestamp before that will fail? Or is it that the key is locked until that timestamp?

badboy avatar May 16 '13 12:05 badboy

Dudes, sorry, but there is no description at all of what the semantics of TX are. Please write what it does :-)

Like:

  • Every key has a new hidden property called "tx".
  • SET has a new option TX, it only will set the key if the previous value is <= the hidden property.

And so forth. Not sure if the above is correct at all, just an example.

antirez avatar May 16 '13 12:05 antirez

@flutia @AllenDou @badboy @antirez Thank you, guys. I updated the issue description. Feel free to ask anything about this feature.

charsyam avatar May 16 '13 14:05 charsyam

@charsyam Ok, I fully agree with you. If zero means that a key has no tx time, allowing negative numbers could cause confusion. I think positive numbers are enough in most cases. :)

Let me explain the issue. There is a common pitfall in a DB-then-set-cache design. While a thread that has written some values to the DB is suspended (and has not yet set the cache with those values), another thread can update the DB with newer values and then set the cache with them. When the first thread resumes, the DB holds the new values but the cache ends up holding the old values that the first thread set.

To avoid that situation, memcached, for example, provides a 'cas' operation. It works well, but you have to fetch (gets()) the key first, so it takes two network round trips, which can be an unnecessary burden in the simple DB-then-set-cache case. A sketch of that two-step pattern is shown below.

I think a timestamp fetched from the DB is a good fit for 'tx'. However, anyone who wants to use this feature can pick their own value for 'tx'. :)
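
For comparison, a rough Python sketch of the two-step cas pattern described above. Here mc, gets, and cas are placeholders for whatever cas-capable memcached client is used, not a specific library's API:

def update_cache_with_cas(mc, key, new_value):
    # Round trip 1: read the current value together with its cas token.
    _old_value, token = mc.gets(key)
    # Round trip 2: write only if nobody changed the key in between;
    # the write fails if the cas token is no longer current.
    return mc.cas(key, new_value, token)

With the proposed option, the same protection would be a single set key value wt <weight>, with no preliminary read.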

flutia avatar May 16 '13 14:05 flutia

Thanks, @charsyam, now the idea is clear.

badboy avatar May 16 '13 14:05 badboy

+1

k8nx avatar May 23 '13 02:05 k8nx

I think we can do this with Lua scripting as shown below though it doubles the number of keys.

-- KEYS[1] = value key, KEYS[2] = timestamp key; ARGV[1] = value, ARGV[2] = timestamp
local ts = redis.call('get', KEYS[2]) or -1   -- a missing timestamp key counts as -1
if tonumber(ARGV[2]) > tonumber(ts) then
  -- update the value and its timestamp together
  redis.call('mset', KEYS[1], ARGV[1], KEYS[2], ARGV[2])
  return ARGV[1]
else
  return {err="Stale"}
end
> evalsha 3715540b1dead19e7d3fe146e87bb5d157e41c81 2 k k:ts hello 100
"hello"
> evalsha 3715540b1dead19e7d3fe146e87bb5d157e41c81 2 k k:ts world 200
"world"
> evalsha 3715540b1dead19e7d3fe146e87bb5d157e41c81 2 k k:ts goodbye 150
(error) Stale

What do you think?

junegunn avatar May 23 '13 08:05 junegunn

@junegunn I think it has some problems. For example, if you use a maxmemory eviction policy, k:ts can be evicted independently of k. So I think it is better for the tx attribute to be a key attribute like expire. Also, other users could easily delete the timestamp key without intending to.

charsyam avatar May 23 '13 08:05 charsyam

@charsyam Good point. But what if we just pack them into a single list? Then we don't have to worry about separate eviction of those two.

-- KEYS[1] holds a two-element list: [value, timestamp]; ARGV[1] = value, ARGV[2] = timestamp
local ts = redis.call('lindex', KEYS[1], 1) or -1   -- a missing key counts as -1
if tonumber(ARGV[2]) > tonumber(ts) then
  -- rebuild the list so the value and its timestamp are always evicted (or kept) together
  redis.call('del', KEYS[1])
  redis.call('rpush', KEYS[1], ARGV[1], ARGV[2])
  return ARGV[1]
else
  return {err='Stale'}
end

junegunn avatar May 23 '13 08:05 junegunn

@junegunn Thank you for your interest. I think this approach is also possible, but first, it adds more overhead per operation, and second, I'm not sure that using another data structure is a good way to provide this feature. Don't get me wrong; the second reason may just be my own philosophy. Thanks again for your interest. :) If you like, I'd be happy to talk with you in Korean. :)

charsyam avatar May 23 '13 08:05 charsyam

Ok. I think the question here is: "Is it worth it?" Is it worth integrating this feature into the core when what it tries to achieve is already possible with the current set of features? Should we make this particular usage pattern easier by extending the core, or should we just keep the core simple? Obviously I'm not the one to decide that, but I'm leaning towards the latter.

(You can drop me an e-mail if you'd like, the address can be found on my github page. Thanks.)

cheers, junegunn choi.

junegunn avatar May 23 '13 09:05 junegunn

@junegunn If you're asking my opinion, I'm convinced it is worth doing. As you said, we can do many things with scripting, but we also have to consider performance and use cases. When I designed a cache layer, this feature was important, and Facebook and other big companies use this pattern, so it should be simple and fast. Scripting is slower, and using collections spends more memory and makes more function calls. So I think it is better for this feature to be merged into the core. Yes, the core should stay simple, and I think this feature doesn't break the simplicity of the core.

charsyam avatar May 23 '13 10:05 charsyam

@antirez, @flutia, @AllenDou, @badboy, @dgkim84, @junegunn I tested this feature by comparing the native implementation with Lua scripting.

In total, I ran each case 4 times, with 100,000 operations per test.

set 100,000 keys

speed in seconds (native wins)
 run | native | lua
  1  | 15.809 | 23.130
  2  | 15.737 | 23.165
  3  | 15.544 | 23.819
  4  | 15.590 | 23.724

memory (lua wins: it uses other containers)
 run | native | lua
  1  | 12.02M | 11.17M
  2  | 12.02M | 11.17M
  3  | 12.02M | 11.17M
  4  | 12.02M | 11.17M

I used this test generating script: https://gist.github.com/charsyam/5638416

To my surprise, the Lua approach using separate containers uses less memory (by 7~8%) than storing the weight as an attribute in the dictEntry, but the native set-wt approach is faster than Lua scripting (by 46~53%).

I then increased the test to 1,000,000 operations per test.

set 1,000,000 keys

speed (native set wt wins)
 run | native | lua
  1  | 2m37s  | 3m59s
  2  | 2m39s  | 3m58s

memory (native set wt wins)
 run | native | lua
  1  | 108M   | 207M
  2  | 108M   | 207M

Yes, at this scale native set wt is about 50% faster and uses about half the memory compared to Lua scripting, so I think the native set-wt implementation is better than using Lua scripting. Thank you.

feel free to point out my mistakes :) Thank you.

charsyam avatar May 23 '13 19:05 charsyam

good!

AllenDou avatar May 24 '13 02:05 AllenDou

I changed tx to wt and weight, because the new names are more meaningful than before.

charsyam avatar May 24 '13 03:05 charsyam

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Mar 24 '24 23:03 CLAassistant