StackExchange.Redis
StringIncrement() returning the same value on two different machines
We are using StackExchange.Redis (v2.0.601) to connect to Azure Redis. We use StringIncrement() to increment the value of a key each time we insert a record into another storage system, and we stop inserting once the Redis value exceeds a certain number. This runs on multiple machines, each running a multi-threaded service. It is a high-load scenario: StringIncrement() is hit with hundreds or even thousands of requests per second, spread across multiple machines.
What I am seeing is that we are actually inserting more records than we expect. When I log the values being returned by StringIncrement(), **I see that we are getting the same value returned on different machines**. Here is the code we are executing:
returnedValue = AzureCache(clusterIndex).StringIncrement(cacheKey, quantity)
The table below shows each returned value and how many threads received it across all machines. For example, the value 208 was returned to 4 different threads, and the value 3 was returned to 3 different threads.

| returned value after increment | count of this value across all machines |
| --- | --- |
| 208 | 4 |
| 207 | 4 |
| 211 | 4 |
| 209 | 4 |
| 206 | 4 |
| 210 | 4 |
| 205 | 4 |
| 213 | 4 |
| 212 | 4 |
| 3 | 3 |
| 4 | 3 |
| 215 | 3 |
| 2 | 3 |
I thought that StringIncrement() is atomic and could never return the same value to two different threads/machines. Can anyone explain what is going on and how to fix this issue? Am I misunderstanding how this should work?
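For reference, a minimal sketch of the overall pattern in C# with StackExchange.Redis; the connection string, key, cap, and `InsertRecord` helper are hypothetical placeholders, not the actual application code:

```csharp
// A minimal sketch of the pattern described above, not the actual application
// code: the connection string, key, cap, and InsertRecord helper are all
// hypothetical placeholders.
using StackExchange.Redis;

class IncrementCapSketch
{
    private static readonly ConnectionMultiplexer Mux =
        ConnectionMultiplexer.Connect("myredis.redis.cache.windows.net:6380,ssl=true,password=...");

    public static bool TryInsert(string cacheKey, long quantity, long maxRecords)
    {
        IDatabase db = Mux.GetDatabase();

        // INCRBY runs atomically on the server, so no two callers should
        // ever receive the same post-increment value for a single key.
        long newValue = db.StringIncrement(cacheKey, quantity);

        if (newValue > maxRecords)
            return false; // cap exceeded: stop inserting

        InsertRecord(cacheKey); // insert into the other storage system
        return true;
    }

    private static void InsertRecord(string key) { /* elided */ }
}
```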
You're not misunderstanding; the increment is meant to be atomic. However, it is also a server-side primitive (`INCR`/`INCRBY`), so all the library does is invoke that and return the result the server nominates. If something weird is happening, it is server side, so not really a question for "here", but some thoughts:
- is this the only way you mutate this key? Do you ever decrement or set the value to something?
- some kind of node instability? Failover? Split brain?
- Something managed-service specific?
More thoughts:
- Is the key hardcoded? Can you produce a table of result counts that also includes the key?
- Is there an expiration set on the key? Or an eviction policy that's causing the keys to be deleted and then recreated by INCR? (A diagnostic sketch follows below.)
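A hedged diagnostic sketch for those two questions, checking the key's TTL and the server's eviction policy. The endpoint and key name are placeholders; note that `CONFIG GET` requires `allowAdmin=true` in the connection options, and some managed services block CONFIG entirely, in which case the service's configured eviction policy in the portal is the place to look:

```csharp
using System;
using StackExchange.Redis;

class EvictionDiagnostics
{
    static void Main()
    {
        // allowAdmin=true is needed for CONFIG GET via StackExchange.Redis.
        var mux = ConnectionMultiplexer.Connect("localhost:6379,allowAdmin=true");
        IDatabase db = mux.GetDatabase();

        // A null TTL means no expiration is set on the key.
        TimeSpan? ttl = db.KeyTimeToLive("my-cache-key");

        // Any maxmemory-policy other than "noeviction" allows keys to be
        // evicted under memory pressure; a later INCR recreates them at 1.
        IServer server = mux.GetServer(mux.GetEndPoints()[0]);
        var policy = server.ConfigGet("maxmemory-policy");

        Console.WriteLine($"TTL: {ttl?.ToString() ?? "none"}; policy: {policy[0].Value}");
    }
}
```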
You're not misunderstanding; the increment is meant to be atomic. However, it is also a server-side primitive (`INCR`/`INCRBY`), so all the library does is invoke that and return the result the server nominates. If something weird is happening, it is server side, so not really a question for "here", but some thoughts:
- is this the only way you mutate this key? Do you ever decrement or set the value to something? [Kenton] We do decrement in other places of the code, but I verified in the logs that that was not happening during this issue.
- some kind of node instability? Failover? Split brain?
- Something managed-service specific? [Kenton] These are both great questions, which I'm not sure how to investigate. Maybe Philo can help me with this part?
Well, it might be on Azure, but I wouldn't want to conclude prematurely that Azure is doing anything wrong. What would help @philon-msft, myself, or anyone else investigate is: is there some minimal but reliable way of reproducing this?
I was just able to repro it in our INT environment, so that gives me a lot more options for testing. I'll see what I can come up with to package something that is easily repro-able for you.
Awesome. Can I ask: is your INT environment also cloud redis? Or is it on-prem? Just curious as to whether we can rule anything out already.
My INT environment is also cloud redis.
More thoughts:
- Is the key hardcoded? Can you produce a table of result counts that also includes the key?
- Is there an expiration set on the key? Or an eviction policy that's causing the keys to be deleted and then recreated by INCR?
I added logging to show the cache key that we are using and verified that it is always the same key. I don't believe there are any expirations or evictions on these. If there were, we would see the keys start over at 1, right? We aren't seeing that. They are consistently increasing, but duplicate incremented values are being returned.
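One way to rule out deletes and expirations definitively would be keyspace notifications. A sketch, assuming notifications are enabled on the server (e.g. `notify-keyspace-events K$gxe`, or the equivalent setting in the managed service) and using placeholder endpoint/key names:

```csharp
using System;
using StackExchange.Redis;

class KeyEventWatcher
{
    static void Main()
    {
        var mux = ConnectionMultiplexer.Connect("localhost:6379");
        ISubscriber sub = mux.GetSubscriber();

        // The __keyspace@0__:<key> channel publishes the name of every
        // command that touches the key (incrby, del, expired, ...), so any
        // delete/expire that resets the counter would show up here.
        sub.Subscribe("__keyspace@0__:my-cache-key", (channel, message) =>
            Console.WriteLine($"{DateTime.UtcNow:O} my-cache-key -> {message}"));

        Console.ReadLine(); // keep the subscriber alive while observing
    }
}
```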
@kgidewall I had a thought here: are you using one of the Active-Active Redis multiple-primary hosts? If writes are accepted on multiple primaries and merged asynchronously, two increments applied to different primaries could each observe the same intermediate value before replication converges, which would match the duplicates you're seeing.