
Getting a lot of NOSCRIPT ReplyErrors after Redis has been turned off and on again

Open scherbenokk opened this issue 5 years ago • 10 comments

Hello, guys! I'm getting a lot of errors with the following content after a Redis instance has been turned off and then turned back on:

ReplyError: NOSCRIPT No matching script. Please use EVAL.
at parseError (/usr/src/app/node_modules/redis-parser/lib/parser.js:193:12)
at parseType (/usr/src/app/node_modules/redis-parser/lib/parser.js:303:14)
command: 'EVALSHA',
args:
[ 'ffd7c103c93f29419aa15d45f66f35f591034d6d',
8,
'b_group-key-temp_settings',
'b_group-key-temp_job_weights',
'b_group-key-temp_job_expirations',
'b_group-key-temp_job_clients',
'b_group-key-temp_client_running',
'b_group-key-temp_client_num_queued',
'b_group-key-temp_client_last_registered',
'b_group-key-temp_client_last_seen',
1553612008193,
'fe8rlix2uy',
3 ],
code: 'NOSCRIPT' }

Also, I can't handle the case where Redis has been turned off and then turned back on: how can I resume the limiter's execution from the code? I tried handling Redis connection events, but it's still impossible to restore correct execution of the limiter.

scherbenokk avatar Mar 26 '19 14:03 scherbenokk

Hi, thanks for opening an issue.

This is happening because the Redis Scripts are not present after the instance comes back. From the Redis docs:

Also, as already mentioned, restarting a Redis instance flushes the script cache, which is not persistent.

At the moment, you'd have to create a new limiter to reload the scripts, because they are only loaded when Bottleneck first connects. Why is the Redis server being shut down? Knowing why it happens will help me understand how your problem can be fixed.
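
A rough sketch of that workaround, assuming failed jobs reject with the NOSCRIPT error (the id and client options below are illustrative):

const Bottleneck = require("bottleneck");

function createLimiter() {
  // Scripts are loaded into Redis when this limiter first connects.
  return new Bottleneck({
    id: "my-limiter", // illustrative
    datastore: "redis",
    clientOptions: { host: "127.0.0.1", port: 6379 }
  });
}

let limiter = createLimiter();

async function runJob(fn) {
  try {
    return await limiter.schedule(fn);
  } catch (err) {
    if (err.code === "NOSCRIPT") {
      // The script cache was flushed (e.g. by a restart); build a new
      // limiter so the scripts are loaded again, then retry once.
      await limiter.disconnect();
      limiter = createLimiter();
      return limiter.schedule(fn);
    }
    throw err;
  }
}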

Bottleneck could detect NOSCRIPT errors and load the scripts again, but that solution might perform poorly in large clusters 🤔

SGrondin avatar Mar 26 '19 19:03 SGrondin

Thank you so much for your help. If we only lose the connection, everything works fine; that's really cool! Regarding turning off a Redis instance: the simplest case would be a restart of the machine (though that should be very rare). I just wanted to check the limiters' behavior on a temporary connection loss. But do we have any approach (for now, performance doesn't matter) to fix the issue with a Redis instance restart? Recreating a limiter means losing the tasks it contains. Thank you in advance!

scherbenokk avatar Mar 27 '19 10:03 scherbenokk

I've been thinking and I'm now in favor of implementing this feature. I'll need some time to test/handle all possible scenarios and race conditions. I think it's possible to design it to look just like a connection reset.

SGrondin avatar Mar 28 '19 18:03 SGrondin

After doing more research on this, I've confirmed that this can be done cleanly and it won't take too much time to write this feature. There are serious side effects to restarting the server, though:

  1. If Redis is not using Persistence, the Bottleneck cluster will come back cleanly, but all the jobs currently in the RUNNING and EXECUTING states will no longer count towards the maxConcurrent, minTime, reservoir, etc. limits. Additionally, the reservoir will be reset to its original value, minTime will treat "now" as the next valid timestamp to start a request, and so on. In short, the user's app will temporarily be allowed to exceed its rate limits.
  2. If Redis is using some Persistence, the Bottleneck cluster will come back in an inconsistent state: it didn't lose all of its data, but it lost the last few seconds or minutes. Strange things will happen; the data is effectively corrupted.
  3. If Redis is using maximum Persistence, everything will be fine: the data will be consistent and things will keep working as expected. However, maximum Persistence has a very serious impact on the Redis server's performance; even with an SSD it could be too slow to work properly.

I recommend reading this page: Redis Persistence. Considering the side effects listed above, is it still worth developing this feature? In software, doing the wrong thing is almost always worse than not doing anything. If Bottleneck starts supporting Redis server restarts, users will not know that their Redis server crashed and restarted, they won't know all of the subtle side effects, and they won't know that's the reason their app made too many requests and got blacklisted from an API. It's complicated. Maybe Bottleneck should only support Redis server restarts when the Redis configuration is set to maximum persistence?
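
For reference, "maximum Persistence" here means AOF with an fsync on every write, i.e. in redis.conf:

appendonly yes        # enable the append-only file
appendfsync always    # fsync after every write: maximum durability, slowest
# appendfsync everysec  is the usual compromise (up to ~1s of writes lost)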

I'd like to hear your thoughts on this. Thank you!

SGrondin avatar Apr 14 '19 16:04 SGrondin

I've found a way to detect case 2. and trigger case 1. instead. 💡

SGrondin avatar Apr 14 '19 20:04 SGrondin

doing the wrong thing is almost always worse than not doing anything

^Flashbacks and shivers down my spine. Should contain a trigger warning 😆

May I just suggest you add a global Bottleneck.on('connectionDropped') event to be able to trigger alerts? I think in most use cases it would be a major issue in urgent need of maintenance, so a global event emitter should cut it, instead of lots of limiter.on('error', handleError) or group.on('error', handleError) calls (not sure whether that one works).
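
For context, a sketch of the wiring today versus the suggestion (the global event below is hypothetical; it does not exist in Bottleneck):

// Today: one handler per limiter (and per group, if that works):
limiter.on("error", handleError);
group.on("error", handleError);

// Suggested (hypothetical API): a single global hook for alerting.
// Bottleneck.on("connectionDropped", triggerAlert);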

TheGame2500 avatar Apr 16 '19 14:04 TheGame2500

May I just suggest you add a global Bottleneck.on('connectionDropped') event to be able to trigger alerts?

Totally agree, something like that is required. At the moment the majority of errors will be coming back to the user through failing jobs. I think a Cluster-wide event is a good idea.

I was brainstorming this issue with a coworker, and I think this feature will need a setting to let users decide between RESET_STATE and THROW_ERRORS, or something like that. I don't think there's a one-size-fits-all answer.
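
Purely as an illustration of the proposed setting (the option name and values below are hypothetical; nothing here exists in Bottleneck today):

const limiter = new Bottleneck({
  datastore: "redis",
  // Hypothetical: what to do when the Redis script cache is lost.
  onScriptCacheLoss: "RESET_STATE" // or "THROW_ERRORS"
});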

SGrondin avatar Apr 18 '19 15:04 SGrondin

The plot thickens... ioredis already transparently reloads the scripts, which only means it will silently do the wrong thing if the server didn't shut down cleanly.

@scherbenokk if your server restarts are "clean" (Redis had time to shut down, the process was not killed), then I suggest you switch to ioredis and you won't see those NOSCRIPT errors anymore.
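
Switching datastores is a small change; a minimal sketch (host/port are placeholders):

const Bottleneck = require("bottleneck");

// "ioredis" tells Bottleneck to use the ioredis client, which reloads
// the Lua scripts transparently after a clean server restart.
const limiter = new Bottleneck({
  datastore: "ioredis",
  clientOptions: { host: "127.0.0.1", port: 6379 }
});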

SGrondin avatar Apr 19 '19 20:04 SGrondin

I'm getting the same issue. An internal retry mechanism seems to be the right solution; relying on ioredis is problematic, since ioredis is not part of our solution. I'm currently implementing an external retry mechanism (if a NOSCRIPT error occurs, reinitialize Bottleneck), but it's a clunky solution.
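
Roughly what that external retry looks like, as a sketch (createLimiter is our own factory; how the error is delivered may vary by setup):

// Clunky: watch for NOSCRIPT on the limiter's error event and rebuild,
// which re-registers the Lua scripts on the next connect.
limiter.on("error", async (err) => {
  if (err && err.code === "NOSCRIPT") {
    await limiter.disconnect();
    limiter = createLimiter();
  }
});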

bigman73 avatar Oct 29 '19 13:10 bigman73

For anyone else seeing this message: I experienced it when defining a Bottleneck instance that re-uses an existing Redis connection.

We had
    const bottleneckRedisConnection = new Bottleneck.RedisConnection({
instead of
    const bottleneckRedisConnection = new Bottleneck.IORedisConnection({
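
For reference, the shape of the working setup when re-using an existing ioredis client (host/port are placeholders; I'm assuming the standard connection option here):

const Redis = require("ioredis");
const Bottleneck = require("bottleneck");

// IORedisConnection wraps an existing ioredis client;
// RedisConnection expects a NodeRedis client instead.
const client = new Redis({ host: "127.0.0.1", port: 6379 });
const bottleneckRedisConnection = new Bottleneck.IORedisConnection({ client });

const limiter = new Bottleneck({
  connection: bottleneckRedisConnection,
  datastore: "ioredis"
});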

imgalli avatar Jan 27 '21 13:01 imgalli