resque-scheduler
resque-scheduler copied to clipboard
Redis 5 API for multi/pipelined commands changes transaction behavior in resque-scheduler 4.9+
When upgrading to resque-scheduler 4.9.0 and above with redis-rb 4.x, we still get the deprecation notices related to https://github.com/resque/resque/issues/1794 and https://github.com/resque/resque-scheduler/issues/745.
Repro:
require "bundler/inline"
gemfile do
source "https://rubygems.org"
gem "resque", "~> 2.6"
gem "resque-scheduler", "~> 4.9"
gem "redis", "~> 4.0"
end
Resque.redis = Redis.new # NOTE: need a fresh redis-server running locally
class Job
@queue = "jobs"
def self.perform
puts "performed job"
end
end
Resque.enqueue_in(1, Job)
sleep(1)
Resque::Scheduler.handle_delayed_items
puts "enqueued delayed items"
Resque::Worker.new("jobs").work(0)
$ ruby repro.rb
resque-scheduler: [INFO] 2023-12-27T11:11:35-08:00: Processing Delayed Items
Redis#srem will always return an Integer in Redis 5.0.0. Use Redis#srem? instead.(called from: /Users/alex.vondrak/.local/share/rtx/installs/ruby/3.2.2/lib/ruby/gems/3.2.0/gems/redis-namespace-1.11.0/lib/redis/namespace.rb:564:in `wrapped_send')
Pipelining commands on a Redis instance is deprecated and will be removed in Redis 5.0.0.
redis.multi do
redis.get("key")
end
should be replaced by
redis.multi do |pipeline|
pipeline.get("key")
end
(called from /Users/alex.vondrak/.local/share/rtx/installs/ruby/3.2.2/lib/ruby/gems/3.2.0/gems/redis-namespace-1.11.0/lib/redis/namespace.rb:564:in `wrapped_send'}
enqueued delayed items
performed job
After enough poking around through stack traces, I believe I've discovered the issue, and it's a subtle interaction between the resque & resque-scheduler gems:
- resque-scheduler calls
Resque.redis.multi
here: https://github.com/resque/resque-scheduler/blob/462e33e48a799f3bd89c305e1f6fe24996810bb5/lib/resque/scheduler.rb#L253-L259 - through a chain of calls, this hits
Resque::Job.create
: https://github.com/resque/resque-scheduler/blob/462e33e48a799f3bd89c305e1f6fe24996810bb5/lib/resque/scheduler.rb#L327 - resque defines
Job.create
in such a way that we effectively make a call toResque.redis.pipelined
: https://github.com/resque/resque/blob/2f9d080ce86eb2e3f1f3d47599a21c576124c6f3/lib/resque/data_store.rb#L105-L108
Copying the shape of the above into a direct repro, we're basically doing
# Causes deprecation warning
Resque.redis.multi do |m|
Resque.redis.pipelined do |p|
# ...
end
end
instead of what the redis gem wants, which is
# Does not cause deprecation warning
Resque.redis.multi do |m|
m.pipelined do |p|
# ...
end
end
It might be hard to actually achieve the latter, though, since the code is straddling two gems. 😕
We were not getting this warning prior to resque-scheduler 4.9 since the outer multi
was added by https://github.com/resque/resque-scheduler/pull/767 (which also causes other nested transaction issues, like in https://github.com/resque/resque-scheduler/issues/773).
It seems like this still works circa redis 5.x. At least when switching the version in the above repro, I get:
$ ruby repro.rb
resque-scheduler: [INFO] 2023-12-27T11:14:06-08:00: Processing Delayed Items
enqueued delayed items
performed job
I think the deprecation may just kind of be firing in a wonky way due to how redis 4.x tries to juggle state.
Circa 4.x:
-
Redis#multi
replaces@client
with aDeprecatedMulti
: https://github.com/redis/redis-rb/blob/0afcf1cad6f4fc74213f2b4c7a20819581c36897/lib/redis.rb#L225 -
Redis#pipelined
replaces@client
with aDeprecatedPipeline
: https://github.com/redis/redis-rb/blob/0afcf1cad6f4fc74213f2b4c7a20819581c36897/lib/redis.rb#L174 - Both
DeprecatedMulti
andDeprecatedPipeline
are defined such that any call to the underlyingPipeline::Multi
orPipeline
methods will issue deprecation warnings: https://github.com/redis/redis-rb/blob/0afcf1cad6f4fc74213f2b4c7a20819581c36897/lib/redis/pipeline.rb#L208-L236 - thus, because the nested
pipelined
call happens onResque.redis
, it winds up delegating to the@client
, which issues the deprecation warning
Circa 5.x:
-
Redis#multi
will just yield a newMultiConnection
instance: https://github.com/redis/redis-rb/blob/667b0007191cb5c9ed6f3b1ea7fc97a40f8f8c93/lib/redis/commands/transactions.rb#L23-L29 -
Redis#pipelined
will just yield a newPipelinedConnection
instance: https://github.com/redis/redis-rb/blob/935f64f18d7c37386e9b01db71683f8f3b868cd2/lib/redis.rb#L102-L108 - those objects handle nested calls to
multi
/pipelined
in their respective ways: https://github.com/redis/redis-rb/blob/667b0007191cb5c9ed6f3b1ea7fc97a40f8f8c93/lib/redis/pipeline.rb#L16-L31 + https://github.com/redis/redis-rb/blob/667b0007191cb5c9ed6f3b1ea7fc97a40f8f8c93/lib/redis/pipeline.rb#L59-L61
~However, it's unclear to me what the interaction will be between two "top level" calls to these transaction commands. I.e., will the futures fire with the desired timing, or do they go out of sync in some way that may cause an issue?~ EDIT: per investigation below, it seems that the "nested" pipeline actually silently fires outside of the new MULTI
/EXEC
transaction around the batch; see https://github.com/resque/resque-scheduler/issues/787#issuecomment-1870717786 for details.
However, it's unclear to me what the interaction will be between two "top level" calls to these transaction commands. I.e., will the futures fire with the desired timing, or do they go out of sync in some way that may cause an issue?
I think I figured this part out by adding some Redis logging to the repro script:
repro.rb
require "bundler/inline"
gemfile do
source "https://rubygems.org"
gem "resque", "~> 2.6"
gem "resque-scheduler", "~> 4.9"
gem "redis", "~> #{ENV["REDIS_VERSION"]}.0"
end
case ENV["REDIS_VERSION"]
when "4"
$stdout.sync = true
logger = Logger.new($stdout, level: Logger::DEBUG)
Resque.redis = Redis.new(logger: logger)
when "5"
RedisClient.register(Module.new do
def call(command, ...)
puts "[call] #{command.flatten.join(" ")}"
super
end
def call_pipelined(commands, ...)
puts "[call_pipelined] #{commands.flatten.join(" ")}"
super
end
end)
Resque.redis = Redis.new
end
class Job
@queue = "jobs"
end
Resque.redis.redis.flushdb # to ensure we start with a blank slate
Resque.enqueue_in(1, Job)
sleep(1)
Resque::Scheduler.handle_delayed_items
Redis 4
$ REDIS_VERSION=4 ruby repro.rb
D, [2023-12-27T16:17:24.233121 #49036] DEBUG -- : [Redis] command=FLUSHDB args=
D, [2023-12-27T16:17:24.234243 #49036] DEBUG -- : [Redis] call_time=0.80 ms
D, [2023-12-27T16:17:24.234376 #49036] DEBUG -- : [Redis] command=RPUSH args="resque:delayed:1703722645" "{\"class\":\"Job\",\"args\":[],\"queue\":\"jobs\"}"
D, [2023-12-27T16:17:24.234501 #49036] DEBUG -- : [Redis] call_time=0.11 ms
D, [2023-12-27T16:17:24.234597 #49036] DEBUG -- : [Redis] command=SADD args="resque:timestamps:{\"class\":\"Job\",\"args\":[],\"queue\":\"jobs\"}" ["delayed:1703722645"]
D, [2023-12-27T16:17:24.234710 #49036] DEBUG -- : [Redis] call_time=0.10 ms
D, [2023-12-27T16:17:24.234730 #49036] DEBUG -- : [Redis] command=ZADD args="resque:delayed_queue_schedule" 1703722645 1703722645
D, [2023-12-27T16:17:24.234850 #49036] DEBUG -- : [Redis] call_time=0.11 ms
D, [2023-12-27T16:17:25.238398 #49036] DEBUG -- : [Redis] command=ZRANGEBYSCORE args="resque:delayed_queue_schedule" "-inf" 1703722645 "LIMIT" 0 1
D, [2023-12-27T16:17:25.241675 #49036] DEBUG -- : [Redis] call_time=2.68 ms
resque-scheduler: [INFO] 2023-12-27T16:17:25-08:00: Processing Delayed Items
D, [2023-12-27T16:17:25.245827 #49036] DEBUG -- : [Redis] command=INFO args=
D, [2023-12-27T16:17:25.246626 #49036] DEBUG -- : [Redis] call_time=0.69 ms
D, [2023-12-27T16:17:25.255685 #49036] DEBUG -- : [Redis] command=SCRIPT args="load" "if redis.call('SETNX', KEYS[1], ARGV[1]) == 1\nthen\n redis.call('EXPIRE', KEYS[1], 180)\n return 1\nelse\n return 0\nend\n"
D, [2023-12-27T16:17:25.256571 #49036] DEBUG -- : [Redis] call_time=0.74 ms
D, [2023-12-27T16:17:25.256740 #49036] DEBUG -- : [Redis] command=EVALSHA args="8f4a4e422bfa8f9a0931e350d6e30b1c3ce97c33" 1 "resque:resque:resque_scheduler_master_lock" "localhost:49036"
D, [2023-12-27T16:17:25.257156 #49036] DEBUG -- : [Redis] call_time=0.38 ms
D, [2023-12-27T16:17:25.257338 #49036] DEBUG -- : [Redis] command=LRANGE args="resque:delayed:1703722645" 0 99
D, [2023-12-27T16:17:25.257677 #49036] DEBUG -- : [Redis] call_time=0.30 ms
D, [2023-12-27T16:17:25.257812 #49036] DEBUG -- : [Redis] command=WATCH args="resque:delayed:1703722645"
D, [2023-12-27T16:17:25.258041 #49036] DEBUG -- : [Redis] call_time=0.20 ms
Redis#srem will always return an Integer in Redis 5.0.0. Use Redis#srem? instead.(called from: /Users/alex.vondrak/.local/share/rtx/installs/ruby/3.2.2/lib/ruby/gems/3.2.0/gems/redis-namespace-1.11.0/lib/redis/namespace.rb:564:in `wrapped_send')
Pipelining commands on a Redis instance is deprecated and will be removed in Redis 5.0.0.
redis.multi do
redis.get("key")
end
should be replaced by
redis.multi do |pipeline|
pipeline.get("key")
end
(called from /Users/alex.vondrak/.local/share/rtx/installs/ruby/3.2.2/lib/ruby/gems/3.2.0/gems/redis-namespace-1.11.0/lib/redis/namespace.rb:564:in `wrapped_send'}
D, [2023-12-27T16:17:25.262883 #49036] DEBUG -- : [Redis] command=MULTI args=
D, [2023-12-27T16:17:25.262965 #49036] DEBUG -- : [Redis] command=SREM args="resque:timestamps:{\"class\":\"Job\",\"args\":[],\"queue\":\"jobs\"}" "delayed:1703722645"
D, [2023-12-27T16:17:25.263009 #49036] DEBUG -- : [Redis] command=SADD args="resque:queues" ["jobs"]
D, [2023-12-27T16:17:25.263042 #49036] DEBUG -- : [Redis] command=RPUSH args="resque:queue:jobs" "{\"class\":\"Job\",\"args\":[]}"
D, [2023-12-27T16:17:25.263076 #49036] DEBUG -- : [Redis] command=LTRIM args="resque:delayed:1703722645" 100 -1
D, [2023-12-27T16:17:25.263101 #49036] DEBUG -- : [Redis] command=EXEC args=
D, [2023-12-27T16:17:25.263974 #49036] DEBUG -- : [Redis] call_time=0.82 ms
D, [2023-12-27T16:17:25.264210 #49036] DEBUG -- : [Redis] command=WATCH args="resque:delayed:1703722645"
D, [2023-12-27T16:17:25.264548 #49036] DEBUG -- : [Redis] call_time=0.30 ms
D, [2023-12-27T16:17:25.264617 #49036] DEBUG -- : [Redis] command=LLEN args="resque:delayed:1703722645"
D, [2023-12-27T16:17:25.264860 #49036] DEBUG -- : [Redis] call_time=0.22 ms
D, [2023-12-27T16:17:25.264995 #49036] DEBUG -- : [Redis] command=MULTI args=
D, [2023-12-27T16:17:25.265023 #49036] DEBUG -- : [Redis] command=DEL args="resque:delayed:1703722645"
D, [2023-12-27T16:17:25.265051 #49036] DEBUG -- : [Redis] command=ZREM args="resque:delayed_queue_schedule" 1703722645
D, [2023-12-27T16:17:25.265073 #49036] DEBUG -- : [Redis] command=EXEC args=
D, [2023-12-27T16:17:25.265405 #49036] DEBUG -- : [Redis] call_time=0.30 ms
D, [2023-12-27T16:17:25.265527 #49036] DEBUG -- : [Redis] command=ZRANGEBYSCORE args="resque:delayed_queue_schedule" "-inf" 1703722645 "LIMIT" 0 1
D, [2023-12-27T16:17:25.265754 #49036] DEBUG -- : [Redis] call_time=0.20 ms
Redis 5
$ REDIS_VERSION=5 ruby repro.rb
[call] flushdb
[call] rpush resque:delayed:1703722657 {"class":"Job","args":[],"queue":"jobs"}
[call] sadd resque:timestamps:{"class":"Job","args":[],"queue":"jobs"} delayed:1703722657
[call] zadd resque:delayed_queue_schedule 1703722657 1703722657
[call] zrangebyscore resque:delayed_queue_schedule -inf 1703722657 LIMIT 0 1
resque-scheduler: [INFO] 2023-12-27T16:17:37-08:00: Processing Delayed Items
[call] info
[call] script load if redis.call('SETNX', KEYS[1], ARGV[1]) == 1
then
redis.call('EXPIRE', KEYS[1], 180)
return 1
else
return 0
end
[call] evalsha 8f4a4e422bfa8f9a0931e350d6e30b1c3ce97c33 1 resque:resque:resque_scheduler_master_lock localhost:49123
[call] lrange resque:delayed:1703722657 0 99
[call] watch resque:delayed:1703722657
[call_pipelined] sadd resque:queues jobs rpush resque:queue:jobs {"class":"Job","args":[]}
[call_pipelined] MULTI srem resque:timestamps:{"class":"Job","args":[],"queue":"jobs"} delayed:1703722657 ltrim resque:delayed:1703722657 100 -1 EXEC
[call] watch resque:delayed:1703722657
[call] llen resque:delayed:1703722657
[call_pipelined] MULTI del resque:delayed:1703722657 zrem resque:delayed_queue_schedule 1703722657 EXEC
[call] zrangebyscore resque:delayed_queue_schedule -inf 1703722657 LIMIT 0 1
Annoyingly, the log formats aren't the same between versions. But the salient point is that the Redis commands are actually executed a bit differently in Redis 4 vs Redis 5.
With Redis 4, the enqueue in the nested pipeline (from within the resque gem) gets added to the MULTI
/EXEC
already in progress that resque-scheduler is wrapping around each batch. Thus, the queue push happens inside of the transaction:
MULTI
SREM resque:timestamps:{"class":"Job","args":[],"queue":"jobs"} delayed:1703722645
SADD resque:queues jobs
RPUSH resque:queue:jobs {"class":"Job","args":[]}
LTRIM resque:delayed:1703722645 100 -1
EXEC
With Redis 5, the nested pipeline appears to fire eagerly, since it's effectively being called on the top-level like Resque.redis.pipelined { ... }
instead of on the block argument like Resque.redis.multi { |m| m.pipelined { ... } }
. Thus, the queue push happens outside of the transaction:
SADD resque:queues jobs
RPUSH resque:queue:jobs {"class":"Job","args":[]}
MULTI
SREM resque:timestamps:{"class":"Job","args":[],"queue":"jobs"} delayed:1703722657
LTRIM resque:delayed:1703722657 100 -1
EXEC
Note that the SADD
+ RPUSH
are still sent as one unit in a pipelined command, but they're still apparently sent before the MULTI
/EXEC
block, meaning that the enqueue could still go through even if the schedule bookkeeping gets rolled back.
I'm guessing that this is not the desired behavior. I'm not totally sure of the implications, but it seems fairly serious. As such, I'll amend the title to reflect the broader nature of the issue.
Is there any update on how to fix this issue? I've found that setting Redis.silence_deprecations = true
does not resolve it.
PS - using the following:
- redis (4.8.1)
- resque-scheduler (4.10.2)
- resque (2.6.0)
We've been observing job duplication on our app (with redis-rb 5.x), very probably related to this issue. As mentioned by @ajvondrak this seems like a serious problem indeed.
Fixing this issue would require patching both resque and resque-scheduler to pass down the transaction object, so quite invasive and complex... This would not solve issues for enqueue hooks that use redis though. Perhaps a revert of https://github.com/resque/resque-scheduler/pull/767 should be considered?