sidekiq_alive
Worker goes unhealthy after 10 minutes on sidekiq 7
We are trying to upgrade our infrastructure to Sidekiq 7 and are facing the following issue: sidekiq_alive works fine when registering the worker in Redis, and the worker becomes healthy. After exactly 10 minutes it becomes unhealthy with the message "Can't find the alive key", and the pod gets restarted. We have verified that during this 10-minute period the worker really is alive, so it is not the case that our health check only starts working after 10 minutes. Any idea where to look to solve this problem?
First, there are no guarantees that the jobs will be executed in order ever. If that's what you need, you must create the next job at the end of executing the previous one.
That is not something that I can ever guarantee. I'll read the rest of it later; that was just the first thing that struck me.
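To illustrate the chaining advice above, here is a minimal sketch in plain Ruby. It is not Sidekiq code: `StepJob`, its `QUEUE` and `LOG` constants, and the `enqueue` method are hypothetical stand-ins, and with a real Sidekiq job the `enqueue` call would be a `perform_async` call. The point is only that each step schedules the next one as its last action, so ordering does not depend on the queue.

```ruby
# In-memory stand-in for a job queue (hypothetical, for illustration only).
class StepJob
  QUEUE = [] # pending argument lists, stands in for the real queue
  LOG = []   # records the order in which steps actually ran

  def self.enqueue(steps)
    QUEUE.push(steps)
  end

  def perform(steps)
    current, *remaining = steps
    LOG << current # do the work for this step

    # Enqueue the next step only after this one has finished.
    self.class.enqueue(remaining) unless remaining.empty?
  end
end

# Drain the queue the way a worker would, one job at a time.
StepJob.enqueue(%w[extract transform load])
StepJob.new.perform(StepJob::QUEUE.shift) until StepJob::QUEUE.empty?
```

Because only one step of the chain is ever in the queue at a time, the order holds even if many workers are pulling jobs concurrently.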
@mhenrixon What is the reason for Limit exceeded in uniquejobs:changelog?
We are trying to update Sidekiq 5.1.3 / sidekiq-unique 5.0.10 to a recent version, but in production our queues were not behaving as expected. In the Redis logs, we saw many "Limit exceeded" messages being added to uniquejobs:changelog, but we are not sure what that means.
Having this issue quite often on "long" running jobs (couple of minutes).
Also curious as to what Limit exceeded means exactly?
I infer that Limit exceeded means that a job was not added to the queue because it still has a uniquejobs lock, but I might be wrong about that. I get it every time I try to add a job via sidekiq-cron: the job ends up not being enqueued because of Limit exceeded, even though there does not appear to be a corresponding lock (and no job is already running).
@nathan-appere @ragesoss I should likely replace that message with a better one.
You can allow any number of simultaneous locks. This message is likely confusing for people who don't use lock_limit: on their jobs.
I'll see about making this less confusing.
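For readers unfamiliar with the option mentioned above, a job that sets `lock_limit:` might look roughly like the fragment below. This is a sketch under assumptions: `ReportJob` and `account_id` are hypothetical, the `:until_executed` lock is chosen only as an example, and the comment on `lock_limit:` reflects the description in this thread rather than verified gem documentation. It needs the sidekiq and sidekiq-unique-jobs gems loaded to run.

```ruby
class ReportJob
  include Sidekiq::Job

  # Assumption: lock_limit sets how many simultaneous locks are allowed
  # for jobs with the same unique arguments.
  sidekiq_options lock: :until_executed, lock_limit: 2

  def perform(account_id)
    # job body goes here
  end
end
```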
@mhenrixon Hi there, I just started using this gem, and I'm also confused by this message. I'm using the :replace strategy and I don't understand why I get limited_exceeded; the job also seems to be retried, given that there's a previous job_id. Also, lock_limit is not documented in the README, I believe. Thanks