resque-scheduler
resque-scheduler copied to clipboard
Partial schedule load on restart
Hi there,
I'm trying to troubleshoot a strange issue we had the other day where it looks like we had a partial schedule load following a restart under 2.5.5.
At 20:29, we did a release and I can see the following in the logs:
resque-scheduler: [INFO] 2014-08-07T20:29:41+00:00: Shutting down
Thu Aug 7 20:29:46 UTC 2014: Starting resque-scheduler...
resque-scheduler: [INFO] 2014-08-07T20:30:27+00:00: Loading Schedule
resque-scheduler: [INFO] 2014-08-07T20:30:27+00:00: Scheduling refresh_pricing_samples
resque-scheduler: [INFO] 2014-08-07T20:30:27+00:00: Scheduling populate_consigner_stats
resque-scheduler: [INFO] 2014-08-07T20:30:27+00:00: Scheduling task_reminder
resque-scheduler: [INFO] 2014-08-07T20:30:27+00:00: Scheduling update_yesmail_for_changed_first_look_subscriptions
resque-scheduler: [INFO] 2014-08-07T20:30:27+00:00: Scheduling refresh_calendar_events
resque-scheduler: [INFO] 2014-08-07T20:30:27+00:00: Scheduling process_apn_feedback
resque-scheduler: [INFO] 2014-08-07T20:30:27+00:00: Scheduling populate_customer_stats
resque-scheduler: [INFO] 2014-08-07T20:30:27+00:00: Schedules Loaded
We didn't notice that the full schedules weren't loaded until a few hours later at which point we restarted :
resque-scheduler: [INFO] 2014-08-07T23:51:01+00:00: Shutting down
Thu Aug 7 23:51:06 UTC 2014: Starting resque-scheduler...
resque-scheduler: [INFO] 2014-08-07T23:51:46+00:00: Loading Schedule
resque-scheduler: [INFO] 2014-08-07T23:51:46+00:00: Scheduling hourly_push_notifications
resque-scheduler: [INFO] 2014-08-07T23:51:46+00:00: Scheduling first_look_trial_assigner
resque-scheduler: [INFO] 2014-08-07T23:51:46+00:00: Scheduling refresh_calendar_events
resque-scheduler: [INFO] 2014-08-07T23:51:46+00:00: Scheduling store_credit_expiration_notifier
resque-scheduler: [INFO] 2014-08-07T23:51:46+00:00: Scheduling bill_first_look_subscriptions
resque-scheduler: [INFO] 2014-08-07T23:51:46+00:00: Scheduling process_apn_feedback
resque-scheduler: [INFO] 2014-08-07T23:51:46+00:00: Scheduling cart_expiration
resque-scheduler: [INFO] 2014-08-07T23:51:46+00:00: Scheduling shipped_shipments
resque-scheduler: [INFO] 2014-08-07T23:51:46+00:00: Scheduling poll_expiring_subscriptions
resque-scheduler: [INFO] 2014-08-07T23:51:46+00:00: Scheduling clear_sessions
resque-scheduler: [INFO] 2014-08-07T23:51:46+00:00: Scheduling go_data_feed_upload
resque-scheduler: [INFO] 2014-08-07T23:51:46+00:00: Scheduling wait_list_check
resque-scheduler: [INFO] 2014-08-07T23:51:46+00:00: Scheduling expired_contracts
resque-scheduler: [INFO] 2014-08-07T23:51:46+00:00: Scheduling task_reminder
resque-scheduler: [INFO] 2014-08-07T23:51:46+00:00: Scheduling populate_consigner_stats
resque-scheduler: [INFO] 2014-08-07T23:51:46+00:00: Scheduling update_yesmail_for_changed_first_look_subscriptions
resque-scheduler: [INFO] 2014-08-07T23:51:46+00:00: Scheduling populate_customer_stats
resque-scheduler: [INFO] 2014-08-07T23:51:46+00:00: Scheduling refresh_pricing_samples
resque-scheduler: [INFO] 2014-08-07T23:51:46+00:00: Scheduling queue_payments
resque-scheduler: [INFO] 2014-08-07T23:51:46+00:00: Schedules Loaded
I saw some issues with dynamic loading ( #369 ) and was wondering if this could be the cause.
If it helps, here's our schedule:
cart_expiration: every: 1m
clear_sessions: cron: "30 * * * *"
queue_payments: every: 5m
expired_contracts: every: 24h
first_look_trial_assigner: cron: "0 0 * * * America/Los_Angeles"
go_data_feed_upload: every: 6h
shipped_shipments: every: 15m
wait_list_check: cron: "0 * * * * America/Los_Angeles"
store_credit_expiration_notifier: cron: "0 6 * * * America/Los_Angeles"
poll_expiring_subscriptions: cron: "0 2 * * * America/Los_Angeles"
bill_first_look_subscriptions: cron: "0 2 * * * America/Los_Angeles"
hourly_push_notifications: cron: "0 * * * * America/Los_Angeles"
process_apn_feedback: cron: "0 4 * * * America/Los_Angeles"
refresh_calendar_events: every: 10m
refresh_pricing_samples: cron: "0 3 * * * America/Los_Angeles"
task_reminder: every: 10m
populate_consigner_stats: cron: "0 2 * * * America/Los_Angeles"
update_yesmail_for_changed_first_look_subscriptions: cron: "0 2 * * * America/Los_Angeles"
populate_customer_stats: cron: "0 3 * * * America/Los_Angeles"
Thanks.
I have been having the same issue since upgrading from 2.0.0 to 3.0.0
When only the scheduler is restarted the correct set of schedules is loaded. But when restarting the scheduler and workers an incorrect set of schedules would sometimes be loaded. I am using Upstart to restart the processes.
I monkey patched Resque::Scheduler::SchedulerExtensions#schedule=
to log the size of the schedules at the end of the method.
Then I tailed my scheduler and worker logs.
Schedule Size 171 # scheduler
Schedule Size 207 # worker
Schedule Size 207 # worker
Schedule Size 191 # worker
Schedule Size 156 # worker
Schedule Size 154 # worker
Schedule Size 115 # worker
Schedule Size 148 # worker
Schedule Size 207 # worker
207 is the correct size.
In resque:setup
is
Resque.schedule = ...
This is called by the workers and the scheduler. But only the scheduler needs to have the schedule set.
Possible workaround
Ensure that Resque.schedule = ...
is not anywhere else in your application.
Change the resque:setup
task so only the scheduler loads the schedule
Resque.schedule = ... unless ENV['QUEUE']
I think what is happening is clean_schedules
in one process is called while reload_schedules!
in another process is running. Possibly?
https://github.com/resque/resque-scheduler/blob/master/lib/resque/scheduler/scheduling_extensions.rb#L85-L93
Hrm...that's interesting. We'll try that out.
Thanks for the suggestion.
I created two tests demonstrating what is happening. One process is setting schedules while another is clearing schedules.
Pipelineing the entire clear_schedules process is easy, but it is not so easy to pipeline the set_schedule process.
Can someone confirm that #439 fixes this issue? If so, I'd love to see this issue closed, I'd also love to see a new release. We just ran into this bug and it's a huge problem, imo as it breaks the core functionality of resque-scheduler, since some jobs are randomly not scheduled.
@dougcole can you confirm that the patch fixes the issue, since you are affected? I merged it because it was demonstrably better than what we had before, but if you have a real live-action reproduction, I would appreciate you chiming in with whether or not this fixed your issue.
@yaauie Sure, I'll ship the github version of resque-scheduler and report back in a week. The dropping of jobs is intermittent enough that that seems like the minimum amount of time necessary to be sure it's really fixed.
(Just in case others have run into this issue)
We're using the code below until this patch is confirmed working (and a new resque-scheduler build is pushed).
Rake::Task["resque:scheduler_setup"].enhance do
Resque.schedule = YAML.env_config('cron')
end
It looks a lot like this bug is fixed in master, I'd say close this issue and I'll open a new one if the problem shows up again.
When can we expect new version of Gem with this pull request merged?