SafeQueue Increase resilience of worker by breaking out the "fetch job" vs "do job" parts

Open cooperaj opened this issue 6 years ago • 1 comment

When the worker cannot communicate with its job provider (i.e. Redis), it fails with an exception. The only way to make it resolve a new Redis server (assuming some sort of HA setup) is to restart the worker, because PHP, rather helpfully, caches name lookups.

Separating the fetching of jobs from the running of jobs lets you catch that case and make the worker exit (to be restarted by whatever scheduling tool you're using). In our situation, the restarted worker resolves the revived/hot-spare Redis instance afresh and the queue works again.
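
To illustrate the idea, here is a minimal sketch of a worker loop with the two phases split. It is not the code in this change: `FetchFailed`, `runOnce`, and `work` are hypothetical names, and the fetch and job steps are passed in as plain callables rather than using any real queue API. A failure while fetching is treated as "provider unreachable" and ends the loop with a non-zero status, while a failure inside the job itself is only reported and the worker carries on.

```php
<?php
declare(strict_types=1);

// Hypothetical exception type: marks a failure to reach the job provider.
final class FetchFailed extends RuntimeException {}

/**
 * Run one iteration with "fetch job" and "do job" handled separately.
 *
 * $fetch returns a callable job, or null when the queue is empty.
 * $onJobError receives any Throwable raised by the job itself.
 */
function runOnce(callable $fetch, callable $onJobError): string
{
    try {
        $job = $fetch();
    } catch (Throwable $e) {
        // Fetch phase failed: assume the provider (e.g. Redis) is unreachable.
        throw new FetchFailed('Could not fetch job: ' . $e->getMessage(), 0, $e);
    }

    if ($job === null) {
        return 'idle';
    }

    try {
        $job();
        return 'ran';
    } catch (Throwable $e) {
        // Job phase failed: report it, but keep the worker alive.
        $onJobError($e);
        return 'job_failed';
    }
}

/**
 * Loop for up to $maxIterations, returning an exit status.
 * A non-zero status signals that the process should exit and be restarted.
 */
function work(callable $fetch, callable $onJobError, int $maxIterations): int
{
    for ($i = 0; $i < $maxIterations; $i++) {
        try {
            runOnce($fetch, $onJobError);
        } catch (FetchFailed $e) {
            return 1;
        }
    }
    return 0;
}
```

In a real worker the status returned by `work` would be passed to `exit()`, so that the supervising tool starts a fresh process and the name lookup happens again.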

Oct 18 '18 15:10 cooperaj

Coverage increased (+11.3%) to 100.0% when pulling cc6db26f8d0211e3a5ec6e2ec147525e074ec81c on UniversityOfNottingham:feature/0.2-make-worker-resiliant into ed2cbf947961a3c6bb9b474865d12b9de8fe2141 on maxbrokman:0.2.

Oct 19 '18 14:10 coveralls
