SafeQueue
Increase worker resilience by separating the "fetch job" and "do job" steps
When the worker cannot communicate with its job provider (i.e. Redis), it fails with an exception. The only way to make it resolve a new Redis server (assuming some sort of HA setup) is to restart the worker, because PHP, rather helpfully, caches name lookups.
Separating job fetching from job execution lets you catch that failure case and make the worker exit (to be restarted by whatever scheduling tool you use). In our setup, the restart triggers a fresh name lookup that resolves to the revived or hot-spare Redis instance, and the queue works again.
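A minimal sketch of the idea, assuming PHP 8.0+ and a supervisor that restarts the worker on any non-zero exit code. `JobSource`, `Worker`, `FlakySource`, and `EXIT_FETCH_FAILED` are hypothetical names for illustration, not SafeQueue's actual API; a real worker would wrap its Redis client behind something like `JobSource`. The two `try` blocks are kept apart on purpose: an exception while fetching is treated as fatal (the backend is unreachable), while an exception while doing a job is logged and the loop carries on.

```php
<?php
declare(strict_types=1);

// Hypothetical abstraction over the job provider (e.g. a Redis client wrapper).
interface JobSource
{
    /** Returns the next job payload, or null if the queue is empty. Throws if the backend is unreachable. */
    public function fetch(): ?string;
}

final class Worker
{
    public const EXIT_FETCH_FAILED = 1; // hypothetical exit code for "could not fetch"

    public function __construct(
        private JobSource $source,
        private \Closure $handler,
    ) {}

    /** Runs until a fetch fails or $maxJobs jobs have been handled; returns a process exit code. */
    public function run(int $maxJobs = PHP_INT_MAX): int
    {
        $handled = 0;
        while ($handled < $maxJobs) {
            // Step 1: fetch. Failure here means the provider is unreachable, so
            // return a non-zero code and let the supervisor restart the process,
            // which re-resolves the Redis host name on the next connect.
            try {
                $job = $this->source->fetch();
            } catch (\Throwable $e) {
                fwrite(STDERR, "fetch failed, exiting: {$e->getMessage()}\n");
                return self::EXIT_FETCH_FAILED;
            }

            if ($job === null) {
                usleep(100_000); // empty queue: back off briefly, then poll again
                continue;
            }

            // Step 2: do the job. A failure here is specific to this job, so log
            // it and keep the worker (and its connection) alive.
            try {
                ($this->handler)($job);
            } catch (\Throwable $e) {
                fwrite(STDERR, "job failed: {$e->getMessage()}\n");
            }
            $handled++;
        }
        return 0;
    }
}

// In-memory stand-in: serves the queued jobs, then simulates a lost connection.
final class FlakySource implements JobSource
{
    /** @param list<string> $jobs */
    public function __construct(private array $jobs) {}

    public function fetch(): ?string
    {
        if ($this->jobs === []) {
            throw new \RuntimeException('connection refused');
        }
        return array_shift($this->jobs);
    }
}

// Usage: 'boom' fails inside the handler, which does not stop the worker;
// the simulated connection loss after 'b' does.
$done = [];
$worker = new Worker(
    new FlakySource(['a', 'boom', 'b']),
    function (string $job) use (&$done): void {
        if ($job === 'boom') {
            throw new \RuntimeException('bad payload');
        }
        $done[] = $job;
    },
);
$code = $worker->run();
// In a real worker script: exit($worker->run());
```

Returning the exit code from `run()` instead of calling `exit()` inside the loop keeps the worker testable; the entry-point script is the only place that actually terminates the process.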