solid_queue
solid_queue copied to clipboard
Take an exclusive lock on semaphores upfront before attempting to decrement
This tries to address a deadlock we've seen about once every couple of days, where two jobs that compete for the semaphore are enqueued at the same time. Two transactions acquire a shared lock over a row in the semaphores table by key and then they try to upgrade it to an exclusive lock, leading to a deadlock.
From SHOW ENGINE INNODB STATUS:
------------------------
LATEST DETECTED DEADLOCK
------------------------
2023-12-27 07:57:28 140410341029440
*** (1) TRANSACTION:
TRANSACTION 1972990032, ACTIVE 1 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 4 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 1
MySQL thread id 3012240, OS thread handle 140409154041408, query id 7398762432 bigip-vip-new.rw-ash-int.37signals.com 10.20.0.24 haystack_app updating
UPDATE `solid_queue_semaphores` SET value = value - 1, expires_at = '2023-12-27 08:12:28.002702' WHERE (value > 0) AND `solid_queue_semaphores`.`key` = 'RR::ProcessJob/C/64961261'
*** (1) HOLDS THE LOCK(S):
RECORD LOCKS space id 14 page no 426 n bits 304 index index_solid_queue_semaphores_on_key of table `haystack_solidqueue_production`.`solid_queue_semaphores` trx id 1972990032 lock mode S
Record lock, heap no 199 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 30; hex 526563656970743a3a526563697069656e743a3a50726f63657373696e67; asc RR::Process; (total 50 bytes);
1: len 8; hex 80000000004224c4; asc B$ ;;
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 14 page no 426 n bits 304 index index_solid_queue_semaphores_on_key of table `haystack_solidqueue_production`.`solid_queue_semaphores` trx id 1972990032 lock_mode X locks rec but not gap waiting
Record lock, heap no 199 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 30; hex 526563656970743a3a526563697069656e743a3a50726f63657373696e67; asc RR::Process; (total 50 bytes);
1: len 8; hex 80000000004224c4; asc B$ ;;
*** (2) TRANSACTION:
TRANSACTION 1972990013, ACTIVE 1 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 4 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 1
MySQL thread id 3012575, OS thread handle 140275687212608, query id 7398762530 bigip-vip.sc-chi-int.37signals.com 10.10.0.37 haystack_app updating
UPDATE `solid_queue_semaphores` SET value = value - 1, expires_at = '2023-12-27 08:12:28.007153' WHERE (value > 0) AND `solid_queue_semaphores`.`key` = 'RR::ProcessJob/C/64961261'
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 14 page no 426 n bits 304 index index_solid_queue_semaphores_on_key of table `haystack_solidqueue_production`.`solid_queue_semaphores` trx id 1972990013 lock mode S
Record lock, heap no 199 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 30; hex 526563656970743a3a526563697069656e743a3a50726f63657373696e67; asc RR::Process; (total 50 bytes);
1: len 8; hex 80000000004224c4; asc B$ ;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 14 page no 426 n bits 304 index index_solid_queue_semaphores_on_key of table `haystack_solidqueue_production`.`solid_queue_semaphores` trx id 1972990013 lock_mode X locks rec but not gap waiting
Record lock, heap no 199 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 30; hex 526563656970743a3a526563697069656e743a3a50726f63657373696e67; asc RR::Process; (total 50 bytes);
1: len 8; hex 80000000004224c4; asc B$ ;;
*** WE ROLL BACK TRANSACTION (2)
With this change, instead of running:
UPDATE `solid_queue_semaphores` SET value = value - 1, expires_at = '2023-12-27 08:12:28.007153' WHERE (value > 0) AND `solid_queue_semaphores`.`key` = 'RR::ProcessJob/C/64961261'
We'll do
BEGIN
SELECT `solid_queue_semaphores`.`id` FROM `solid_queue_semaphores` WHERE (value > 0) AND `solid_queue_semaphores`.`key` = 'SequentialUpdateResultJob/JobResult/3' FOR UPDATE
UPDATE `solid_queue_semaphores` SET value = value - 1, expires_at = '2023-12-28 21:13:45.303352' WHERE (value > 0) AND `solid_queue_semaphores`.`key` = 'SequentialUpdateResultJob/JobResult/3'
COMMIT