yii2-queue
yii2-queue copied to clipboard
"Has not waited the lock" on queue with supervisor numprocs more than one
How to solve? Or may be im doing something wrong?
if set numprocs=1
in supervisor config - no errors!
What steps will reproduce the problem?
config from common/config/main.php
'queue' => [
'class' => \yii\queue\db\Queue::class,
'as log' => \yii\queue\LogBehavior::class,
'db' => 'db', // DB connection component or its config
'tableName' => '{{%queue}}', // Table name
'channel' => 'default', // Queue channel key
'mutex' => \yii\mutex\PgsqlMutex::class, // Mutex that used to sync queries
'mutexTimeout' => 0,
'ttr' => 5 * 60, // Max time for anything job handling
'attempts' => 5, // Max number of attempts
],
supervisor config
[program:m-prod-yii-queue-worker]
command=/usr/bin/php /www/m/http/yii queue/listen --verbose=1 --color=0
autostart=true
autorestart=true
numprocs=2
process_name = %(program_name)s_%(process_num)02d
redirect_stderr=true
stdout_logfile=/www/m/log/yii-queue-worker.log
Error trace
yii\base\Exception: Has not waited the lock. in /www/m/http/vendor/yiisoft/yii2-queue/src/drivers/db/Queue.php:179
Stack trace:
#0 [internal function]: yii\queue\db\Queue->yii\queue\db\{closure}(Object(yii\db\Connection))
#1 /www/m/http/vendor/yiisoft/yii2/db/Connection.php(1059): call_user_func(Object(Closure), Object(yii\db\Connection))
#2 /www/m/http/vendor/yiisoft/yii2-queue/src/drivers/db/Queue.php(211): yii\db\Connection->useMaster(Object(Closure))
#3 /www/m/http/vendor/yiisoft/yii2-queue/src/drivers/db/Queue.php(78): yii\queue\db\Queue->reserve()
#4 [internal function]: yii\queue\db\Queue->yii\queue\db\{closure}(Object(Closure))
#5 /www/m/http/vendor/yiisoft/yii2-queue/src/cli/Queue.php(117): call_user_func(Object(Closure), Object(Closure))
#6 /www/m/http/vendor/yiisoft/yii2-queue/src/drivers/db/Queue.php(93): yii\queue\cli\Queue->runWorker(Object(Closure))
#7 /www/m/http/vendor/yiisoft/yii2-queue/src/drivers/db/Command.php(76): yii\queue\db\Queue->run(true, 3)
#8 [internal function]: yii\queue\db\Command->actionListen(3)
#9 /www/m/http/vendor/yiisoft/yii2/base/InlineAction.php(57): call_user_func_array(Array, Array)
#10 /www/m/http/vendor/yiisoft/yii2/base/Controller.php(157): yii\base\InlineAction->runWithParams(Array)
#11 /www/m/http/vendor/yiisoft/yii2/console/Controller.php(148): yii\base\Controller->runAction('listen', Array)
#12 /www/m/http/vendor/yiisoft/yii2/base/Module.php(528): yii\console\Controller->runAction('listen', Array)
#13 /www/m/http/vendor/yiisoft/yii2/console/Application.php(180): yii\base\Module->runAction('queue/listen', Array)
#14 /www/m/http/vendor/yiisoft/yii2/console/Application.php(147): yii\console\Application->runAction('queue/listen', Array)
#15 /www/m/http/vendor/yiisoft/yii2/base/Application.php(386): yii\console\Application->handleRequest(Object(yii\console\Request))
#16 /www/m/http/yii(27): yii\base\Application->run()
#17 {main}
Additional info
Q | A |
---|---|
Yii vesion | 2.0.16-dev |
PHP version | 7.1.10 |
Operating system | ubuntu16.04.1 |
Which PostgreSQL version?
psql (PostgreSQL) 10.3 (Ubuntu 10.3-1.pgdg16.04+1)
I have the same problem on psql (PostgreSQL) 10.4 (Ubuntu 10.4-0ubuntu0.18.04)
so it seems to be problem with PostgreSQL version. It works great on PostgreSQL 9.4.
the same story with MySQL
How large is your queue?
for me, it's 5 workers
I mean how many jobs do you have in queue table.
right now I have a very little load. May be 1 job per minute
I have ~300k waiting jobs in queue and DB driver (MariaDB) becomes unusable - it took like 2-3 seconds to reserve a job (which executes in less than 0.1 second, so queue spends most of its time on reserving jobs).
You are getting lock contention, meaning that all workers are trying to obtain a lock at the same time. For those kinds of loads it makes more sense to use a real queue like beanstalkd. (Note that is is really easy 0 configuration to setup, and it will make your life easier and your queue faster)
the same story with MySQL
+1
I have the same problem in a Postgresql database "PostgreSQL 10.4 (Debian 10.4-2.pgdg90 + 1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0-18 + deb9u1) 6.3.0 20170516, 64-bit "
it seems a subject of the database configuration.
Is there any idea how to solve this? What effects can this have? Does this error leave unresolved tasks?
+1 Why does the error 'Has not waited the lock' occur? What is the cause?
OP problem is related to mutex settings, see this answer on SO.
In general if you get such errors, you should switch mutex backend to some more reliable implementation (MysqlMutex
works fine for me), and/or increase mutexTimeout
(setting it to 0 will throw this exception on every concurrency issue, which is very impractical for real queue).
Doesn't seem there's anything to fix in the code but definitely worth documenting it.
We too have been bitten by the Has not waited the lock exception. Our queue is currently configured as follows:
'queue' => [
'class' => \yii\queue\db\Queue::class,
'db' => 'queue-db',
'tableName' => '{{%queue}}',
'channel' => 'default',
'attempts' => 3, // Max number of attempts
'ttr' => 60, // Maximum duration a job should run before it is considered abandoned
'mutex' => \yii\mutex\MysqlMutex::class,
'as jobMonitor' => \zhuravljov\yii\queue\monitor\JobMonitor::class,
'as workerMonitor' => \zhuravljov\yii\queue\monitor\WorkerMonitor::class,
]
We are processing ~15,000 - 20,000 jobs an hour with 8 concurrent workers. We see when DB load gets high the lock takes longer and times out. Our current thinking is to move the mutex to something other than MySQL (redis in our case) so that high db load does not impact the worker's ability to take/close locks. In my evaluation of the code it seems fine to have a back-end different than the mutex provider (e.g. keep mysql back-end, but move the mutex to redis). Just wanted to ping the community to get your thoughts on this approach - any red flags?
@darrylkuhn It will probably not change anything, because it is highly unlikely that mutex is the bottleneck here. The problem is in the process, which holds lock. DB driver has know performance issues - in big queues reserving job may take a while. If you want to change something, I would rather replace DB queue driver with something else.
Use a better driver like beanstalk for job queues. The database is a bad place for a job queue.
@rob006 and @SamMousa understood we'll probably end up taking that approach. Thanks for the feedback.
Fix it by change two methods - https://github.com/yiisoft/yii2-queue/pull/362/files
Fix it by change two methods - https://github.com/yiisoft/yii2-queue/pull/362/files
Absolutely confirmed fix for my MariaDB instance.
We had this problem 8 months ago and moved to Amazon SQS however the lack of visibility and the inability to delay more than 15 minutes has us looking for another back end again. Can anybody speak to whether or not the lock issue here is present with the redis driver?
Have you tried beanstalkd?
@SamMousa No we haven't. We have redis infrastructure already and was hoping to leverage that however we're processing ~20,000 jobs an hour; given that workload am I correct in assuming that redis is also not an appropriate back-end? Thanks.
I don't think it's that much to be honest.. but redis is not a job queue. If queue jobs are ephemeral beanstalkd is very simple to set to though.
@darrylkuhn Redis does not have this performance problem (I'm using it to handle hundreds of thousands jobs per hour), as long as you configured your mutex correctly (mutexTimeout
is not 0). But it has some other limitations, like lack of priorities, and it is not really atomic, so in unstable environment you may end up with inconsistent queue.
@SamMousa and @rob006 thanks both for the input - think I'll probably suck it up and spin up beanstalkd 👍
In my case, the problem was the absence of an index on the done_at
field.
UPDATE queue
SET reserved_at
=null WHERE reserved_at
< :time - ttr
and done_at
is null;
This query initiated by moveExpired() method and works slowly, if deleteReleased=false, mutex expired, worker crashed with exception, restarted by supervisor and crash again.
After create done_at
index all works fine.
I think, need update documentation for this index https://github.com/yiisoft/yii2-queue/blob/master/docs/guide/driver-db.md
In my case, the problem was the absence of an index on the
done_at
field.UPDATE
queue
SETreserved_at
=null WHEREreserved_at
< :time -ttr
anddone_at
is null; This query initiated by moveExpired() method and works slowly, if deleteReleased=false, mutex expired, worker crashed with exception, restarted by supervisor and crash again.After create
done_at
index all works fine. I think, need update documentation for this index https://github.com/yiisoft/yii2-queue/blob/master/docs/guide/driver-db.md
i think , this query need to add channel as filter since this update query takes 8 second in my case with 500k queue
I think if you have 500k jobs then db driver is not the best choice (especially if you use several channels), use a specialized messaging service instead. The db driver is meant to have just a small number of concurrent jobs where index is irrelevant on select and inserting/updating jobs should be as fast as possible.