yii2-queue icon indicating copy to clipboard operation
yii2-queue copied to clipboard

"Has not waited the lock" on queue with supervisor numprocs more than one

Open m1roff opened this issue 6 years ago • 41 comments

How to solve? Or may be im doing something wrong?

if set numprocs=1 in supervisor config - no errors!

What steps will reproduce the problem?

config from common/config/main.php

'queue' => [
            'class' => \yii\queue\db\Queue::class,
            'as log' => \yii\queue\LogBehavior::class,
            'db' => 'db', // DB connection component or its config
            'tableName' => '{{%queue}}', // Table name
            'channel' => 'default', // Queue channel key
            'mutex' => \yii\mutex\PgsqlMutex::class, // Mutex that used to sync queries
            'mutexTimeout' => 0,
            'ttr' => 5 * 60, // Max time for anything job handling
            'attempts' => 5, // Max number of attempts
        ],

supervisor config

[program:m-prod-yii-queue-worker]
command=/usr/bin/php /www/m/http/yii queue/listen --verbose=1 --color=0
autostart=true
autorestart=true
numprocs=2
process_name = %(program_name)s_%(process_num)02d
redirect_stderr=true
stdout_logfile=/www/m/log/yii-queue-worker.log

Error trace

yii\base\Exception: Has not waited the lock. in /www/m/http/vendor/yiisoft/yii2-queue/src/drivers/db/Queue.php:179
Stack trace:
#0 [internal function]: yii\queue\db\Queue->yii\queue\db\{closure}(Object(yii\db\Connection))
#1 /www/m/http/vendor/yiisoft/yii2/db/Connection.php(1059): call_user_func(Object(Closure), Object(yii\db\Connection))
#2 /www/m/http/vendor/yiisoft/yii2-queue/src/drivers/db/Queue.php(211): yii\db\Connection->useMaster(Object(Closure))
#3 /www/m/http/vendor/yiisoft/yii2-queue/src/drivers/db/Queue.php(78): yii\queue\db\Queue->reserve()
#4 [internal function]: yii\queue\db\Queue->yii\queue\db\{closure}(Object(Closure))
#5 /www/m/http/vendor/yiisoft/yii2-queue/src/cli/Queue.php(117): call_user_func(Object(Closure), Object(Closure))
#6 /www/m/http/vendor/yiisoft/yii2-queue/src/drivers/db/Queue.php(93): yii\queue\cli\Queue->runWorker(Object(Closure))
#7 /www/m/http/vendor/yiisoft/yii2-queue/src/drivers/db/Command.php(76): yii\queue\db\Queue->run(true, 3)
#8 [internal function]: yii\queue\db\Command->actionListen(3)
#9 /www/m/http/vendor/yiisoft/yii2/base/InlineAction.php(57): call_user_func_array(Array, Array)
#10 /www/m/http/vendor/yiisoft/yii2/base/Controller.php(157): yii\base\InlineAction->runWithParams(Array)
#11 /www/m/http/vendor/yiisoft/yii2/console/Controller.php(148): yii\base\Controller->runAction('listen', Array)
#12 /www/m/http/vendor/yiisoft/yii2/base/Module.php(528): yii\console\Controller->runAction('listen', Array)
#13 /www/m/http/vendor/yiisoft/yii2/console/Application.php(180): yii\base\Module->runAction('queue/listen', Array)
#14 /www/m/http/vendor/yiisoft/yii2/console/Application.php(147): yii\console\Application->runAction('queue/listen', Array)
#15 /www/m/http/vendor/yiisoft/yii2/base/Application.php(386): yii\console\Application->handleRequest(Object(yii\console\Request))
#16 /www/m/http/yii(27): yii\base\Application->run()
#17 {main}

Additional info

Q A
Yii vesion 2.0.16-dev
PHP version 7.1.10
Operating system ubuntu16.04.1

m1roff avatar May 08 '18 12:05 m1roff

Which PostgreSQL version?

kaspirovski avatar Jul 02 '18 12:07 kaspirovski

psql (PostgreSQL) 10.3 (Ubuntu 10.3-1.pgdg16.04+1)

m1roff avatar Jul 02 '18 12:07 m1roff

I have the same problem on psql (PostgreSQL) 10.4 (Ubuntu 10.4-0ubuntu0.18.04) so it seems to be problem with PostgreSQL version. It works great on PostgreSQL 9.4.

kaspirovski avatar Jul 04 '18 07:07 kaspirovski

the same story with MySQL

akorz avatar Jul 06 '18 09:07 akorz

How large is your queue?

rob006 avatar Jul 06 '18 09:07 rob006

for me, it's 5 workers

akorz avatar Jul 06 '18 09:07 akorz

I mean how many jobs do you have in queue table.

rob006 avatar Jul 06 '18 09:07 rob006

right now I have a very little load. May be 1 job per minute

akorz avatar Jul 06 '18 09:07 akorz

I have ~300k waiting jobs in queue and DB driver (MariaDB) becomes unusable - it took like 2-3 seconds to reserve a job (which executes in less than 0.1 second, so queue spends most of its time on reserving jobs).

rob006 avatar Jul 06 '18 10:07 rob006

You are getting lock contention, meaning that all workers are trying to obtain a lock at the same time. For those kinds of loads it makes more sense to use a real queue like beanstalkd. (Note that is is really easy 0 configuration to setup, and it will make your life easier and your queue faster)

SamMousa avatar Jul 10 '18 09:07 SamMousa

the same story with MySQL

+1

samokspv avatar Jan 04 '19 13:01 samokspv

I have the same problem in a Postgresql database "PostgreSQL 10.4 (Debian 10.4-2.pgdg90 + 1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0-18 + deb9u1) 6.3.0 20170516, 64-bit "

it seems a subject of the database configuration.

Is there any idea how to solve this? What effects can this have? Does this error leave unresolved tasks?

geopamplona avatar Jan 11 '19 08:01 geopamplona

+1 Why does the error 'Has not waited the lock' occur? What is the cause?

JorgePalaciosZaratiegui avatar Jan 11 '19 08:01 JorgePalaciosZaratiegui

OP problem is related to mutex settings, see this answer on SO.

In general if you get such errors, you should switch mutex backend to some more reliable implementation (MysqlMutex works fine for me), and/or increase mutexTimeout (setting it to 0 will throw this exception on every concurrency issue, which is very impractical for real queue).

rob006 avatar Jan 20 '19 13:01 rob006

Doesn't seem there's anything to fix in the code but definitely worth documenting it.

samdark avatar Jan 20 '19 15:01 samdark

We too have been bitten by the Has not waited the lock exception. Our queue is currently configured as follows:

'queue' => [
    'class' => \yii\queue\db\Queue::class,
    'db' => 'queue-db',
    'tableName' => '{{%queue}}',
    'channel' => 'default',
    'attempts' => 3, // Max number of attempts
    'ttr' => 60, // Maximum duration a job should run before it is considered abandoned
    'mutex' => \yii\mutex\MysqlMutex::class,
    'as jobMonitor' => \zhuravljov\yii\queue\monitor\JobMonitor::class,
    'as workerMonitor' => \zhuravljov\yii\queue\monitor\WorkerMonitor::class,
]

We are processing ~15,000 - 20,000 jobs an hour with 8 concurrent workers. We see when DB load gets high the lock takes longer and times out. Our current thinking is to move the mutex to something other than MySQL (redis in our case) so that high db load does not impact the worker's ability to take/close locks. In my evaluation of the code it seems fine to have a back-end different than the mutex provider (e.g. keep mysql back-end, but move the mutex to redis). Just wanted to ping the community to get your thoughts on this approach - any red flags?

darrylkuhn avatar Oct 23 '19 17:10 darrylkuhn

@darrylkuhn It will probably not change anything, because it is highly unlikely that mutex is the bottleneck here. The problem is in the process, which holds lock. DB driver has know performance issues - in big queues reserving job may take a while. If you want to change something, I would rather replace DB queue driver with something else.

rob006 avatar Oct 23 '19 17:10 rob006

Use a better driver like beanstalk for job queues. The database is a bad place for a job queue.

SamMousa avatar Oct 23 '19 17:10 SamMousa

@rob006 and @SamMousa understood we'll probably end up taking that approach. Thanks for the feedback.

darrylkuhn avatar Oct 23 '19 17:10 darrylkuhn

Fix it by change two methods - https://github.com/yiisoft/yii2-queue/pull/362/files

mathematicalman avatar Nov 29 '19 12:11 mathematicalman

Fix it by change two methods - https://github.com/yiisoft/yii2-queue/pull/362/files

Absolutely confirmed fix for my MariaDB instance.

rowansimms avatar May 27 '20 03:05 rowansimms

We had this problem 8 months ago and moved to Amazon SQS however the lack of visibility and the inability to delay more than 15 minutes has us looking for another back end again. Can anybody speak to whether or not the lock issue here is present with the redis driver?

darrylkuhn avatar Jul 02 '20 02:07 darrylkuhn

Have you tried beanstalkd?

SamMousa avatar Jul 02 '20 06:07 SamMousa

@SamMousa No we haven't. We have redis infrastructure already and was hoping to leverage that however we're processing ~20,000 jobs an hour; given that workload am I correct in assuming that redis is also not an appropriate back-end? Thanks.

darrylkuhn avatar Jul 02 '20 16:07 darrylkuhn

I don't think it's that much to be honest.. but redis is not a job queue. If queue jobs are ephemeral beanstalkd is very simple to set to though.

SamMousa avatar Jul 02 '20 16:07 SamMousa

@darrylkuhn Redis does not have this performance problem (I'm using it to handle hundreds of thousands jobs per hour), as long as you configured your mutex correctly (mutexTimeout is not 0). But it has some other limitations, like lack of priorities, and it is not really atomic, so in unstable environment you may end up with inconsistent queue.

rob006 avatar Jul 02 '20 17:07 rob006

@SamMousa and @rob006 thanks both for the input - think I'll probably suck it up and spin up beanstalkd 👍

darrylkuhn avatar Jul 02 '20 22:07 darrylkuhn

In my case, the problem was the absence of an index on the done_at field.

UPDATE queue SET reserved_at=null WHERE reserved_at < :time - ttr and done_at is null; This query initiated by moveExpired() method and works slowly, if deleteReleased=false, mutex expired, worker crashed with exception, restarted by supervisor and crash again.

After create done_at index all works fine. I think, need update documentation for this index https://github.com/yiisoft/yii2-queue/blob/master/docs/guide/driver-db.md

ixapek avatar Oct 30 '20 09:10 ixapek

In my case, the problem was the absence of an index on the done_at field.

UPDATE queue SET reserved_at=null WHERE reserved_at < :time - ttr and done_at is null; This query initiated by moveExpired() method and works slowly, if deleteReleased=false, mutex expired, worker crashed with exception, restarted by supervisor and crash again.

After create done_at index all works fine. I think, need update documentation for this index https://github.com/yiisoft/yii2-queue/blob/master/docs/guide/driver-db.md

i think , this query need to add channel as filter since this update query takes 8 second in my case with 500k queue

freddokresna avatar Apr 07 '22 06:04 freddokresna

I think if you have 500k jobs then db driver is not the best choice (especially if you use several channels), use a specialized messaging service instead. The db driver is meant to have just a small number of concurrent jobs where index is irrelevant on select and inserting/updating jobs should be as fast as possible.

fl0v avatar Apr 09 '22 16:04 fl0v