Rate Limiter per queue wrong duration

Open EmperiorEric opened this issue 7 years ago • 9 comments

Description

It is quite possible I have misconfigured something, but at the moment it appears that the queues are not respecting their rate limits and are instead processing significantly slower than the durations we've set. I have several queues with different rate limits. For one thing, they all appear to run at the same speed; worse, that speed corresponds to a far longer interval than the duration we've set on the limiter.

Minimal, Working Test code to reproduce the issue.

Here are a couple of the queues we set up. In total we have 6 queues, each with a different rate limit specific to its service.

let queueOptions = {
    createClient: function (type) {
        switch (type) {
            case 'client':
                return client
            case 'subscriber':
                return subscriber
            default:
                return defaultRedis
        }
    }
}

const lowPriorityQueue = new Bull('lowPriority', {
    createClient: queueOptions.createClient,
    limiter: {
        // process max 1 request
        max: 1,
        // per 1.5 seconds for a total of 2400 jobs per hour.
        // DNSimple's rate limit is 2400, so it's a good base.
        duration: 1500
    },
})

const dnsimpleQueue = new Bull('dnsimpleQueue', {
    createClient: queueOptions.createClient,
    limiter: {
        // process max 1 request
        max: 1,
        // per 3.6 seconds
        duration: 3600
    },
})
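
For context, the queues are consumed with processors along the lines of the sketch below (a simplified sketch rather than our exact code; deleteSubdomainRow is just a stand-in for the real delete):

lowPriorityQueue.process('scrub subdomain', async (job) => {
    // The real handler just deletes a database row and returns quickly.
    await deleteSubdomainRow(job.data.subdomain)
})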

In this case the job is a simple database row delete; it takes little to no time. But looking at the completed events from the queue, there are roughly 75 seconds between each job, even though the limiter duration is 1.5 seconds.

Nov 07 15:40:33 app/worker.1: [Jobs: lowPriorityQueue] processing 'scrub subdomain' job #507508 with priority 9007199254740991 
Nov 07 15:40:33 app/worker.1: [Jobs: lowPriorityQueue] Low Priority Job #507508 scrub subdomain successfully completed with result: undefined 
Nov 07 15:41:49 app/worker.1: [Jobs: lowPriorityQueue] processing 'scrub subdomain' job #379951 with priority 9007199254740991 
Nov 07 15:41:49 app/worker.1: [Jobs: lowPriorityQueue] Low Priority Job #379951 scrub subdomain successfully completed with result: undefined 
Nov 07 15:43:05 app/worker.1: [Jobs: lowPriorityQueue] processing 'scrub subdomain' job #379963 with priority 9007199254740991 
Nov 07 15:43:05 app/worker.1: [Jobs: lowPriorityQueue] Low Priority Job #379963 scrub subdomain successfully completed with result: undefined 
Nov 07 15:44:21 app/worker.1: [Jobs: lowPriorityQueue] processing 'scrub subdomain' job #514633 with priority 9007199254740991 
Nov 07 15:44:21 app/worker.1: [Jobs: lowPriorityQueue] Low Priority Job #514633 scrub subdomain successfully completed with result: undefined 
Nov 07 15:45:37 app/worker.1: [Jobs: lowPriorityQueue] processing 'scrub subdomain' job #178134 with priority 9007199254740991 
Nov 07 15:45:37 app/worker.1: [Jobs: lowPriorityQueue] Low Priority Job #178134 scrub subdomain successfully completed with result: undefined 

Bull version

version: 3.4.8

EmperiorEric · Nov 07 '18 20:11

Sorry for the late response. Is it possible for you to isolate the problem to just one queue, or does this only happen when using several queues sharing a redis connection?
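
A minimal isolation test could look roughly like this sketch (it assumes a locally reachable redis, a no-op processor and the same 1.5 s limiter; the timestamps on the completed events show the effective rate directly):

const Bull = require('bull')

// Single queue, no createClient override: Bull creates its own
// client, subscriber and blocking connections from the URL.
const isolated = new Bull('isolated', 'redis://127.0.0.1:6379', {
    limiter: { max: 1, duration: 1500 }
})

isolated.process(async () => { /* no-op job */ })

isolated.on('completed', (job) => {
    console.log(new Date().toISOString(), 'completed job', job.id)
})

for (let i = 0; i < 10; i++) {
    isolated.add({ i })
}

If the completed events arrive about 1.5 seconds apart in isolation but drift far beyond that once the other queues share connections, that points at the shared-connection setup rather than the limiter itself.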

manast · Nov 14 '18 07:11

It seems to happen specifically when using several queues sharing a redis connection. It almost looks like it's adding up or multiplying all the rate limits together. When I remove some queues, processing seems to speed up.

Is the shared QueueOptions bad practice? Would separate redis connections for each queue fix this issue?

EmperiorEric · Nov 19 '18 18:11

It should be possible to reuse connections, and I do not understand why that would trigger this issue. I will mark it as a bug and see if I can find what is causing it.

manast · Nov 21 '18 20:11

Any movement on this? Or any new information I can try to provide from our system? Unfortunately our jobs are slowly backing up, as on high-volume days the workers can't keep up with the additional delay.

EmperiorEric · Dec 04 '18 14:12

Sorry, I haven't had time to dig into this issue yet. Reusing redis connections should not be a problem, so we are doing something wrong somewhere :/

manast · Dec 05 '18 13:12

Any suggestions for a workaround in the meantime? Would unique redis connections per queue potentially fix it? It appears the rate limit durations are being added together in some way; do you think scaling every duration down by a factor of 10 would keep the same proportional rate limits while also shrinking the extra delay by a factor of 10?

Separately, I'm running this on a dyno on Heroku. If I spin up a second dyno and run the exact same queue-processing code, would that work the way I'm dreaming and simply double the number of jobs we can process?

EmperiorEric · Dec 12 '18 13:12

Let me know if it works when using separate redis connections; in that case I can focus on the shared-connection case and see how and why it does not work. Regarding adding more workers, as long as you do not saturate redis you should expect approximately double the number of processed jobs by using 2 workers on different machines.
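
For reference, "separate redis connections" can be as simple as dropping the shared createClient and passing each queue a redis URL so that Bull creates its own connections per queue. A sketch under that assumption (REDIS_URL is a placeholder):

// No createClient override: each queue opens its own set of connections.
const REDIS_URL = process.env.REDIS_URL

const lowPriorityQueue = new Bull('lowPriority', REDIS_URL, {
    limiter: { max: 1, duration: 1500 }
})

const dnsimpleQueue = new Bull('dnsimpleQueue', REDIS_URL, {
    limiter: { max: 1, duration: 3600 }
})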

manast · Dec 12 '18 20:12

So we stopped sharing redis connections and set up 3 new redis connections for each queue, and that does seem to have fixed the rate limit issue. All of our queues now look like they are running at their specific rate limits, and jobs are no longer delayed by some multiple of them.

So hopefully that helps.

EmperiorEric · Jan 04 '19 21:01

When sharing connections it is crucial that the shared connection is never used in a blocking call, since a blocking call will delay all other operations until it resolves. I think the issue may be related to this fact.
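
Bull's connection-reuse pattern reflects this: createClient is also called with a blocking-client type ('bclient' in Bull 3.x), and that one should always get a fresh connection instead of the shared default. Here is a sketch of the options from the top of this issue adjusted along those lines (it assumes ioredis and a REDIS_URL placeholder):

const Redis = require('ioredis')

const REDIS_URL = process.env.REDIS_URL
const client = new Redis(REDIS_URL)     // shared: normal commands
const subscriber = new Redis(REDIS_URL) // shared: pub/sub only

let queueOptions = {
    createClient: function (type) {
        switch (type) {
            case 'client':
                return client
            case 'subscriber':
                return subscriber
            default:
                // 'bclient' -- used for blocking calls such as BRPOPLPUSH.
                // Never share this one: a blocked call here stalls every
                // other queue using the same connection.
                return new Redis(REDIS_URL)
        }
    }
}

With a dedicated blocking connection per queue, one queue's wait for its next job can no longer hold back the timers and limiter checks of the others, which matches the behaviour reported above after switching to separate connections.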

manast · Jan 05 '19 13:01