
Scaling vertically and horizontally

Open jsbroks opened this issue 5 years ago • 21 comments

I was wondering if you could provide information with regards to scaling BullMQ vertically and horizontally? Does it work well with redis cluster?

I also suggest adding a section in the docs as well.

jsbroks avatar Jan 15 '20 04:01 jsbroks

Interested about this as well 🚀

botzill avatar Jan 26 '20 12:01 botzill

This is actually more a question about Redis in general, but the part that is specific to BullMQ is that a single queue cannot scale horizontally. So in order to scale, you need to divide your solution into multiple queues, so that they are distributed evenly across your cluster.

manast avatar Jan 28 '20 20:01 manast
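One way to apply this advice is a small routing helper that spreads jobs over several queues, so that in a Redis Cluster each queue can land on a different shard. The sketch below is hypothetical (the base name, shard count, and hash are illustrative, not from BullMQ itself):

```typescript
// Hypothetical sketch: route jobs across N queues so Redis Cluster can
// place each queue on a different shard. The {braces} form a hash tag,
// which keeps all keys of one queue in a single hash slot.
function shardQueueName(base: string, jobKey: string, shards: number): string {
  let h = 0;
  for (let i = 0; i < jobKey.length; i++) {
    h = (h * 31 + jobKey.charCodeAt(i)) >>> 0; // simple deterministic hash
  }
  return `{${base}-${h % shards}}`;
}

// Illustrative usage: create one Queue per shard name, then add each job
// to the queue chosen by its key, e.g.
//   const queue = new Queue(shardQueueName('work', userId, 4), { connection });
```

The helper is deterministic, so the same job key always routes to the same queue, which matters if you rely on per-key ordering.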

@manast thanks for the info.

If I have 1 queue that I need to scale, what are my options?

jsbroks avatar Feb 01 '20 14:02 jsbroks

what are your requirements in jobs/second?

manast avatar Feb 01 '20 21:02 manast

The job is event-based and scales with the number of items in a database. At peak it can be as high as 1–5 job requests a minute, with each job lasting 10–30 minutes depending on the amount of data being processed.

jsbroks avatar Feb 01 '20 22:02 jsbroks

In that case scalability is not a problem. BullMQ can easily process 5k jobs per second; you just need to add more workers and increase concurrency to meet your demands.

manast avatar Feb 01 '20 22:02 manast

So by running the same node ~~instance~~ code across multiple servers (pointed to the same redis) I can increase the number of workers?

jsbroks avatar Feb 01 '20 22:02 jsbroks

Technically, if you run the same code on different servers they will be different instances, so yes.

manast avatar Feb 01 '20 23:02 manast

Yup, bad wording on my end, but you got what I meant, thanks!

jsbroks avatar Feb 01 '20 23:02 jsbroks

So let's say I have a chat app, and I want to process a lot of jobs fast. Should I create like 10 queues, and set the process concurrency to 5,000?

What's the max concurrency per process I should set?

tskweres avatar Mar 20 '20 15:03 tskweres

The proper amount of concurrency is something you need to fine-tune by doing some calculations that depend on your particular processing time per message, your particular latency requirements, and so on. For a pure chat application you may find Redis Streams a better match.

manast avatar Mar 20 '20 22:03 manast

@manast

In that case scalability is not a problem. BullMQ can easily process 5k jobs per second; you just need to add more workers and increase concurrency to meet your demands.

I'm brand new to redis/bullmq. I understand that node code can be horizontally scaled. Can you elaborate how redis might be provisioned to have it process 5k jobs a second? Is this assuming one queue in a single redis instance or multiple queues across a redis cluster? And if in a redis cluster, do we still need to attach a prefix as described in the old bull docs?

mikylebaksh avatar Mar 25 '20 15:03 mikylebaksh

You don't need to do anything special. With a standard Redis deployment on modern hardware, and by increasing the parallelism of a single worker (concurrency 200 or more, for example), you can achieve it.

manast avatar Mar 25 '20 21:03 manast

@manast can you verify that the prefix hash tag described in the old docs is still needed for Redis Cluster in BullMQ? If so, is this workaround all that is needed?

https://github.com/OptimalBits/bull/blob/develop/PATTERNS.md#redis-cluster

joebowbeer avatar Jun 02 '20 16:06 joebowbeer

The hash tag will always be needed. The other question is whether ioredis works correctly, but some people have been successful running Bull against a cluster lately, so I think it does. It is quite binary: it either works or it doesn't, so by just running a test you will notice.

manast avatar Jun 02 '20 19:06 manast
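For reference, the hash-tag workaround from the linked Bull pattern carries over to BullMQ via the `prefix` option: wrapping the prefix in braces makes Redis Cluster hash only that part, keeping all of a queue's keys in one slot. A minimal sketch (the tag and queue names are illustrative):

```typescript
// Wrap the queue prefix in {braces}: Redis Cluster computes the hash slot
// from the tag alone, so every key of the queue lands on the same node.
const clusterPrefix = (tag: string): string => `{${tag}}`;

// Illustrative usage with BullMQ's `prefix` option — producer and worker
// must use the same prefix:
//   new Queue('mail', { prefix: clusterPrefix('mail'), connection });
//   new Worker('mail', processor, { prefix: clusterPrefix('mail'), connection });
```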

@manast could you shed some light on how to tweak concurrency and the number of queues? I just migrated from Bull 3.x to BullMQ, and with no other changes to the underlying code and infrastructure, jobs are processed a lot (~400%) slower than before. I only have one queue and one job processor that uses a switch statement to handle 8–10 different types of jobs (I was using named jobs in Bull 3.x, so I thought that was the closest I could get without refactoring too much code).

As I only ever see 2 active jobs within the Arena dashboard, I guess the default concurrency is set to 1. (I actually have three worker nodes – not sure why the third one is not picking up jobs, but I guess it's not BullMQ related.) There are a couple of jobs which are long-running (< 10 min) and a lot of jobs which run within a few hundred ms (< 1 second). The short-running jobs can run while a long-running one is still being processed.

  • I noticed that while a long-running job is still running, the short-running ones are processed a lot slower – why is that?
  • Should I create two separate queues for short-running and long-running jobs and then tweak concurrency accordingly?
  • What's the best way to find a good concurrency number? Just trial and error?

jaschaio avatar Jun 09 '20 06:06 jaschaio

First, I think it is pretty strange that you got 400% slower performance. It would be interesting if you could isolate such a performance disparity in a simple test case, so that we can see whether a bug is affecting performance. I think it is a good idea to keep different kinds of jobs in different queues, but that depends entirely on your requirements and your system design; there is no right or wrong without the whole context. The same goes for concurrency: for queues whose jobs are very IO-intensive you can usually use a higher concurrency factor, whereas with CPU-intensive jobs you will probably want to keep concurrency lower. Check the CPU stats to get an idea of whether you can increase concurrency, depending on how much your CPU is idling.

manast avatar Jun 09 '20 15:06 manast
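The rule of thumb above (high concurrency for IO-bound jobs, low for CPU-bound jobs) could be expressed as worker options along these lines; the numbers and queue names are illustrative starting points, not recommendations:

```typescript
// Illustrative starting points only; fine-tune against your own CPU stats.
const ioWorkerOpts = {
  concurrency: 100, // IO-bound jobs (HTTP calls, DB writes) mostly await,
};                  // so many can be in flight per process

const cpuWorkerOpts = {
  concurrency: 2,   // CPU-bound jobs block the event loop; keep this low
};

// Hypothetical usage, one queue per job profile so each can be tuned
// independently:
//   new Worker('io-jobs',  ioProcessor,  { connection, ...ioWorkerOpts });
//   new Worker('cpu-jobs', cpuProcessor, { connection, ...cpuWorkerOpts });
```

Splitting by job profile also addresses the head-of-line question above: short IO-bound jobs no longer wait behind long CPU-bound ones.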

@manast I tried some tests locally and couldn't replicate the performance difference I saw in my production app. But I guess my test might have been too naive to resemble the production situation. Any suggestions on how to improve it, or do you have some code I could use for performance testing?

Bull Test Code Bullmq Test Code

jaschaio avatar Jun 09 '20 20:06 jaschaio

In that case scalability is not a problem. BullMQ can easily process 5k jobs per second; you just need to add more workers and increase concurrency to meet your demands.

I don't know how accurate that is. We have 2,000 jobs per minute, and one of the nodes in our Redis Cluster is never less than 70% occupied on one core. If anything at all causes a stall, it can easily reach 100%, and then you are pretty screwed. Adding more workers obviously doesn't help with that; usually it even increases load. The issue is that Redis, even clustered, is single-threaded per shard, and BullMQ, even in cluster mode, only uses one shard per queue.

I realise we could create the same queue x times, hoping the copies land on different shards, and then round-robin which queue we put jobs into, but this seems like a workaround to an issue that should be fixed in BullMQ itself. If nothing else, you could literally implement, inside BullMQ, the workaround you suggest people do manually.

autarchprinceps avatar Oct 27 '23 11:10 autarchprinceps

If by workaround you mean dividing your jobs into different queues so that they can be assigned to different shards, then the answer is no, we will not do this internally in BullMQ. Scaling horizontally depends on many variables, and you should be able to find the best queue segmentation to fit your particular scaling needs.

manast avatar Oct 30 '23 11:10 manast

This is actually more a question about Redis in general, but the part that is specific to BullMQ is that a single queue cannot scale horizontally. So in order to scale, you need to divide your solution into multiple queues, so that they are distributed evenly across your cluster.

@manast Apologies for resurrecting this old comment but we are just adding BullMQ into our architecture and I want to be sure this is the right tool for us. Could you confirm what you mean by queues cannot scale horizontally? I hope you are not saying that we cannot add multiple jobs from different instances to the same queue.

I assume that when we queue a job from a particular server instance, the "queuing" happens in that same instance, so we are indeed scaling horizontally: the more server instances we have, the more jobs we can queue. Of course, the real processing speed is determined by the worker instances, which will take and process those jobs at whatever rate they can.

For extra information, we are about to implement pm2 and most likely have an app that queues jobs in multiple instances, but we will use the same queue name for all instances. I guess we could do queue-jobname-1, queue-jobname-2, etc, but I am failing to see the need.

Just trying to understand what is the limitation here, before we make an architecture mistake when designing our solution. Thank you!

sfratini avatar Jun 11 '24 10:06 sfratini