
How many queues can I create?

Open ericdude4 opened this issue 2 years ago • 10 comments

Sorry, this is not an issue. I'm just wondering if someone from the Exq core team would be able to advise on how many queues can be created. I imagine this might come down to a Redis limitation? What is the upper limit for the number of queues I can create?

So far, my application is managing ~5,600 queues gracefully. How long can this scale for?

ericdude4, Sep 27 '23 14:09

Exq polls Redis for each queue independently using RPOPLPUSH. Say your poll_timeout is 50 ms: Exq will poll Redis 20 times per second per queue, so 20 * 5,600 = 112,000 ops/sec. Redis can usually handle this, but there is a limit. I think you can scale up to around 1 million easily; after that it might get tricky. And that is only the job polling; Exq does a lot of other things too. I would suggest you start by looking at your Redis ops-per-second metrics. You can run redis-cli monitor (not safe for production, run it locally) to get a rough idea of the commands Exq executes.
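The polling estimate above is simple enough to write down. A minimal sketch, assuming (as the comment does) that each subscribed queue is polled once every poll_timeout milliseconds while it is empty; the module and function names are hypothetical, and busy queues, stats, and scheduler traffic are not counted.

```elixir
defmodule PollLoad do
  @moduledoc """
  Hypothetical back-of-envelope helper: RPOPLPUSH calls per second issued for
  idle queues, assuming one poll per queue every `poll_timeout_ms`.
  """

  @spec rpoplpush_per_second(non_neg_integer(), pos_integer()) :: non_neg_integer()
  def rpoplpush_per_second(queue_count, poll_timeout_ms) do
    # Each queue is polled 1000 / poll_timeout_ms times per second.
    div(queue_count * 1000, poll_timeout_ms)
  end
end
```

For the 5,600 queues in the question this gives 112,000 calls per second; at 10,000 queues it is 200,000, and at 200 subscribed queues it drops to 4,000. The load scales linearly with the number of subscribed queues, which is why reducing subscriptions (or raising poll_timeout) is the main lever.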

Are you designing a queue per user? If the count could go up to 100k, I would suggest not going this route. Also, the Exq implementation itself might not be optimized for that many queues. Most deployments I have seen have fewer than 100 queues.

ananthakumaran, Sep 27 '23 14:09

Agreed with @ananthakumaran.

There are also some performance tests you can try and adapt here (we did some optimization a while back, but not related to this case): https://github.com/akira/exq/blob/master/test/performance_test.exs If you don't need stats, you can also disable them, which would reduce the QPS on Redis.

akira, Sep 27 '23 14:09

@akira @ananthakumaran Thank you for your insights on this. I apologize that I didn't respond sooner, but I have been thinking about this problem, especially as the application continues to grow and place more load on Exq. I am noticing more "gremlins" with regard to jobs being processed predictably: sometimes a queue that is subscribed to simply doesn't execute the jobs in it.

I have the following thought for a potential workaround to buy me some more time. Basically, out of the thousands of queues which exist in the application, only around 10 - 20 ever have any jobs queued up at a given time. I'm thinking that I can "prune" the queues which don't have any jobs every 5 minutes. Then, when I need to queue up a job for that client later on, the application will create the new queue and subscribe to it dynamically.

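I expected I could write the "prune" function as follows:

```elixir
def prune do
  # queue_size/1 returns {:ok, [{queue_name, job_count}, ...]}
  {:ok, queues} = Exq.Api.queue_size(Exq.Api)

  Enum.each(queues, fn {queue, jobs} ->
    if jobs == 0 do
      # Delete each empty queue from Redis.
      Exq.Api.remove_queue(Exq.Api, queue)
    end
  end)
end
```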

The problem I'm facing, though, is that redis-cli monitor still shows RPOPLPUSH commands being run for the queue even after it has been removed. I also tried this with Exq.unsubscribe Exq.Api, "queue-name" but got the same result.

It seems the queues remain in the Exq cache even after they are removed, so the application somehow keeps executing RPOPLPUSH for them.

ericdude4, Mar 19 '24 20:03

Okay, quick update. After switching the above code to use Exq.unsubscribe Exq.Api, "queue-name", it seems to be working much better. However, do you have any insight into why this might be a bad idea? It seems like I can keep my QPS much lower on average if I run this worker every few minutes.

ericdude4, Mar 19 '24 21:03

> Exq.Api.remove_queue(Exq.Api, queue)

This is not necessary and might delete actual jobs. Unsubscribing is all you need, though it has to be done on every worker node.
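One way to unsubscribe on every worker node is to fan the call out over the Erlang cluster. A minimal sketch, assuming the worker nodes are connected, each runs an Exq instance registered as Exq (as in the code in this thread), and OTP 23 or later for :erpc; the module and function names are hypothetical.

```elixir
defmodule ClusterExq do
  @moduledoc """
  Hypothetical helper: calls Exq.unsubscribe(Exq, queue) on this node and on
  every connected node. Assumes each node runs an Exq instance named `Exq`.
  """

  @default_timeout 5_000

  # Returns one entry per node: {:ok, result} on success, or a {class, reason}
  # tuple describing the failure (e.g. node unreachable, Exq not running).
  def unsubscribe_everywhere(queue, nodes \\ [node() | Node.list()], timeout \\ @default_timeout) do
    :erpc.multicall(nodes, Exq, :unsubscribe, [Exq, queue], timeout)
  end
end
```

Nodes that join the cluster later will not see the unsubscribe, so a periodic prune job would still need to run (or be triggered) on each node.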

> if I run this worker every few minutes.

If I understand correctly, you are delaying job execution (after enqueue) by a few minutes for queues with infrequent jobs. If that is acceptable, then unsubscribing might work.

ananthakumaran, Mar 20 '24 02:03

Hey @ananthakumaran, thank you for your thoughtful response. I enqueue jobs for users immediately based on incoming webhooks, with the following logic:

```elixir
def enqueue_job(user) do
  # Check for an existing subscription to this user's queue.
  {:ok, existing_subscriptions} = Exq.subscriptions(Exq)

  unless user.name in existing_subscriptions do
    # Subscribe to the queue if a subscription is not already in place.
    # (This only affects the Exq instance on the current node.)
    Exq.subscribe(Exq, user.name)
  end

  # Enqueue the job immediately, now that the subscription is in place.
  Exq.enqueue(Exq, user.name, Foo.Worker, [])
end

def prune do
  # Get the list of queues along with their sizes.
  {:ok, queues} = Exq.Api.queue_size(Exq.Api)

  {:ok, subscriptions} = Exq.subscriptions(Exq)

  Enum.each(queues, fn {queue, jobs} ->
    if queue in subscriptions and jobs == 0 do
      # No jobs in this subscribed queue, so stop polling it.
      # Caveat: a job enqueued between queue_size/1 above and this call
      # stays unprocessed until enqueue_job/1 re-subscribes the queue.
      Exq.unsubscribe(Exq, queue)
    end
  end)
end
```

The prune() worker runs every 5 minutes and unsubscribes from every queue that has no jobs. Enqueuing a job always checks for a subscription to the queue and creates one if it doesn't exist.
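The selection step in prune() can be pulled out into a pure function, which makes it testable without Redis or Exq. In the original, `queue in subscriptions` is a linear list scan for every queue, which grows quadratically when there are thousands of queues and subscriptions; a MapSet keeps each lookup cheap. A minimal sketch; the module and function names are hypothetical, and it assumes the `{queue, job_count}` tuples that the prune() code above pattern-matches on.

```elixir
defmodule PruneSelection do
  @moduledoc """
  Hypothetical helper: returns the subscribed queues that currently have no
  jobs and can therefore be unsubscribed.
  """

  @spec queues_to_prune([{String.t(), non_neg_integer()}], [String.t()]) :: [String.t()]
  def queues_to_prune(queue_sizes, subscriptions) do
    # MapSet gives cheap membership checks instead of scanning a list.
    subscribed = MapSet.new(subscriptions)

    # The {queue, 0} pattern skips any queue that still has jobs.
    for {queue, 0} <- queue_sizes, MapSet.member?(subscribed, queue), do: queue
  end
end
```

prune() would then fetch the sizes and subscriptions as before and call Exq.unsubscribe(Exq, queue) for each queue this function returns.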

I ran some tests and this improved things a lot, since 99% of the dynamically created queues have 0 pending jobs at any given moment. Pruning brings the ~10,000 subscriptions that were previously in place (each making 20 RPOPLPUSH requests per second, about 200,000 per second in total) down to 100 to 200 subscriptions on average.

With this in mind, do you see anything I might have failed to consider? Once again, I really appreciate your thoughts.

Eric

ericdude4, Mar 20 '24 13:03

If you know when jobs are being enqueued, then this approach would work, though I haven't given much thought to how it would play with multiple worker nodes.

Exq also has a Dequeue behaviour which can be overridden. This was added to support a rate limiter (see https://github.com/ananthakumaran/exq_limit); you might be able to repurpose it for your use case.

ananthakumaran, Mar 20 '24 14:03

Please forgive my ignorance, but what is the reason Exq uses polling as opposed to Pub/Sub?

nicnilov, May 18 '24 14:05

Pub/Sub is not persistent. There is a blocking variant of RPOPLPUSH called BRPOPLPUSH, but it doesn't support multiple source lists (https://github.com/redis/redis/issues/1785). Exq also needs to be compatible with Sidekiq, which restricts which data structures can be used.
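To see why RPOPLPUSH gives persistence where Pub/Sub does not, here is a toy, in-memory model of what the command does atomically in Redis: take the oldest element from the right end of the source list and push it onto the left end of a destination list, so a dequeued job still exists in Redis rather than only in a worker's memory. This is an illustration, not Exq code; my understanding is that Exq uses a per-node backup list as the destination, but treat that as an assumption. Lists are Elixir lists whose head is the Redis "left" end, matching LPUSH.

```elixir
defmodule RpoplpushModel do
  @moduledoc """
  Toy model of Redis LPUSH and RPOPLPUSH on immutable Elixir lists.
  The list head is the Redis "left" end.
  """

  # LPUSH: new jobs go onto the left end.
  def lpush(list, value), do: [value | list]

  # Empty source: Redis returns nil and leaves both lists untouched.
  def rpoplpush([], dest), do: {nil, [], dest}

  # Take the rightmost (oldest) element and push it onto the left of dest.
  def rpoplpush(source, dest) do
    {rest, [value]} = Enum.split(source, -1)
    {value, rest, [value | dest]}
  end
end
```

Pushing "job1" then "job2" and calling rpoplpush/2 returns "job1" first (FIFO) and leaves it in the destination list. Because the command returns immediately (nil when the source is empty), a worker has to call it again later, which is the polling discussed in this thread; BRPOPLPUSH would block instead, but it watches only one source list.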

If I were writing a job processing library today, I would use Redis Streams, but that's not possible with Exq because of Sidekiq compatibility.

ananthakumaran, May 18 '24 14:05

That makes sense, thanks!

nicnilov, May 18 '24 14:05