
Improve performance when a large number of prompts are queued

christian-byrne opened this issue 10 months ago · 16 comments

Is there an existing issue for this?

  • [x] I have searched the existing issues and checked the recent builds/commits

What would your feature do?

Some users queue a large number of prompts at a time with large graphs. They commonly report that overall performance suffers significantly when doing this. The issue is the frequency and eagerness of requests/messages to the queue endpoint.

Reproduction Steps

  1. Open DevTools in the browser (F12)
  2. Go to the Network tab
  3. In the search/filter bar type 'queue'
  4. Go back to the graph without closing DevTools
  5. Create a workflow with a lot of nodes
  6. Increase the batch count on the Run button to a high number
  7. Press Run
  8. Observe the /queue responses in the Network panel
  9. Notice that:
    • The /queue response includes the entire workflow for every single pending task (along with other fields). The type is here: https://github.com/Comfy-Org/ComfyUI_frontend/blob/8db088b27a5669e423b1d2644fa574bf4a3c3d85/src/schemas/apiSchema.ts#L185-L193
    • Requests to /queue are eager
    • The queueStore updates the queue and history in the same update action. As a result, /queue requests are made both when tasks are created and when tasks are completed, which is probably not necessary.
    • Requests to /queue are made for each task created, even those created in batches

Possible solutions

  1. Send prompts to the server less eagerly when the batch size of the queue button is large
  2. Only request the necessary properties of queued prompts
  3. Don't repeatedly request information about queued items, since they are not subject to change
  4. Add a max_items param to the queue endpoint, similar to history (see the sketch after this list)
  5. Evaluate the graph less eagerly or use DP, although the issue seems to be with the large, frequent requests rather than graph evaluation
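
A minimal sketch of option 4, assuming the /queue handler mirrors the /history pattern (the routes decorator, self.prompt_queue, and get_current_queue follow ComfyUI's server.py conventions; the max_items handling is the proposed addition, not existing behavior):

@routes.get("/queue")
async def get_queue(request):
    # Hypothetical param; /queue does not currently accept max_items
    max_items = request.rel_url.query.get("max_items", None)
    if max_items is not None:
        max_items = int(max_items)
    queue_running, queue_pending = self.prompt_queue.get_current_queue()
    if max_items is not None:
        # Truncate the pending list so huge queues don't balloon the response
        queue_pending = queue_pending[:max_items]
    return web.json_response({
        "queue_running": queue_running,
        "queue_pending": queue_pending,
    })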

Proposed workflow

  1. Load large graph
  2. Increase batch count to 200+, e.g., to run overnight or while out of the house
  3. Press queue button
  4. The app works at the same efficiency/speed as it does during normal usage

Additional information

The /history endpoint, used for completed tasks, faced the same issue in the past. It was solved by limiting the number of items returned in the request (https://github.com/Comfy-Org/ComfyUI_frontend/pull/393).

The max_items arg is supported by the /history route handler on the backend, but not by the /queue handler.

For reference, implementation of max_items for /history:
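
(Paraphrased sketch of the pattern in ComfyUI's server.py, not copied verbatim:)

@routes.get("/history")
async def get_history(request):
    max_items = request.rel_url.query.get("max_items", None)
    if max_items is not None:
        max_items = int(max_items)
    # PromptQueue.get_history caps the number of returned entries
    return web.json_response(self.prompt_queue.get_history(max_items=max_items))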

Related issues

  • https://github.com/comfyanonymous/ComfyUI/issues/2000#issuecomment-2631586686
  • https://github.com/comfyanonymous/ComfyUI/discussions/7284
  • https://github.com/comfyanonymous/ComfyUI/issues/6939
  • https://github.com/comfyanonymous/ComfyUI/issues/6676
  • https://github.com/comfyanonymous/ComfyUI/issues/5073
  • https://github.com/comfyanonymous/ComfyUI/issues/2000
  • https://github.com/comfyanonymous/ComfyUI/issues/8070


christian-byrne · Feb 05 '25 18:02

Is there any progress on this? I completely disabled all /history requests to see if it would help, but it still slows down significantly after ~80-100 items in the queue.

scottmudge · Mar 16 '25 23:03

There's no progress yet. Potentially some OSS contributor can be a hero and solve this.

To my knowledge, the issue is actually with the /queue endpoint. I just updated the issue body with more details. You may be able to adjust your solution for the /queue endpoint as a temporary workaround.

christian-byrne · Mar 17 '25 04:03

I've never used Vue in my life, but do you think it would be better if we gzipped the payload? Then I think we'd need the backend to expect the new gzipped data. I can try doing it, but I need a way to "simulate" running the app without killing my laptop.

nasoooor29 · Mar 26 '25 16:03

Yes, I think that could work.

christian-byrne · Mar 27 '25 15:03

I'll start working on it, but don't get your hopes up since I haven't used Vue before. Do you have a way to simulate the backend?

First, I'll try gzipping (rough sketch below). If the improvement isn't significant, I'll create a new endpoint for batched data with gzip compression. That way, the complex graph will be minimized, and only the dynamic parts, like the seed, will take up space in the request.
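
A minimal client-side sketch of the gzip idea, assuming a local ComfyUI server on the default port. The requests usage is illustrative (the real frontend uses fetch), and the backend would need matching decompression support, which it does not have today:

import gzip
import json

import requests  # illustrative client; the real frontend uses fetch()

# Deliberately repetitive payload standing in for a large graph
workflow = {"prompt": {str(i): {"class_type": "ExampleNode", "inputs": {}}
                       for i in range(500)}}
raw = json.dumps(workflow).encode()
body = gzip.compress(raw)
print(f"{len(raw)} -> {len(body)} bytes")  # repetitive JSON compresses well

# Hypothetical: the server would need to detect and decompress this
requests.post(
    "http://127.0.0.1:8188/prompt",
    data=body,
    headers={"Content-Encoding": "gzip", "Content-Type": "application/json"},
)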

nasoooor29 · Mar 27 '25 16:03

However, creating the batch endpoint might be a problem, since the backend would need modifications and it's in a separate repository. Would that be fine?

nasoooor29 · Mar 27 '25 16:03

I don't know of an easy way to quickly simulate the backend, but you can start the backend server with the --cpu arg to run on a low-end machine or laptop. Also see the Development section of the README for setting up the frontend dev server, if you haven't already.

Regarding the compression idea, please also note this: https://github.com/comfyanonymous/ComfyUI/pull/7231#issuecomment-2726306884. I wonder if using this option can resolve the issue somewhat.

christian-byrne · Mar 27 '25 17:03

I just tested the flag, and the results are a bit strange.

Test Setup

I loaded the models in both runs and tried to keep the conditions as similar as possible.

Run 1: Without --enable-compress-response-body

  • Time to load 100 requests: 1 minute 36 seconds (00:01:36.80).
  • The last ~15 requests had huge payloads (~4MB each), increasing quickly.
  • Each request took around 10 seconds to resolve.

Run 2: With --enable-compress-response-body

  • Time to load 100 requests: 1 minute 50 seconds (00:01:50.80).
  • The last ~15 requests had small payloads (~640KB, consistent size).
  • However, the last 15 requests took more than 30 seconds each to resolve???
  • I also got this warning on all prompts:
    UserWarning: Synchronous compression of large response bodies (1934928 bytes) might block the async event loop. 
    Consider providing a custom value to zlib_executor_size/zlib_executor response properties or disabling compression on it.
    


Analysis & Solution

It looks like this option can fully fix the payload size issue, but the backend needs a small modification.
The problem seems to be that Python's event loop is getting blocked because the compression is happening synchronously.

Possible Fixes: (GPT suggestions)

  1. Configure aiohttp's compression properly so it doesn't block the event loop.
  2. Adjust zlib_executor_size/zlib_executor to offload compression to a separate thread.
  3. Disable compression for very large responses and only apply it selectively.

If the backend is modified correctly, this should work without causing delays.

nasoooor29 · Mar 27 '25 18:03

And when I went through the backend, I couldn't find any option to run the zlib executor on a separate thread. Note that zlib_executor_size/zlib_executor are per-response settings in aiohttp, so the code should look something like this:

from aiohttp import web

async def handler(request):
    data = "Large response content" * 100000  # Simulating a large response
    # zlib_executor_size is the body size (in bytes) above which aiohttp
    # offloads compression to an executor instead of blocking the event loop
    resp = web.Response(text=data, zlib_executor_size=16384)
    resp.enable_compression()
    return resp

app = web.Application()
app.router.add_get("/", handler)
web.run_app(app)

Or we create our own thread pool and pass it in per response:

from concurrent.futures import ThreadPoolExecutor

from aiohttp import web

# Dedicated thread pool for offloading compression work
compression_executor = ThreadPoolExecutor(max_workers=4)

async def handler(request):
    data = "Large response content" * 100000
    # Bodies larger than zlib_executor_size get compressed on this
    # executor, keeping the event loop free
    resp = web.Response(
        text=data,
        zlib_executor=compression_executor,
        zlib_executor_size=16384,
    )
    resp.enable_compression()
    return resp

app = web.Application()
app.router.add_get("/", handler)
web.run_app(app)

nasoooor29 · Mar 27 '25 18:03

Is it really the size of the network transfer that is the issue? On loopback (running locally), I can transfer at many GB/s, yet the throughput of the queuing process is only a small fraction of that. Despite this, it is still very slow.

It may help with transfer overhead for remote connections, but I don't think gzipping/compressing the transfer is going to improve the speed for queuing many runs in a row.

I think it's much more likely it's the parsing/processing of the entire node-graph after it's been sent (or perhaps before, when it's getting serialized) that is the issue, not the transfer itself.

scottmudge · Mar 27 '25 18:03

Hi, in https://github.com/comfyanonymous/ComfyUI/issues/6939 I suggested "Queue up all items before starting". Do you think that this is a misdiagnosis of the problem?

It seemed like it was lagging because the workflow was running at the same time it was queuing things. But maybe you guys would say that this doesn't matter at all?

rainlizard · Mar 27 '25 19:03

must be solved

KiyeopYang · Mar 30 '25 17:03

For anyone who wants a quick fix for this issue, just add one line to the file: venv\Lib\site-packages\comfyui_frontend_package\static\assets\index-C1a9-oeR.js

line: 60543

  async getQueue() {
    // Quick fix for the slow queue: early-return an empty result;
    // the rest of the original method body becomes unreachable
    return { Running: [], Pending: [] };

I just added 900 tasks to the queue in 10 s, and it's running well.

This bug has already existed for months, and it seems no one cares. According to the code, each time one task is added, the client fetches all of them again, so technically it's on the order of N*(N-1) work? (Back-of-envelope math below.)
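
A rough illustration of that quadratic cost; the ~40 KB per-task payload is a hypothetical number, not a measurement:

payload_kb = 40  # hypothetical size of one serialized workflow in the response
n = 200          # number of enqueues
# After the k-th enqueue, /queue returns ~k tasks' worth of payload,
# so the total transferred is roughly payload * (1 + 2 + ... + n)
total_kb = sum(payload_kb * k for k in range(1, n + 1))  # = 40 * n(n+1)/2
print(f"~{total_kb / 1024:.0f} MB transferred across {n} enqueues")  # ~785 MB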

roytan883 · Apr 01 '25 07:04

I wrote a small change making Comfy a lot faster when the prompt queue is large. It won't work for thousands of queue items, but it's good enough for several hundred.

https://github.com/comfyanonymous/ComfyUI/pull/8176

miabrahams · May 17 '25 21:05

I suggest doing 2 things:

  • Mark as a bug rather than a feature request 😜
  • Consider adding paging (on top of the improvements suggested, i.e. by @miabrahams and @catboxanon) and returning only a portion of the queue rather than the whole thing. This will cap the performance impact at the maximum page size (say, 100 items). I know it might need some brain cells to muscle up, since right now the queue uses a heap for storage and ain't particularly "pageable" on its own. (Rough sketch below.)
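
A minimal sketch of what paging could look like, following the same server.py conventions as the earlier sketches; the page/page_size params and total_pending field are hypothetical, nothing like them exists today:

@routes.get("/queue")
async def get_queue(request):
    page = int(request.rel_url.query.get("page", 0))
    page_size = min(int(request.rel_url.query.get("page_size", 100)), 100)
    queue_running, queue_pending = self.prompt_queue.get_current_queue()
    start = page * page_size
    return web.json_response({
        "queue_running": queue_running,
        "queue_pending": queue_pending[start:start + page_size],
        # Lets the client render "N pending" without receiving all N items
        "total_pending": len(queue_pending),
    })

The heap would still be snapshotted in full on the server, but the serialized response stays bounded at one page.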

QuietNoise · May 20 '25 06:05

Mark as a bug rather than a feature request 😜

Good point, done!

Consider adding paging (on top of the improvements suggested, i.e. by miabrahams and catboxanon) and returning only a portion of the queue rather than the whole thing. This will cap the performance impact at the maximum page size (say, 100 items). I know it might need some brain cells to muscle up, since right now the queue uses a heap for storage and ain't particularly "pageable" on its own.

Good idea. We would also need to adjust the logic a bit on the client side, as some components like the queue sidebar rely on the information.

Since https://github.com/comfyanonymous/ComfyUI/pull/8176 is merged, we should monitor the performance and any community feedback for a while, then determine if more improvements are needed on the client side.

christian-byrne · May 22 '25 18:05

Based on user reports and community sentiment, it seems this issue is mostly solved by https://github.com/comfyanonymous/ComfyUI/pull/8176. Of course, the performance is not leetcode-optimal, but the remaining slowness only seems to affect those who are using ComfyUI as a backend for a more involved application or use case -- and in those cases, the necessary optimizations might be the burden of the developers involved, or require more comprehensive efforts on the backend.

If you still experience this issue (arriving later), make sure you are on a ComfyUI version after https://github.com/comfyanonymous/ComfyUI/pull/8176 and that the issue is not caused by some system- or environment-specific conditions. If you've confirmed both of those, please report below and I will re-open.

Thanks again to miabrahams for fixing this issue.

christian-byrne · Jun 06 '25 20:06