[Bug] Significant First-Run Latency for KSampler and VAE Decode with ComfyUI-JoyCaption
Dear Developers,
I'm reporting a performance issue related to the ComfyUI-JoyCaption plugin. After installing this plugin, which also involved installing llama-cpp-python, I've observed a significant first-run latency for both KSampler and VAE Decode nodes whenever ComfyUI is launched and the first image generation is initiated.
The specific latencies are as follows:
- KSampler: approximately 30 seconds of delay.
- VAE Decode: approximately 80 seconds of delay.

This delay only occurs during the very first generation after launching ComfyUI; subsequent generations proceed at normal speed.
Steps Taken for Diagnosis:
1. Isolation Test: I temporarily moved the ComfyUI-JoyCaption plugin out of the custom_nodes directory. After restarting ComfyUI, the first-run latency for KSampler and VAE Decode disappeared completely.
2. Reproduction Confirmation: After moving the ComfyUI-JoyCaption plugin back into custom_nodes and restarting ComfyUI, the issue reappeared, confirming ComfyUI-JoyCaption as the direct cause of this specific latency.
3. Preliminary Hypothesis: My initial guess is that either ComfyUI-JoyCaption itself or its dependency llama-cpp-python performs some computationally intensive work (e.g., model loading, compilation, or initialization) the first time it is loaded, and that this work blocks or significantly delays the other nodes in the workflow, in particular the GPU-intensive KSampler and VAE Decode. Since llama-cpp-python typically runs inference on the CPU, its first-load initialization could plausibly be stalling the pipeline right around these heavy steps; a quick way to check the import cost is sketched below.
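To sanity-check this hypothesis, something like the following rough timing script could be run in the same Python environment ComfyUI uses. It is only a sketch: it measures the bare cost of importing llama-cpp-python itself, not anything JoyCaption does on top of that.

```python
# Sketch: measure how long importing llama-cpp-python takes on its own.
# Assumes it is run with the same Python environment that ComfyUI uses.
import time

start = time.perf_counter()
import llama_cpp  # the module installed by llama-cpp-python

print(f"import llama_cpp took {time.perf_counter() - start:.1f} s")
```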
Environment Information:
Operating System: Ubuntu 24.04
GPU: AMD Radeon RX 7900 XT (gfx1100)
AMD Arch: gfx1100
PyTorch Version: 2.8.0+rocm6.4
ROCm Version: 6.4
Thank you for your detailed report and thorough diagnosis.
We’ve reviewed your findings, and it’s likely that the initial delay is due to llama-cpp-python initializing or compiling the model during the first run, even if the JoyCaption node isn’t used. We’ll investigate ways to defer or optimize this initialization to prevent it from impacting unrelated nodes like KSampler and VAE Decode.
We appreciate your feedback and will aim to address this in a future update.
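For reference, below is a minimal sketch of the kind of deferred initialization we have in mind: importing llama-cpp-python (and loading any model) only when the node is actually used, rather than when the plugin is loaded at startup. The class and method names are illustrative only and do not reflect the actual JoyCaption node code.

```python
# Illustrative sketch only - not the actual ComfyUI-JoyCaption code.
# Idea: defer the llama_cpp import (and model setup) to the first real use of the
# node, so that merely having the plugin in custom_nodes does not slow down
# ComfyUI startup or unrelated nodes such as KSampler and VAE Decode.

class JoyCaptionNodeSketch:
    _llama_cpp = None  # cached module reference, filled in on first use

    @classmethod
    def _backend(cls):
        if cls._llama_cpp is None:
            import llama_cpp  # deferred: only runs when a caption is requested
            cls._llama_cpp = llama_cpp
        return cls._llama_cpp

    def caption(self, image, model_path: str) -> str:
        llama_cpp = self._backend()
        # Real code would load the model here (e.g. llama_cpp.Llama(model_path=model_path))
        # and run the captioning prompt; omitted in this sketch.
        return f"(sketch) would caption via llama-cpp-python {getattr(llama_cpp, '__version__', '?')}"
```

Caching the imported module on the class means the one-time cost is paid on the first caption request instead of at ComfyUI startup.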
Hi, I came here via a search hit. I'm experiencing what I believe to be the same issue regardless of whether I use JoyCaption or not. It took quite a while to track it down, but if I move the JoyCaption custom node folder out of custom_nodes to temporarily disable it, I experience no issues.
This happens on pretty much EVERY workflow I have, regardless of whether I'm using JoyCaption or not. When I run any workflow, my GPU core and memory utilization rapidly flip-flops between 100% and 0% at least a dozen times, maybe more, at roughly 1-second intervals. This happens at both the start and the end of generation, which lines up with what the OP describes for the KSampler and VAE Decode steps.
As mentioned, this happens with ALL workflows, even ones completely unrelated to JoyCaption. I'm not sure how merely having the JoyCaption custom node initialized at startup, without ever being used, can cause this. It's a critical issue for me because I use JoyCaption fairly frequently but can't bear a 30-45 second delay at the start and end of every single workflow I run.
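In case it helps anyone else quantify this, here is a rough way the utilization flip-flopping could be logged while a workflow runs. This is only a sketch: it assumes rocm-smi is on PATH and that `rocm-smi --showuse` prints a "GPU use (%)" line (flag names and output can differ between ROCm versions).

```python
# Rough sketch: poll GPU utilization about once a second via rocm-smi while a
# workflow runs, to capture the 100% / 0% oscillation at sampler and VAE decode.
# Assumes rocm-smi is on PATH and "--showuse" reports a "GPU use (%)" line.
import subprocess
import time

def gpu_use_line() -> str:
    out = subprocess.run(["rocm-smi", "--showuse"],
                         capture_output=True, text=True).stdout
    for line in out.splitlines():
        if "GPU use" in line:
            return line.strip()
    return "n/a"

for _ in range(60):  # log for roughly 60 seconds
    print(time.strftime("%H:%M:%S"), gpu_use_line())
    time.sleep(1)
```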
Hey, just looping back. I manually pulled Arihany's update and it fixed the problem for me. No more slowdowns at the start or end of generations, and JoyCaption still appears to be working fine (from my limited testing).
Arihany, Thank you!
Any updates?