What is going on in the land of ComfyUI
Beware, rant ahead!
For a couple of days now I have been trying to get back up and running. Bootstrapping here and there, but I have a feeling this is not gonna go down well...
I started using ComfyUI about 5-6 months ago. At first it was very complex and the learning curve was steep.
Then I got comfortable and even started creating some custom nodes of my own. Creating workflows, with production in mind. Having a great time.
Then came Subgraphs. Released absolutely prematurely. But thank god, we could just not use them.
Now we're facing multiple issues at once. My proven workflows don't run anymore, because VAE Decode keeps crapping out. OK, found a fix: just add a clear-VRAM node in between KSampler and VAE Decode and even ZIT runs.
But now we have very popular custom node packs endangered by updates. KJ-Nodes throws errors, and Rgthree puts a note into the console saying there will not be time to fix the node pack for Nodes 2.0. Almost no node pack is compatible with Nodes 2.0.
OK, it's tagged BETA (only after push back) and we can skip using them. But how long until it's the standard?
"Quo Vadis" ComfyUI? Will you make a stable branch that we can update for fixes, but with no features added, and an experimental branch for the "I feel lucky today" folks?
Or will you keep destroying working setups by prematurely releasing mandatory "great new features"?
Sorry for this rant. Sorry if I am stepping on toes here. But I have just spent three days trying to get back up and running and there's no end in sight. And now I can't even go back to e.g. 3.68, where everything worked, because of the security hole the ComfyUI Manager created...
I am aware that updates are needed for newer models to run (ZIT), but why do those updates make SDXL, Flux, VAE Decode (Tiled and regular) crash beyond repair? Why do we need Nodes 2.0 when the ecosystem is not ready for it? Is it fancy, or is it needed for stability, speed and reliability?
There has to be a better roadmap imho.
Love you. Love ComfyUI. Want to keep loving it. But right now, this is a tough call...
My only issue with the breakage is that when I updated one of my Stability Matrix ComfyUI installs it seemed like all of them were affected (openSUSE Linux, btw). Trying to open one of the non-updated ones yielded a frontend version error and possibly others. I think they all had to have a dependency updated (which I figured out based on the rather cryptic error report). At least the updated one did.
If there is a way to have multiple Linux installations that are 100% self-contained (not affecting one another) then I wouldn't mind things potentially breaking because I'd always update a duplicate of the one I'm using instead of the original. There may be a way to do this but I'm a bit of a novice. I thought that that's how Stability Matrix works but it's my impression that there is bleed-through (update one Comfy and affect all of them).
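One manual way to get installs that share nothing, independent of Stability Matrix, is to give each copy its own git checkout and its own Python virtual environment. The sketch below only builds the list of commands and does not run them. The repository URL, the directory layout, the `plan_isolated_install` helper and the example tag name are assumptions for illustration, not something Stability Matrix or the ComfyUI documentation prescribes.

```python
from pathlib import Path

# Assumption: the upstream repository URL; substitute a fork or mirror if needed.
COMFY_REPO = "https://github.com/comfyanonymous/ComfyUI"


def plan_isolated_install(root: str, name: str, ref: str) -> list[list[str]]:
    """Return the commands for one self-contained install (hypothetical helper).

    Each install gets its own checkout and its own venv, so updating one
    copy cannot change the packages another copy sees.
    """
    install_dir = Path(root) / name
    venv_dir = install_dir / ".venv"
    venv_python = venv_dir / "bin" / "python"  # Linux venv layout
    return [
        ["git", "clone", COMFY_REPO, str(install_dir)],
        ["git", "-C", str(install_dir), "checkout", ref],
        ["python3", "-m", "venv", str(venv_dir)],
        [str(venv_python), "-m", "pip", "install", "-r",
         str(install_dir / "requirements.txt")],
    ]


if __name__ == "__main__":
    # "v0.3.68" is an assumed tag name for the "3.68" release mentioned earlier.
    for cmd in plan_isolated_install("/opt/comfy", "testing", "v0.3.68"):
        print(" ".join(cmd))
```

Updating then means planning a second install under a new name and leaving the working one untouched. Models can live outside both directories so they are not duplicated.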
I was barely able to get Comfy working when it broke the last time I updated. I figured out that I could update one of the dependencies via Stability Matrix. It would have been one thing if an error report had simply stated "You need to update this particular dependency" but instead I had to use guesswork. That doesn't make me optimistic that I'd be able to fix all of the errors that might appear. So, rather than risk this again, I stopped updating, taking my main system for AI off-network. I then set up a second testing installation but I haven't yet had the chance to check out the latest versions of Comfy. I will see if I can use snapshots to roll everything back in the testing installation if something breaks. I'm new to SUSE, too. I know they work well for booting issues related to the kernel and Nvidia driver but I don't know how difficult it is to set up a system-wide snapshot with rollback.
I suppose my point with all of this is that it would be good for the Comfy devs to provide a written procedure to enable users to minimize the management overhead involved in updating Comfy — one that takes advantage of Stability Matrix since there is no portable Linux Comfy, and one that uses full-system snapshots so people can roll back everything easily. openSUSE seems to be a good platform because of its snapshots and the slower but hopefully more robust btrfs file system, although the Nvidia driver can be a pain to set up. I had to check something like seven websites to get all of the steps to get that working. It seems like maintaining this procedural FAQ would be less overhead for the devs, versus maintaining a separate "stable" code/release branch.
In ComfyUI a beta feature would be more of a public alpha in more typical release schedules.
I'm much more of an old school dev and I would say subgraphs are finally beta now. They are mostly feature complete but are not quite stable enough for production. In modern dev, subgraphs are likely at an early release state since they are mostly feature complete but have "rough edges".
Nodes v2 is definitely more suited to have been a separate preview branch.
Combining all of the random, mostly unpopular and often broken UI changes with the various backend updates to support new models was a misstep and is driving user resentment.
I personally have not had many of the backend issues others have seen, but I have been able to do the dev thing of throwing more VRAM at it.
Keeping compatibility with custom nodes has never been a priority in the entire ComfyUI history. My main ComfyUI development/test setup has zero custom nodes installed so that's the only setup I can guarantee is stable.
You should also make proper bug reports so we can fix issues instead of using whatever stupid workaround "vram clearing" node that probably breaks the built in memory management system and slows workflows down.
New versions have too many problems, although you are trying to fix them or have fixed some already. What we want is a stable ComfyUI, not a big and slow ComfyUI. I used to run wan2.2 720p very efficiently in 12G VRAM; now it is much slower, and even wan2.1 takes twice as long to generate. In hunyuan1.5 even 81 frames can exceed 12G and then become very slow. @comfyanonymous
Hey, thanks @comfyanonymous for chiming in. Appreciate that this topic caught your attention.
stupid workaround "vram clearing"
You said that, not me. I am known for detailed and well documented bug reports. But this approach only works, when the bug is obvious and reproducible. But if a release just breaks everything and a new install fixes the problem, someone in the update department messed up and it's not a bug, it's a systemic problem.
Thing is, this VAE "workaround" currently has to be done in workflows using only ComfyUI core nodes on updated ComfyUI installs. It works fine on new installs (last I tested).
Maybe it would help to clearly define which versions of CUDA, PyTorch and Python work well with ComfyUI, and from that a stable release that only gets updated with new features if those meet the set standard.
It is a fact that features currently land in ComfyUI releases without proper testing. Now that's A-OK. You are all contributing in your spare time and probably don't get paid. I'm not asking you to babysit the ecosystem. I'm suggesting a stable branch would help everyone, including you. Users who need "it just works" can stay there, and the adventurous folks can ride the bleeding edge. Less noise in the issues, fewer frustrated users, more time for actual development.
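A bug report or a "known good" list would need exactly these three version numbers. As a minimal sketch, the snippet below collects them from the running environment; the `environment_report` name is hypothetical, and it reports only what the installed PyTorch build says about itself, not the driver version.

```python
import importlib
import platform


def environment_report() -> dict:
    """Collect the versions a "known good" combination would have to name."""
    report = {"python": platform.python_version(), "torch": None, "cuda": None}
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return report  # PyTorch is not installed in this environment
    report["torch"] = torch.__version__
    report["cuda"] = torch.version.cuda  # None on builds without CUDA
    return report


if __name__ == "__main__":
    for name, version in environment_report().items():
        print(f"{name}: {version}")
```

Run it with the same Python interpreter that starts ComfyUI, otherwise it describes a different environment.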
Plus, talking about custom nodes, you know as well as everybody that ComfyUI's reach has expanded by people making nodes and contributing in that way. I get it—compatibility isn't your priority, and honestly, that's fair. You can't be expected to QA against every node pack out there. But here's the thing: when core changes break workflows that use only core nodes, that's where it gets messy.
And please forgive me. But when someone's spent three days firefighting just to get back to a working state, sometimes a frustrated rant on GitHub is what comes out first.
Love the project, love what you're building. Just want to see it thrive without burning out either the maintainers or the users.
I kind of feel this way regarding the frontend especially, the ability to cancel a workflow was removed for no apparent reason, now it's back which is great but you still can't cancel queued workflows from there. Sure you can do it from the new top panel but why not both?
I also feel like a push for stability could be good, maybe pinned memory works now but at first it felt pretty broken and I still use --disable-pinned-memory. And there are already a lot of open bug reports.
There are 3k issues open right now where most of them are pretty much ignored by maintainers except for the main labels. This makes it feel like there's a huge split between the open source community and the developers. Answering/closing these is probably not super hype but adding some kind of repro requirement would be useful and there are a lot of bug fixing opportunities still.
Same with pull requests, a lot of community engagement just wasted. I had some PRs in the front end repo and got a response after 5 months that basically said "we're switching to Vue nodes so closing this" which is cool I guess but that feature could have existed for 5 months or the PR could have been closed after a day. I don't really feel like contributing because of this. Ideally there could be more contributions from the community.
Keeping compatibility with custom nodes has never been a priority in the entire ComfyUI history. My main ComfyUI development/test setup has zero custom nodes installed so that's the only setup I can guarantee is stable.
You should also make proper bug reports so we can fix issues instead of using whatever stupid workaround "vram clearing" node that probably breaks the built in memory management system and slows workflows down.
This way of handling development of ComfyUI is fatal for both node developers and users. As a (tiny) node developer I am now forced to check nearly every day whether my nodes are still functional. The next ComfyUI point upgrade can already break them again. I don't think that I will develop any new node soon.
And as a user I hate it when I upgrade Comfy and one of my workflows yet again quits working. I had just repaired it because the latest point upgrade did the same. My job is to make images and videos, not to battle with the software after each little update.
In both cases you produce unhappy bunnies with the current strategy.
I know of course that you are not responsible for the nodes made by others. But they rely on your work. And when you arbitrarily change the rules, they have no chance anymore. Permanently breaking the API with the next point upgrade is one of the worst behaviours that a software company can show. It should be the highest priority to stay compatible within minor versions. API-breaking changes should only happen at major upgrades.
So please reconsider this concept.
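The policy asked for above (breaking changes only on a major version bump) is the convention semantic versioning uses for releases from 1.0.0 onward. As a minimal sketch of what a node developer could rely on if it were followed, the check below compares two `MAJOR.MINOR.PATCH` strings. The helper names are hypothetical, and nothing here describes how ComfyUI actually numbers its releases.

```python
def parse_version(text: str) -> tuple[int, int, int]:
    """Parse 'MAJOR.MINOR.PATCH', tolerating a leading 'v'."""
    major, minor, patch = (int(part) for part in text.lstrip("v").split("."))
    return major, minor, patch


def may_break_api(installed: str, candidate: str) -> bool:
    """True only if the upgrade crosses a major version boundary."""
    return parse_version(candidate)[0] > parse_version(installed)[0]


if __name__ == "__main__":
    # Illustrative version numbers only.
    for old, new in (("1.4.2", "1.4.3"), ("1.4.2", "1.5.0"), ("1.4.2", "2.0.0")):
        print(f"{old} -> {new}: may break API = {may_break_api(old, new)}")
```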
--disable-pinned-memory --disable-async-offload --reserve-vram 1 works much better for me. ComfyUI eats too much VRAM, so adjusting --reserve-vram can help a bit.
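For anyone who wants to try these flags from a launcher script, here is a minimal sketch that assembles the command line. The three flags are the ones quoted above; `main.py` as the entry point assumes a plain git checkout, and `build_launch_command` is a hypothetical helper.

```python
import subprocess
import sys


def build_launch_command(entry_point: str = "main.py",
                         reserve_vram_gb: float = 1.0) -> list[str]:
    """Build the ComfyUI command line with the memory flags from the comment."""
    return [
        sys.executable, entry_point,
        "--disable-pinned-memory",
        "--disable-async-offload",
        "--reserve-vram", f"{reserve_vram_gb:g}",  # value is in GB
    ]


if __name__ == "__main__":
    command = build_launch_command()
    print(" ".join(command))
    # Run from inside the ComfyUI directory so main.py is found.
    subprocess.run(command, check=True)
```

Raising `reserve_vram_gb` leaves more headroom for the OS and other software at the cost of memory available to the model, so it is worth adjusting in small steps.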
This way of handling development of ComfyUI is fatal for both node developers and users. As a (tiny) node developer I am now forced to check nearly every day whether my nodes are still functional. The next ComfyUI point upgrade can already break them again. I don't think that I will develop any new node soon.
And this is very easy to solve with a very proven system of creating a stable branch and an experimental branch.
Common sense !!
I'm worried and worn out. On Friday, I finalize a workflow and take the precaution of doing the updates before publishing it. It's Monday and it no longer works already. I've wasted my time creating a workflow that's only usable for 3 days, not to mention those I have on my hard drive that are going to end up in the trash. That's the situation, and reason tells me it's time to move on to something else. That's my feeling, and I can't be the only one. Why not use snapshots, whether for the nodes, but especially for the ComfyUI version? The ComfyUI switch in the Manager doesn't offer enough versions.
Well, you can in fact simply go back a version and never update again.
Why not use snapshots
I prefer this idea which is why I suggested it in my previous message. This prevents the devs from having to slow down and manage two different code bases. If users can simply/easily select an earlier version in order to use a particular workflow/node that prevents disaster and a lot of frustration. Have these different versions be 100% self-contained, in that if anything is updated by the devs they are unaffected.
This would have to work on Linux, not just on Windows. A standard distro (like openSUSE) with instructions that are kept current would be best.
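Short of a full-system snapshot, a per-install record of the ComfyUI commit and the installed Python packages already makes a rollback possible. The sketch below is one way to capture that on Linux or Windows, assuming the install is a git checkout; the function names and the JSON layout are hypothetical and unrelated to the snapshot feature in ComfyUI Manager.

```python
import json
import subprocess
import sys


def build_snapshot(commit: str, packages: list[str]) -> dict:
    """Combine a git commit and a package list into one record."""
    return {"comfyui_commit": commit.strip(), "packages": sorted(packages)}


def capture_snapshot(comfy_dir: str) -> dict:
    """Read the current commit and installed packages (needs git and pip)."""
    commit = subprocess.run(
        ["git", "-C", comfy_dir, "rev-parse", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    frozen = subprocess.run(
        [sys.executable, "-m", "pip", "freeze"],
        capture_output=True, text=True, check=True,
    ).stdout
    return build_snapshot(commit, frozen.splitlines())


if __name__ == "__main__":
    # Hypothetical usage: python snapshot.py /path/to/ComfyUI > snapshot.json
    print(json.dumps(capture_snapshot(sys.argv[1]), indent=2))
```

The script has to be run with the Python interpreter of the install being recorded, since `pip freeze` lists the packages of whichever environment runs it. Rolling back then means checking out the recorded commit and reinstalling the recorded package list into a fresh environment.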
Sometimes some of the changes that are made to Comfy make me wonder, too. One of the recent commits was to change lora handling from fp32 to fp16 (to "save memory and increase speed"). I hope there is a simple way to revert that because I'd rather have the higher precision. I think it's important for the devs to avoid making broad assumptions about what people want out of the program. Always provide options. Some of us prefer greater precision and don't mind as much investing more into VRAM and RAM. Maybe I don't know enough about loras and there is no benefit at all to fp32 precision but I doubt that. My understanding has been that some loras can be fp32, in terms of how their developers create them.
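To make the precision question concrete: fp16 stores an 11-bit significand (roughly 3 decimal digits) where fp32 stores 24 bits (roughly 7). The sketch below rounds a single value through both formats using only the standard library. It illustrates the number formats in general and makes no claim about how ComfyUI applies LoRA weights or whether the difference is visible in outputs.

```python
import struct


def round_to_fp16(value: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def round_to_fp32(value: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


if __name__ == "__main__":
    weight = 0.1  # arbitrary example value, not taken from any real LoRA
    for label, rounded in (("fp32", round_to_fp32(weight)),
                           ("fp16", round_to_fp16(weight))):
        print(f"{label}: {rounded!r}  error={abs(rounded - weight):.2e}")
```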
Well, you can in fact simply go back a version and never update again.
That's a valid strategy until the latest groovy new model is released (ZIT) or a really important bugfix happens (Chroma Radiance).
And it is also the end of a lot of great custom nodes, or the beginning of nightmarish times for node devs...
Yeah :/
Let me check how many of my workflows have quit working this time ...