Suggestions and questions on the API for integration into realtime applications (TouchDesigner, Unreal Engine, Unity, Resolume etc.)
I just discovered comfyui and am loving it so far. Congratulations and thank you!
I'm looking to incorporate it into a custom realtime workflow, and I'd like to humbly present a few suggestions & questions (which would also make integration a lot more powerful with other software like Unreal Engine, TouchDesigner, Unity etc.)
Suggestions
- Add metadata (e.g. server version number and api.json format version number) to the top of workflow_api.json, so that if the format changes in the future, client code won't just stop working; instead we can check the version and act accordingly if need be. E.g. if we submit an out-of-date api.json to an upgraded server, instead of just crashing with a difficult-to-parse error, it could simply log that the version of the submitted api.json doesn't match the version of the server. Likewise for custom client apps loading api.json files.
- Similarly, it would be good to have metadata for custom nodes in the api.json. E.g.
- version number (and/or commit hash and timestamp)
- package name
- Related to the above, it would be good to encourage `orgname.extensionname.classname` naming for custom nodes, so that unique class names are guaranteed, and are also human-readable to some degree. E.g. there are currently class names such as `Image Blend`, `ImageBlend` and `BlendImage` from different extensions. This is going to get very messy very quickly!
- Also save the `Title` of the node in the api.json. This way, if we know the title of a node, we can find it in the api.json via code, instead of manually looking for the index (which can break if we make modifications to the graph).
- Related to the above, many node-graph based apps (e.g. Blender, Houdini, Touchdesigner) enforce unique node titles by appending an increasing number (e.g. `Save Image 1`, `Save Image 2`, `Save Image 3`). This would make pt 4 easier. I'm not suggesting using the node title as the dict key (as that would unnecessarily break backwards compatibility). The dict key can remain the numerical index, but if the title were at least a (unique) field, we could quickly remap the dict by title on the client side.
- I find it a bit confusing that we are submitting entire graphs to the `/prompt` endpoint, with a data structure that is `{prompt: {entire graph}}`. From a terminology standpoint, we're not really submitting a prompt, but an entire graph. And we're submitting it to a queue to be processed. Does it make sense to instead send `{graph: {entire graph}}` to a `/queue` endpoint (mirroring the webui)? Currently the example api scripts - both in this repo, and in the wild, because others are adopting the same terminology - are confusing to parse, because there are variables called `prompt` which refer to the actual text prompt, and other variables called `prompt` which refer to the entire graph structure in the API json format.
- Nodes which are disabled (bypassed) are not saved in the api.json. However, it would be useful to have bypassed nodes in the api.json, marked as bypassed or disabled (and the other nodes marked as enabled). This way we can programmatically enable and disable nodes without having to i) unbypass everything in the webui before saving the api file, and then ii) manually rewire inputs in client code to emulate bypassing.
- It would be great to be able to request an api.json of the current graph in the comfyui webui directly via an api call, without manually saving it to a file. This would allow us to play in the comfyui webui and make changes to the graph in realtime, while another host app controls some of the parameters.
- (This is perhaps the most complex and could be an issue by itself). It would be amazing to have Spout (open source, Windows-only) support for realtime image and latent vector sharing between comfyui and other apps (such as Unreal Engine, TouchDesigner, Unity, Blender, Resolume and many others) directly via realtime GPU texture sharing, if the server and client are running on the same machine. (Since there's no GPU support on Mac, Syphon, the Mac equivalent, is probably unnecessary). I imagine this would never make it into the core as it's so specific, but if at least the infrastructure were there in the core to make this possible via extensions, that would be awesome. E.g. instead of pulling a texture off the GPU in a client app, saving it to a file and uploading it to the backend, we could update image and/or latent vector inputs of nodes directly via GPU texture sharing, send a queue request, and receive images and/or latent vectors directly in our host application without pulling them off the GPU, saving to disk and loading.
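To illustrate the title-remapping idea above: here's a minimal sketch of building a title-to-id lookup client-side. It assumes each node in the api.json carries its title under a `_meta` key (newer exports include `{"_meta": {"title": ...}}`; older ones may not, so it falls back to `class_type`):

```python
import json

def remap_by_title(graph):
    """Build a title -> node-id lookup for a ComfyUI api.json graph.

    Assumes each node stores its title as {"_meta": {"title": ...}};
    falls back to the class_type when that field is absent. If titles
    aren't unique, the first node found wins.
    """
    by_title = {}
    for node_id, node in graph.items():
        title = node.get("_meta", {}).get("title") or node.get("class_type")
        by_title.setdefault(title, node_id)
    return by_title

# Usage sketch: tweak a node's input by title instead of a hard-coded index
# with open("workflow_api.json") as f:
#     graph = json.load(f)
# graph[remap_by_title(graph)["KSampler"]]["inputs"]["seed"] = 42
```

This keeps the numerical index as the dict key (preserving backwards compatibility) while letting client code address nodes by name.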
Questions
- It seems there's a really nice optimization in the graph execution via the webui in that only nodes downstream of any changes are executed. Does this also happen when we submit graphs via the api?
- When I submit a graph via the api, I can see in the console that it does execute, but the webui does not update. Is it possible to force the webui to display the graph as well? (Refreshing the browser doesn't work for me.)
- I saw in the websockets example that status info is sent to the client. Can the client send info via sockets as well?
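To make the websocket question concrete, here's a rough sketch of the listening side as it works today (assuming the default `127.0.0.1:8188` address and the third-party `websocket-client` package): the client watches for an `executing` frame whose `node` is null, which marks the end of that prompt's execution.

```python
import json

def is_prompt_finished(raw_msg, prompt_id):
    """True when a status frame signals that the given prompt finished.

    ComfyUI's websocket sends JSON frames with a "type" field; an
    "executing" frame whose "node" is null marks the end of execution.
    """
    msg = json.loads(raw_msg)
    if msg.get("type") != "executing":
        return False
    data = msg.get("data", {})
    return data.get("node") is None and data.get("prompt_id") == prompt_id

def wait_for_prompt(prompt_id, client_id, server="127.0.0.1:8188"):
    """Block until the server reports that prompt_id has finished."""
    import websocket  # third-party: pip install websocket-client
    ws = websocket.WebSocket()
    ws.connect(f"ws://{server}/ws?clientId={client_id}")
    try:
        while True:
            raw = ws.recv()
            # binary frames carry preview images; status frames are text
            if isinstance(raw, str) and is_prompt_finished(raw, prompt_id):
                return
    finally:
        ws.close()
```

Whether the server does anything with frames the client sends *back* on that socket is exactly the open question here.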
Thank you again for the wonderful application and I'm looking forward to seeing it develop!
Spout support would be amazing~!
In https://github.com/rvion/CushyStudio, I found ways around pretty much everything you mentioned (and way more) by redoing the whole frontend and being in control of more things. It also features a higher-level, type-safe TypeScript ComfyUI SDK for building workflows, with all the goodies you can imagine, plus a scripting engine with hot reloading for building custom apps with UI on top of the SDK. Embedded usage was a design constraint I had. The project is already really big, yet not 100% mature due to its scope. The only reason it's not popular yet is that it hasn't been released (no communication) and has a few quirks here and there. But maybe you'll have fun trying it out. Feel free to reach out on Discord if you need help :)
Regarding 1: client code could just check the version of the comfy server itself instead of adding it to the workflow.
- That would blow up the API format. Instead, there's an endpoint for the node info (I don't know it off the top of my head) - if the version info is not already in there, I think it should go there instead.
- I think it's already solved by now
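For reference, the endpoint in question is, I believe, `GET /object_info`, which returns a dict keyed by node class name. A small client-side sketch (server address assumed to be the default) of using it to pre-validate a graph before submitting it:

```python
import json
import urllib.request

def fetch_object_info(server="http://127.0.0.1:8188"):
    """GET /object_info: returns a dict keyed by node class name."""
    with urllib.request.urlopen(f"{server}/object_info") as resp:
        return json.loads(resp.read())

def missing_node_classes(graph, object_info):
    """Return the class_types an api.json graph uses that the server lacks."""
    needed = {node["class_type"] for node in graph.values()}
    return sorted(needed - set(object_info))

# Usage sketch:
# missing = missing_node_classes(graph, fetch_object_info())
# if missing:
#     raise RuntimeError(f"server is missing node classes: {missing}")
```

This catches mismatched custom-node installs with a readable error, though it's not a substitute for proper version metadata in the format itself.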
Question Nr. 3: that one I find very interesting. I want to accomplish exactly that, but so far I've run into problems. It is possible for a custom node to access `PromptServer.instance`, which has a `sockets` property. That could allow listening for incoming messages. There's nothing in the existing server code that does that now, from what I can tell. However, the package used by the server (aiohttp) poses limitations regarding which thread is allowed to receive messages, so that might be a problem. I will continue experimenting next week. Here's a related discussion: https://github.com/comfyanonymous/ComfyUI/discussions/3424
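For what it's worth, a custom node doesn't have to reuse the existing `/ws` socket at all: it can register its own aiohttp websocket route on `PromptServer.instance.routes` and read incoming frames there. A rough, untested sketch (the `/my_extension/ws` path and the ack protocol are made up for illustration; the `try/except` just lets the module load outside ComfyUI):

```python
import json

def handle_client_message(raw):
    """Pure helper: turn an incoming text frame into a reply payload."""
    data = json.loads(raw)
    return {"ack": data}

try:
    from aiohttp import web, WSMsgType
    from server import PromptServer  # only importable inside ComfyUI

    @PromptServer.instance.routes.get("/my_extension/ws")
    async def client_ws(request):
        ws = web.WebSocketResponse()
        await ws.prepare(request)
        async for msg in ws:  # aiohttp delivers frames the client sends
            if msg.type == WSMsgType.TEXT:
                await ws.send_json(handle_client_message(msg.data))
        return ws
except ImportError:
    pass  # running outside ComfyUI
```

Since this handler runs as a coroutine on the server's own event loop, it may sidestep the cross-thread receive limitation mentioned above, but I haven't verified that.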