LongCat-Video 🎉🐱
Just like a kid at Christmas who squeezes the gifts ahead of time to guess what's inside, I had to give the new LongCat branch a test run ;-) It already works pretty well... Kijai has been doing some magic again ;-)
Just a TEST run, with half the required steps (8 steps) and at low resolution, since I was only testing and didn't want to wait too long to see if it worked. Still a nice result even so ;-)
Image 2 Video:
https://github.com/user-attachments/assets/cd4feb45-9483-46e2-8066-399883d2017b
Text 2 Video:
https://github.com/user-attachments/assets/9c243e55-cbe5-4922-a05b-a202791b23c8
(using the "interactive" prompt example from their page - and my test workflow might contain errors, so take it with a grain of salt)
And for other curious cats, this is the model being referenced: https://meituan-longcat.github.io/LongCat-Video/ https://github.com/meituan-longcat/LongCat-Video
It's still a work in progress and not in the main branch yet, but it seems to work quite well already ;-) Thanks a lot, Kijai.
Nice, yeah I think it's mostly working now. It refuses to work in fp16, which is a shame; fp8_scaled works okay-ish, but bf16 is certainly better.
Some examples:
This is from 17 start frames:
https://github.com/user-attachments/assets/fbb75514-b53e-4447-a26f-2e0ef0fe31af
And this is I2V with 2 extensions:
https://github.com/user-attachments/assets/d0154850-0517-436c-948b-71a6ad44ce65
Yes, it seems to work quite nicely. My rusty 3090 might be too weak for bf16, so my test was with fp8. Will give bf16 a try though ;-)
The colors seem quite stable in this model.
https://github.com/user-attachments/assets/4e68d0d1-3a81-41ed-8df4-1bc0d30b27f5
> Yes, it seems to work quite nicely. My rusty 3090 might be too weak for bf16, so my test was with fp8. Will give bf16 a try though ;-)
> The colors seem quite stable in this model.
With block_swap the VRAM on a 3090 is easily enough, at least. And agreed on the colors, usually very natural.
generation time per video?
> Yes, it seems to work quite nicely. My rusty 3090 might be too weak for bf16, so my test was with fp8. Will give bf16 a try though ;-)
> The colors seem quite stable in this model. WanVideoWrapper-LongCat_00004.1.mp4
Strange, I used your and KJ's workflows (didn't change anything) and it doesn't respect I2V; scene switching happens with jump cuts from gray.
https://github.com/user-attachments/assets/8f1aa5a1-0168-4d5c-96d5-f15fb2e0f406
https://github.com/user-attachments/assets/e42992a3-9e39-4058-b736-9ac672d6d45e
But I also saw a similar thing with HoloCine, so it could be my system or SageAttention.
https://github.com/user-attachments/assets/0488e0bb-43e3-4eb2-b5d4-5b50fa2fd5c5
pytorch version: 2.8.0+cu129
Enabled fp16 accumulation.
Set vram state to: NORMAL_VRAM
Using sage attention
Python version: 3.13.6
ComfyUI version: 0.3.64
Or maybe PyTorch 2.8.0?
@myprivacygithub use branch https://github.com/kijai/ComfyUI-WanVideoWrapper/tree/longcat
> Enabled fp16 accumulation.
It could perhaps be worth trying to disable that, but I'm not sure.
The rest of your settings look pretty standard.
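If anyone wants to test that quickly: a minimal sketch, assuming the "Enabled fp16 accumulation" log line maps to PyTorch's fp16-accumulation matmul flag (only present in recent PyTorch builds). In ComfyUI itself the simpler route is just launching without the fp16-accumulation speed option.

```python
# Sketch only: assumes the log line corresponds to this PyTorch backend flag.
import torch

matmul = torch.backends.cuda.matmul
if hasattr(matmul, "allow_fp16_accumulation"):      # flag only exists on newer PyTorch
    print("fp16 accumulation:", matmul.allow_fp16_accumulation)
    matmul.allow_fp16_accumulation = False          # re-run the same workflow with it off
else:
    print("This PyTorch build has no fp16-accumulation toggle.")
```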
I've had two confirmations so far that sage1 will break the image input; when they switch to sdpa with the same workflow, it works. On sage 2.2.0 it has always worked for me.
Yeah, it might have been sage1. I had:
pytorch 2.8.0+cu129
sage 1.0.6
Python 3.13.6
Did a clean Comfy install + updated NVIDIA drivers (had 576, installed 581 to use CUDA 13.0). Now:
pytorch 2.9.0+cu130
sage 2.2.0+cu130torch2.9.0andhigher.post4
Python 3.12.10
No issues now, and HoloCine probably works as well. (A quick version sanity check is sketched below the clip.)
https://github.com/user-attachments/assets/9bad59bf-624e-49af-86ed-fe7de041c863
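For anyone rebuilding their environment the same way, a quick check like this (plain Python, nothing ComfyUI-specific assumed) confirms which versions actually ended up installed:

```python
# Print the versions that are actually active in the current Python environment.
import sys
import torch
from importlib.metadata import version, PackageNotFoundError

print("python:", sys.version.split()[0])
print("torch :", torch.__version__, "| cuda:", torch.version.cuda)
for pkg in ("sageattention", "triton"):
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```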
Since it doesn't happen every time: is it normal that it produces weirdly many artifacts in action scenes? I based my workflow on the workflow above and modified it slightly (only for easier handling; the functions are identical to the wf above). Slow videos work without problems: a flying bird was a ghost bird one time and fine another time. Two soldiers fighting always disintegrate. Edit: Sage 2.2.0, so that is not the problem.
> Since it doesn't happen every time: is it normal that it produces weirdly many artifacts in action scenes? I based my workflow on the workflow above and modified it slightly (only for easier handling; the functions are identical to the wf above). Slow videos work without problems: a flying bird was a ghost bird one time and fine another time. Two soldiers fighting always disintegrate. Edit: Sage 2.2.0, so that is not the problem.
How many steps are you running? The default for the distill LoRA is 16; anything under 10, in my experience, degrades motion a lot.
> Since it doesn't happen every time: is it normal that it produces weirdly many artifacts in action scenes? I based my workflow on the workflow above and modified it slightly (only for easier handling; the functions are identical to the wf above). Slow videos work without problems: a flying bird was a ghost bird one time and fine another time. Two soldiers fighting always disintegrate. Edit: Sage 2.2.0, so that is not the problem.
> How many steps are you running? The default for the distill LoRA is 16; anything under 10, in my experience, degrades motion a lot.
16 is WITH distill? Oh, wow. I'll try it. Going to edit this post in about 40 minutes. Edit: Much better (and much longer to generate), thanks!
Yeah, 16 steps feels much better, so slow though. But still pretty amazing that you can generate this locally now with not much effort.
https://github.com/user-attachments/assets/f2987b27-2844-4d11-9cec-029118591275
https://github.com/user-attachments/assets/0affc618-b50b-4b3a-a088-41c799017dff
Yeah, we are spoiled with LightX LoRAs and low steps ;-) It feels a little "slow" in comparison, even if 16 steps is really nothing; we've gotten used to 4-6 steps hehe. But maybe if this model, or its architecture, takes off, some low-step trick will come around ;-)
And agreed, the model is really, really nice.
Is full LoRA support planned? I guess there isn't much to it, apart from a few extra blocks.
> Is full LoRA support planned? I guess there isn't much to it, apart from a few extra blocks.
LoRAs already work though? If you mean using Wan LoRAs... that can't work, as it's a new foundational model; it needs its own LoRAs.
Tested the refine LoRA a bit after a clumsy and tedious conversion. Not 100% sure I got it right, but it does seem to work, as without the LoRA the same settings don't really change the output.
https://github.com/user-attachments/assets/f90e7225-4148-4683-b68e-f9560e3d259b
https://github.com/user-attachments/assets/3f75f6a7-f10c-44e7-8dd6-6df35cc03e68
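For context, that kind of conversion is usually just renaming state-dict keys. A minimal sketch with safetensors, where the prefixes and filenames are placeholders rather than the actual ones from the LongCat release or the wrapper:

```python
# Illustrative only: remap LoRA key prefixes from one naming scheme to another.
from safetensors.torch import load_file, save_file

OLD_PREFIX, NEW_PREFIX = "old_prefix.", "new_prefix."    # placeholders, not real key names

sd = load_file("refine_lora_original.safetensors")        # placeholder filename
converted = {
    (NEW_PREFIX + key[len(OLD_PREFIX):]) if key.startswith(OLD_PREFIX) else key: tensor
    for key, tensor in sd.items()
}
save_file(converted, "refine_lora_converted.safetensors")
```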
> LoRAs already work though? If you mean using Wan LoRAs... that can't work, as it's a new foundational model; it needs its own LoRAs.
Yes, I meant Wan LoRAs, since they are already available; I wanted to try them out, but it fails. AIs suggest there is a way to load them, but I can't do code tweaks on the potato PC I currently have.
Something about replacing keys; it seems like they are the same in general, just using different aliases for them.
> LoRAs already work though? If you mean using Wan LoRAs... that can't work, as it's a new foundational model; it needs its own LoRAs.
> Yes, I meant Wan LoRAs, since they are already available; I wanted to try them out, but it fails. AIs suggest there is a way to load them, but I can't do code tweaks on the potato PC I currently have.
> Something about replacing keys; it seems like they are the same in general, just using different aliases for them.
It fails because there's no Wan model that even uses the same dimensions, and it wouldn't work even if they matched, as this model isn't trained from Wan; it's a new model. Similar to how you can't use 1.3B LoRAs with 14B, etc.
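A toy illustration of the dimension point (the sizes below are made up, not the real models' widths): even with renamed keys, the LoRA's low-rank matrices have to multiply into the target layer, and mismatched widths simply fail.

```python
# Made-up sizes, purely to show why a LoRA built for a different layer width can't apply.
import torch

rank = 32
lora_down = torch.randn(rank, 5120)   # pretend LoRA trained against 5120-wide layers
lora_up = torch.randn(5120, rank)

x = torch.randn(1, 4096)              # pretend target model activation (4096-wide layer)
try:
    delta = (x @ lora_down.T) @ lora_up.T
except RuntimeError as err:
    print("shape mismatch:", err)
```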
> It fails because there's no Wan model that even uses the same dimensions, and it wouldn't work even if they matched, as this model isn't trained from Wan; it's a new model.
That's pretty sad. Well, all hope rests on enthusiasts, then.
I never really used many LoRAs, except for LightX (low-step), reward LoRAs, and refine LoRAs. Seems like Kijai made a refine one.
If it's a style LoRA, I guess you could instead style the input image / first frame (for now).
There were some camera-control ones that I also used a bit, but not a lot.
> Seems like Kijai made a refine one.
Nah they actually released this model with distill and refine LoRAs.
Looking at what it does, it's just passing motion frames from one part to the next, so if the subject is out of frame it might not return in the same shape :D Also need to try bf16; maybe that one will create zombies from snow better.
https://github.com/user-attachments/assets/22819db0-7740-46dc-a409-efc533e0ac4d
> Looking at what it does, it's just passing motion frames from one part to the next, so if the subject is out of frame it might not return in the same shape :D Also need to try bf16; maybe that one will create zombies from snow better.
> WanVideoWrapper-LongCat_00014.mp4
No, for character consistency you need SVI Shot or HoloCine. They are different and use a reference frame (SVI) or a cache (HoloCine).
After some tests with just Wan 2.2 but using the longcat-distill-euler sampler, I can say we need that in the main branch. No context options, no character-consistency LoRAs applied, but the character remains pretty much consistent and the scene doesn't look burnt or anything. In my example it remembered the tattoos, the golden watches, the outfit, all the accessories exactly as they were drawn in the first section; no distortions. Maybe something else is helping me, but I didn't get these same results with other samplers after thousands of tries.
One nice use case of this model is simply extending videos you already have, one extension at a time to grow them longer and longer (focusing the prompting on one extension at a time), instead of generating one long video in one go.
Since the model seems so stable, it works nicely for extending videos as you go.
Not really anything special, just a variation of the same workflow used above, skipping the first generation step and jumping straight to the extension part with a video as input instead of an image... But in case someone finds it interesting (a toy sketch of the loop follows below the clip):
https://github.com/user-attachments/assets/89004e44-4596-463c-8704-6fe202ca3a0d
Extending Kijai's motorbike ride from above with a few more frames - a daredevil with no hands on the wheel ;-)
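To spell out the loop described above in plain terms, here is a toy sketch using lists as stand-ins for frames; the function and the frame count are made up, since the real workflow does this with latents inside the wrapper nodes:

```python
# Toy stand-in for iterative video extension: each pass conditions on the tail of the clip.
def fake_generate(context_frames, prompt, length=12):
    # Pretend to generate `length` new frames conditioned on `context_frames` and `prompt`.
    return [f"{prompt} #{i}" for i in range(length)]

clip = fake_generate([], "a motorbike ride through the hills")       # initial I2V/T2V chunk
for prompt in ["the rider lets go of the handlebars", "the road curves toward the sea"]:
    tail = clip[-17:]                       # last N frames carry the motion into the next chunk
    clip += fake_generate(tail, prompt)     # focus the prompt on one extension at a time
print(len(clip), "frames total")
```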
Ah yes, more Instagram slop will be created :D Some stuff is still hard to control, like camera pauses or damage/explosion shapes.
https://github.com/user-attachments/assets/308334ae-21ce-480e-a345-c6679d22253d
> Ah yes, more Instagram slop will be created :D Some stuff is still hard to control, like camera pauses or damage/explosion shapes.
Looks pretty good though ;-)
workflow