
LongCat-Video 🎉🐱

RuneGjerde opened this issue 1 month ago • 84 comments

Just as a kid at Christmas squeezes the gifts ahead of time to guess what's inside, I had to give the new LongCat branch a test run ;-) It already works pretty well. Kijai has been doing some magic again ;-)

Just a TEST run, with half the required steps (8 steps) and at low resolution, since I was only testing and didn't want to wait too long to see if it worked. But still a nice result even so ;-)

Image 2 Video:

https://github.com/user-attachments/assets/cd4feb45-9483-46e2-8066-399883d2017b

Text 2 Video:

https://github.com/user-attachments/assets/9c243e55-cbe5-4922-a05b-a202791b23c8

(using the "interactive" prompt example from their page - and my test workflow might contain errors, so take it with a grain of salt)


And for other curious cats, referencing this model: https://meituan-longcat.github.io/LongCat-Video/ https://github.com/meituan-longcat/LongCat-Video

It's still a work in progress and not in the main branch yet, but it seems to work quite well already ;-) Thanks a lot, Kijai.

RuneGjerde avatar Oct 27 '25 00:10 RuneGjerde

Nice, yeah I think it's mostly working now. It refuses to work in fp16, which is a shame; fp8_scaled works okayish, but bf16 is certainly better.
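Quick illustration of why a model can break in fp16 but survive in bf16: bf16 keeps fp32's exponent range, so activations that overflow fp16 still fit. This is plain PyTorch behavior, nothing wrapper-specific:

```python
import torch

# bf16 trades mantissa precision for fp32's exponent range, which is
# usually why a model that produces inf/NaN in fp16 still runs in bf16.
# (fp8 variants like torch.float8_e4m3fn additionally need per-tensor
# scales, hence "fp8_scaled".)
x = torch.tensor([70000.0])
print(x.to(torch.float16))   # tensor([inf]) - fp16 overflows past ~65504
print(x.to(torch.bfloat16))  # tensor([70144.]) - range kept, precision reduced
```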

Some examples:

This is from 17 start frames:

https://github.com/user-attachments/assets/fbb75514-b53e-4447-a26f-2e0ef0fe31af

And this is I2V with 2 extensions:

https://github.com/user-attachments/assets/d0154850-0517-436c-948b-71a6ad44ce65

kijai avatar Oct 27 '25 00:10 kijai

Yes, seems to work quite nicely. My rusty 3090 might be too weak for bf16, so my test was with fp8. Will give bf16 a try though ;-)

The colors seem quite stable in this model.

https://github.com/user-attachments/assets/4e68d0d1-3a81-41ed-8df4-1bc0d30b27f5

RuneGjerde avatar Oct 27 '25 00:10 RuneGjerde

> Yes, seems to work quite nicely. My rusty 3090 might be too weak for bf16, so my test was with fp8. Will give bf16 a try though ;-)
>
> The colors seem quite stable in this model.

With block swap the VRAM on a 3090 is easily enough, at least. And agreed on the colors, usually very natural.
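For anyone curious what block swap actually does, the idea in spirit is roughly this (a hypothetical sketch, not the wrapper's actual code): keep only some transformer blocks resident on the GPU and shuttle the rest in from CPU RAM as they're needed, trading transfer time for VRAM.

```python
import torch
import torch.nn as nn

class SwappedBlocks(nn.Module):
    """Illustrative block swap: most blocks live in CPU RAM and are moved
    to the GPU only while they execute. NOT the real WanVideoWrapper code."""

    def __init__(self, blocks: nn.ModuleList, blocks_on_gpu: int, device="cuda"):
        super().__init__()
        self.blocks = blocks
        self.device = device
        self.resident = blocks_on_gpu
        # The first N blocks stay on the GPU permanently.
        for i, block in enumerate(self.blocks):
            block.to(device if i < blocks_on_gpu else "cpu")

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for i, block in enumerate(self.blocks):
            if i >= self.resident:
                block.to(self.device)   # swap in
            x = block(x)
            if i >= self.resident:
                block.to("cpu")         # swap out, freeing VRAM for the next block
        return x

# Toy usage: 40 small blocks, only 10 resident on the GPU at once.
if torch.cuda.is_available():
    blocks = nn.ModuleList(nn.Linear(64, 64) for _ in range(40))
    model = SwappedBlocks(blocks, blocks_on_gpu=10)
    out = model(torch.randn(1, 64, device="cuda"))
```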

kijai avatar Oct 27 '25 00:10 kijai

Generation time per video?

Rudra-ai-coder avatar Oct 27 '25 05:10 Rudra-ai-coder

[image]

Workflow: wan_longcat.json

myprivacygithub avatar Oct 27 '25 05:10 myprivacygithub

> Yes, seems to work quite nicely. My rusty 3090 might be too weak for bf16, so my test was with fp8. Will give bf16 a try though ;-)
>
> The colors seem quite stable in this model.

Strange; I used your and kijai's workflows (didn't change anything) and it doesn't respect I2V, and scene switching happens with jump cuts from gray.

https://github.com/user-attachments/assets/8f1aa5a1-0168-4d5c-96d5-f15fb2e0f406

https://github.com/user-attachments/assets/e42992a3-9e39-4058-b736-9ac672d6d45e

But I also saw a similar thing with HoloCine, so it could be my system or SageAttention.

https://github.com/user-attachments/assets/0488e0bb-43e3-4eb2-b5d4-5b50fa2fd5c5

pytorch version: 2.8.0+cu129
Enabled fp16 accumulation.
Set vram state to: NORMAL_VRAM
Using sage attention
Python version: 3.13.6
ComfyUI version: 0.3.64

Or maybe PyTorch 2.8.0?

siraxe avatar Oct 27 '25 12:10 siraxe

@myprivacygithub use branch https://github.com/kijai/ComfyUI-WanVideoWrapper/tree/longcat

siraxe avatar Oct 27 '25 12:10 siraxe

> Enabled fp16 accumulation.

Could be worth trying to disable that, but I'm not sure.

The rest of your settings look pretty standard.
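If you want to rule it out without relaunching with different flags, recent PyTorch builds expose the matmul fp16-accumulation switch directly. A sketch; the attribute only exists on newer PyTorch, hence the guard, and ComfyUI normally controls this via its own launch options:

```python
import torch

# fp16 accumulation trades precision for speed in fp16 matmuls.
# The toggle only exists in recent PyTorch builds, so check first.
if hasattr(torch.backends.cuda.matmul, "allow_fp16_accumulation"):
    torch.backends.cuda.matmul.allow_fp16_accumulation = False  # disable to test
    print("fp16 accumulation disabled for this session")
else:
    print("this PyTorch build has no fp16-accumulation toggle")
```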

RuneGjerde avatar Oct 27 '25 14:10 RuneGjerde

I've had two confirmations so far that sage 1 will break the image input: when they switch to SDPA with the same workflow, it works. On sage 2.2.0 it has always worked for me.
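If someone wants to check whether the backend itself changes their results, a rough way to compare the two paths on the same tensors (just a sketch, not how the wrapper selects its backend):

```python
import torch
import torch.nn.functional as F

assert torch.cuda.is_available()

# Compare SDPA against SageAttention on identical inputs.
# Shapes are (batch, heads, seq, dim).
q, k, v = (torch.randn(1, 8, 256, 64, device="cuda", dtype=torch.float16)
           for _ in range(3))

ref = F.scaled_dot_product_attention(q, k, v)  # always-available baseline

try:
    from sageattention import sageattn  # optional dependency
    out = sageattn(q, k, v)
    print("max abs diff vs SDPA:", (out.float() - ref.float()).abs().max().item())
except ImportError:
    print("sageattention not installed, SDPA is what you're running")
```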

kijai avatar Oct 27 '25 14:10 kijai

Image workflow

wan_longcat.json

the same

orzechowy3334-rgb avatar Oct 27 '25 16:10 orzechowy3334-rgb

Yeah, it might have been sage 1. I had:

pytorch 2.8.0+cu129
sage 1.0.6
Python 3.13.6

Did a clean Comfy install and updated the NVIDIA drivers (had 576, installed 581 to use CUDA 13.0). Now:

pytorch 2.9.0+cu130
sage 2.2.0+cu130torch2.9.0andhigher.post4
Python 3.12.10

No issues now, and HoloCine probably works as well.

https://github.com/user-attachments/assets/9bad59bf-624e-49af-86ed-fe7de041c863
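For anyone chasing the same mismatch, a quick sanity check of what your environment actually loaded (trivial sketch; sage may not expose a version attribute, hence the getattr):

```python
import sys
import torch

# Print the versions that matter for attention-related breakage.
print("python :", sys.version.split()[0])
print("torch  :", torch.__version__)
try:
    import sageattention
    print("sage   :", getattr(sageattention, "__version__", "unknown"))
except ImportError:
    print("sage   : not installed")
```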

siraxe avatar Oct 27 '25 16:10 siraxe

Since it doesn't happen every time: is it normal that it has weirdly many artifacts in action scenes? I based my workflow on the one above and modified it slightly (only for easier handling; the functions are identical). Slow videos work without problems: a flying bird was a ghost bird one time and fine another. Two soldiers fighting always disintegrate. Edit: Sage 2.2.0, so that is not the problem.

railep avatar Oct 27 '25 17:10 railep

> Since it doesn't happen every time: is it normal that it has weirdly many artifacts in action scenes? I based my workflow on the one above and modified it slightly (only for easier handling; the functions are identical). Slow videos work without problems: a flying bird was a ghost bird one time and fine another. Two soldiers fighting always disintegrate. Edit: Sage 2.2.0, so that is not the problem.

How many steps are you running? The default for the distill LoRA is 16; anything under 10 in my experience degrades motion a lot.

kijai avatar Oct 27 '25 17:10 kijai

> > Since it doesn't happen every time: is it normal that it has weirdly many artifacts in action scenes? I based my workflow on the one above and modified it slightly (only for easier handling; the functions are identical). Slow videos work without problems: a flying bird was a ghost bird one time and fine another. Two soldiers fighting always disintegrate. Edit: Sage 2.2.0, so that is not the problem.
>
> How many steps are you running? The default for the distill LoRA is 16; anything under 10 in my experience degrades motion a lot.

16 is WITH distill? Oh, wow. I'll try it. Going to edit this post in about 40 minutes. Edit: Much better (and takes much longer to generate), thanks!

railep avatar Oct 27 '25 17:10 railep

Yeah, 16 steps feels much better; so slow though. But still pretty amazing that you can generate this locally now without much effort.

https://github.com/user-attachments/assets/f2987b27-2844-4d11-9cec-029118591275

https://github.com/user-attachments/assets/0affc618-b50b-4b3a-a088-41c799017dff

siraxe avatar Oct 27 '25 20:10 siraxe

Yeah, we're spoiled with LightX LoRAs and low steps ;-) It feels a little "slow" in comparison, even if 16 steps is really nothing; we've gotten used to 4-6 steps hehe. But maybe if this model, or its architecture, takes off, some low-step trick will come around ;-)

And agreed, the model is really, really nice.

RuneGjerde avatar Oct 27 '25 20:10 RuneGjerde

Is full LoRA support planned? I guess there isn't much to it, apart from a few extra blocks.

aimfordeb avatar Oct 27 '25 21:10 aimfordeb

> Is full LoRA support planned? I guess there isn't much to it, apart from a few extra blocks.

LoRAs already work though? If you mean using Wan LoRAs... that can't work, as it's a new foundational model; it needs its own LoRAs.

kijai avatar Oct 27 '25 21:10 kijai

Tested the refine LoRA a bit after a clumsy and tedious conversion. Not 100% sure I got it right, but it does seem to work, as running the same settings without the LoRA doesn't really change the output.

https://github.com/user-attachments/assets/f90e7225-4148-4683-b68e-f9560e3d259b

https://github.com/user-attachments/assets/3f75f6a7-f10c-44e7-8dd6-6df35cc03e68
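The tedious part of such a conversion is mostly renaming state-dict keys to whatever the loader expects. A hypothetical sketch of the pattern; the actual LongCat key names differ, these renames are made up for illustration:

```python
from safetensors.torch import load_file, save_file

# Hypothetical key-renaming pass for a LoRA checkpoint. The real
# LongCat/ComfyUI key names differ; this only shows the general shape
# of such a conversion.
RENAMES = [
    ("base_model.model.", ""),               # strip a training-framework prefix
    (".lora_A.weight", ".lora_down.weight"), # PEFT-style -> diffusion-style names
    (".lora_B.weight", ".lora_up.weight"),
]

def convert(in_path: str, out_path: str) -> None:
    sd = load_file(in_path)
    out = {}
    for key, tensor in sd.items():
        for old, new in RENAMES:
            key = key.replace(old, new)
        out[key] = tensor
    save_file(out, out_path)

# convert("longcat_refine_lora.safetensors", "longcat_refine_comfy.safetensors")
```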

kijai avatar Oct 27 '25 21:10 kijai

> LoRAs already work though? If you mean using Wan LoRAs... that can't work, as it's a new foundational model; it needs its own LoRAs.

Yes, I meant Wan LoRAs; since they are already available, I wanted to try them out, but it fails. AIs suggest there is a way to load them, but I can't do code tweaks on the potato PC I currently have.

Something about replacing keys; they seem to be the same in general, just under different aliases.

aimfordeb avatar Oct 27 '25 21:10 aimfordeb

> > LoRAs already work though? If you mean using Wan LoRAs... that can't work, as it's a new foundational model; it needs its own LoRAs.
>
> Yes, I meant Wan LoRAs; since they are already available, I wanted to try them out, but it fails. AIs suggest there is a way to load them, but I can't do code tweaks on the potato PC I currently have.
>
> Something about replacing keys; they seem to be the same in general, just under different aliases.

It fails because there's no Wan model that even uses the same dimensions, and it wouldn't work even if they matched, as this model isn't trained from Wan; it's a new model. Similar to how you can't use 1.3B LoRAs with 14B, etc.
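You can see the mismatch yourself by comparing the LoRA's tensor shapes against the model's layers. A rough sketch; the file names are placeholders and the key naming is simplified:

```python
from safetensors.torch import load_file

# Compare a LoRA's down-projection shapes against a model state dict.
# A cross-model LoRA fails immediately: the hidden dimensions don't
# line up, regardless of how the keys are renamed. Paths are placeholders.
model_sd = load_file("longcat_video_bf16.safetensors")
lora_sd = load_file("some_wan_lora.safetensors")

for key, tensor in lora_sd.items():
    if "lora_down" not in key:
        continue
    base_key = key.replace(".lora_down.weight", ".weight")
    base = model_sd.get(base_key)
    if base is None:
        print(f"no matching layer for {key}")
    elif base.shape[1] != tensor.shape[1]:
        # lora_down is (rank, in_features); base weight is (out, in_features)
        print(f"dimension mismatch on {key}: {base.shape[1]} vs {tensor.shape[1]}")
```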

kijai avatar Oct 27 '25 21:10 kijai

> It fails because there's no Wan model that even uses the same dimensions, and it wouldn't work even if they matched, as this model isn't trained from Wan; it's a new model.

That's pretty sad. Well, all hope rests on the enthusiasts, then.

aimfordeb avatar Oct 27 '25 21:10 aimfordeb

I never really used many LoRAs, except for LightX (low-step) ones, reward LoRAs, and refine LoRAs. Seems like Kijai made a refine one.

If it's a style LoRA, I guess you could instead style the input image / first frame (for now).

There were some camera-control ones that I also used a bit, but not a lot.

RuneGjerde avatar Oct 27 '25 21:10 RuneGjerde

> Seems like Kijai made a refine one.

Nah, they actually released this model with distill and refine LoRAs.

kijai avatar Oct 27 '25 22:10 kijai

Looking at what it does, it's just passing motion frames from one part to the next, so if the subject is out of frame it might not return in the same shape :D Also need to try bf16; maybe that one will create zombies from snow better.

https://github.com/user-attachments/assets/22819db0-7740-46dc-a409-efc533e0ac4d
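In spirit the continuation looks something like this (a sketch, not the wrapper's actual code; `generate_chunk` is a made-up placeholder). Only the tail frames are carried forward, which is exactly why an off-screen subject can come back looking different:

```python
import torch

def generate_chunk(cond_frames: torch.Tensor, num_frames: int) -> torch.Tensor:
    # Placeholder: a real model would denoise num_frames conditioned on
    # cond_frames. Here we just return noise with the right shape.
    c, h, w = cond_frames.shape[1:]
    return torch.randn(num_frames, c, h, w)

video = torch.randn(17, 3, 480, 832)   # e.g. 17 start frames
motion_overlap = 8                     # tail frames carried into the next chunk

for _ in range(2):                     # two extensions, as in the I2V example
    tail = video[-motion_overlap:]     # only the tail is passed forward, so
    new = generate_chunk(tail, 81)     # anything off-screen is "forgotten"
    video = torch.cat([video, new], dim=0)

print(video.shape)  # (17 + 2*81, 3, 480, 832)
```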

siraxe avatar Oct 27 '25 22:10 siraxe

> Looking at what it does, it's just passing motion frames from one part to the next, so if the subject is out of frame it might not return in the same shape :D Also need to try bf16; maybe that one will create zombies from snow better.

No; for character consistency you need SVI Shot or HoloCine. They work differently, using a reference frame (SVI) or a cache (HoloCine).

railep avatar Oct 27 '25 22:10 railep

After some tests with plain Wan 2.2 but using the longcat-distill-euler sampler, I can say we need that on the main branch. No context options, no character-consistency LoRAs applied, but the character remains pretty much consistent and the scene doesn't look burnt or anything. In my example it remembered the tattoos, the golden watches, the outfit, all the accessories exactly as they were drawn in the first section; no distortions. Maybe something else is helping me, but I didn't get the same results with other samplers after thousands of tries.

aimfordeb avatar Oct 27 '25 22:10 aimfordeb

One nice use-case of this model is simply extending videos you already have, one extension at a time, to generate longer and longer results (focusing the prompting on one extension at a time) instead of generating one long video in one go.

Since the model seems so stable, it works nicely for extending videos as you go.

Not really anything special, just a variation of the same workflow used above: skip the first generation step and jump to the extension part with a video as input instead of an image. But in case someone might find it interesting:

https://github.com/user-attachments/assets/89004e44-4596-463c-8704-6fe202ca3a0d

Extending Kijai's motorbike ride from above with a few more frames - a daredevil with no hands on the handlebars ;-)

WanVideoWrapper - LongCat Extend-a-Video.json
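In pseudo-code, the extend-as-you-go pattern is just a loop where each pass gets its own prompt (a sketch; `extend_video` is a stand-in for the sampler chain in the attached workflow, and the prompts are invented examples):

```python
import torch

def extend_video(video: torch.Tensor, prompt: str, frames: int) -> torch.Tensor:
    # Stand-in for the real LongCat sampler chain: appends new frames
    # generated from the tail of `video` under the given prompt.
    print(f"extending {video.shape[0]} frames with: {prompt!r}")
    new = torch.randn(frames, *video.shape[1:])  # placeholder output
    return torch.cat([video, new], dim=0)

video = torch.randn(81, 3, 480, 832)  # an existing clip

# One prompt per extension lets you steer the story chunk by chunk
# instead of writing one giant prompt up front.
for prompt in [
    "the rider lifts both hands off the handlebars",
    "the rider grabs the handlebars and leans into a corner",
]:
    video = extend_video(video, prompt, frames=81)

print(video.shape)  # (243, 3, 480, 832)
```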

RuneGjerde avatar Oct 27 '25 23:10 RuneGjerde

Ah yes, more Instagram slop will be created :D Some stuff is still hard to control, like camera pauses or damage/explosion shapes.

https://github.com/user-attachments/assets/308334ae-21ce-480e-a345-c6679d22253d

siraxe avatar Oct 28 '25 02:10 siraxe

> Ah yes, more Instagram slop will be created :D Some stuff is still hard to control, like camera pauses or damage/explosion shapes.

Looks pretty good though ;-)

RuneGjerde avatar Oct 28 '25 02:10 RuneGjerde