ComfyUI-WanVideoWrapper

New Image Edit Model based on Wan

Open railep opened this issue 4 months ago • 15 comments

Nvidia published https://huggingface.co/nvidia/ChronoEdit-14B-Diffusers, an image edit model like Qwen Edit. Basically, it's Wan 2.1 I2V with either 2 or 23 frames, where the last frame is the edited image. I don't think any modification is needed, but maybe an example workflow would help, assuming the model has the expected layer names.

railep avatar Oct 30 '25 22:10 railep
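The usage pattern described above (run a short I2V clip and keep only the final frame as the edit result) can be sketched roughly like this; `generate_video` is a hypothetical stand-in for whatever I2V call produces the decoded frames:

```python
# Sketch of the ChronoEdit idea described above: treat the model as
# Wan-style I2V, render a handful of frames, and keep only the LAST
# frame as the edited image. `generate_video` below is hypothetical.

def edited_image(frames):
    """Given a decoded frame sequence, the edit result is the final frame."""
    if not frames:
        raise ValueError("expected at least one frame")
    return frames[-1]

# e.g. frames = generate_video(source_image, prompt, num_frames=5)
frames = ["frame0", "frame1", "frame2", "frame3", "frame4"]  # placeholders
result = edited_image(frames)  # "frame4" stands in for the edited image
```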

Oh, I saw this earlier. Thought it was an image model. Might be worth a try to see if it works for a few frames ;-)

RuneGjerde avatar Oct 30 '25 23:10 RuneGjerde

It looks really nice, no image degradation. This is a big plus. I will install it later and play with it a little bit.

snicolast avatar Oct 30 '25 23:10 snicolast

Seems to work out of the box after converting the model from diffusers, though I have no idea if this is the correct way to use it:

Image

kijai avatar Oct 31 '25 00:10 kijai
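For anyone curious what "converting the model from diffusers" involves in general terms, below is a minimal sketch of a state-dict key rename. The prefix pairs are purely hypothetical placeholders for illustration, not the actual ChronoEdit/Wan layer mapping:

```python
# Illustrative sketch of a diffusers -> Wan checkpoint key rename.
# The rename pairs below are HYPOTHETICAL placeholders; the real
# ChronoEdit/Wan mapping is not reproduced here.

HYPOTHETICAL_RENAMES = [
    ("transformer.", "diffusion_model."),
    ("attn1.to_q.", "self_attn.q."),
]

def rename_keys(state_dict, renames):
    """Return a new state dict with each rename applied to every key."""
    out = {}
    for key, tensor in state_dict.items():
        for old, new in renames:
            if old in key:
                key = key.replace(old, new)
        out[key] = tensor
    return out
```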

@kijai could you try feeding the output back into the input several times with different tweaks, and check whether any degradation actually occurs?

snicolast avatar Oct 31 '25 00:10 snicolast

Does scene changing work: https://research.nvidia.com/labs/toronto-ai/chronoedit/assets/video_examples/21.mp4

and camera control: https://research.nvidia.com/labs/toronto-ai/chronoedit/assets/video_examples/9.mp4

Thought those two use cases looked pretty good, but I'm sure it's cherry-picked as well ;-)

Speaking of camera control, I saw some candy on Kijai's Hugging Face as well ;-)

RuneGjerde avatar Oct 31 '25 00:10 RuneGjerde

Don't have time right now to test, but I uploaded the converted models:

fp16 and the distill lora:

https://huggingface.co/Kijai/WanVideo_comfy/blob/main/ChronoEdit/

fp8_scaled:

https://huggingface.co/Kijai/WanVideo_comfy_fp8_scaled/tree/main/ChronoEdit

kijai avatar Oct 31 '25 00:10 kijai

> Don't have time right now to test, but I uploaded the converted models:

oh nice.. will give it a try ;-)

RuneGjerde avatar Oct 31 '25 00:10 RuneGjerde

Cool, thanks, man.

snicolast avatar Oct 31 '25 00:10 snicolast

Think it might work as advertised ;-) I could somewhat reproduce some of their examples, but only did a few tests.

RuneGjerde avatar Oct 31 '25 02:10 RuneGjerde

> It looks really nice, no image degradation. This is a big plus. I will install later and play with it a little bit.

Got me curious, even though it's an image model... Not sure if it's the best way or not, but I did try a long run with a regular context window. Seems to hold up pretty well... a VACE-ish workflow could be a better test.

https://github.com/user-attachments/assets/65282587-05c1-4278-96f9-67ba99981c3e

(The only thing I noticed was that the movements seem a bit rapid, but I only did a few tests.)

RuneGjerde avatar Oct 31 '25 03:10 RuneGjerde

> Seems to work out of the box after converting the model from diffusers, though I have no idea if this is the correct way to use it:
>
> Image

Is there a way to generate only 2 frames? Frames 2-5 are practically identical, but they take time to generate.

> Is there a way to generate only 2 frames? Since frames 2-5 are practically no different, but they take time

Since two is the minimum for I2V, it should be no problem. Just set the frame count to 2 instead of 5.

railep avatar Oct 31 '25 06:10 railep

> Is there a way to generate only 2 frames? Since frames 2-5 are practically no different, but they take time
>
> Since two is the minimum for I2V, it should be no problem. Just set the frame count to 2 instead of 5.

Is this possible with the kijai nodes? The last time I tried to install them, they didn't support my video card (2000 series). And the ComfyUI base node "WanImageToVideo" only supports 1 or 5 frames. If the kijai node supports 2 frames, I'll install it. Thanks for the help.

@NarutoHokageSaskeUchihaSuperItachiMan

WanVideo ImageToVideo is the same as the native one with regard to the number of frames: 1, 5, 9, etc.

But I think you need 5, sometimes even a few more, if that's what it takes for Wan to make the needed change, for example making a person turn around for a different camera view. I didn't try it myself, but I saw some YouTube videos testing the model that say so.
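The 1, 5, 9, ... pattern comes from Wan's 4x temporal compression: valid pixel frame counts have the form 4k + 1. A small helper to validate or round up a requested count, just as a sketch:

```python
# Wan's VAE compresses time 4x, so valid pixel frame counts follow
# 4k + 1: 1, 5, 9, 13, ... (matching the node behavior noted above).

def is_valid_frame_count(n):
    """True if n is a frame count of the form 4k + 1."""
    return n >= 1 and (n - 1) % 4 == 0

def nearest_valid_frame_count(n):
    """Round a requested count UP to the next 4k + 1 value."""
    if n <= 1:
        return 1
    return ((n - 2) // 4 + 1) * 4 + 1
```

So a request for 2 frames would be rounded up to 5, which fits the observation that 5 is the practical minimum here.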

That being said, on a 2000-series GPU you could try the much smaller GGUF versions to see if those work better for you: https://huggingface.co/QuantStack/ChronoEdit-14B-GGUF (for example, Q4 is pretty good, and everything above that is good).

(And make sure to use the block swap node to balance usage between VRAM and regular RAM.)
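As a rough guide when picking a quant for a low-VRAM card, file size scales with bits per weight. The bits-per-weight figures below are approximations (real GGUF files carry block scales and metadata, so actual sizes differ somewhat):

```python
# Rough file-size estimate for a 14B-parameter model at different
# precisions. Bits-per-weight values are approximate assumptions.

def approx_size_gb(params, bits_per_weight):
    """Approximate storage in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9

PARAMS_14B = 14e9
for name, bpw in [("fp16", 16), ("fp8", 8), ("Q4 (approx)", 4.5)]:
    print(f"{name}: ~{approx_size_gb(PARAMS_14B, bpw):.1f} GB")
```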

RuneGjerde avatar Nov 04 '25 19:11 RuneGjerde

If I remember correctly, the latent in Wan comes in 4-frame "blocks", so to speak. A while ago I was playing with the S2V model, which burns the first frames, so you have to use the workaround of repeating the first latent block and cutting the corresponding frames after VAE decoding. I was curious to see what the output is if I decode only that first block. The result was a 4-frame movie (plus 1 empty frame that made VirtualDub act strangely). So the model can't do less than that. Perhaps it can do 1 frame (meaning it outputs the same image as the start image), but the next available number is 1+4.
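The 4-frame-block observation matches Wan's VAE temporal compression: the first latent frame decodes to one pixel frame, and each additional latent frame decodes to four, so one latent block beyond the first yields 1 + 4 = 5 frames. A sketch of the mapping in both directions:

```python
# Wan VAE temporal mapping (4x compression): the first latent frame
# decodes to 1 pixel frame, each further latent frame to 4. This is
# consistent with the 4(+1)-frame "block" observed when decoding above.

def pixel_frame_count(latent_frames):
    """Pixel frames produced by decoding `latent_frames` latent frames."""
    return 1 + 4 * (latent_frames - 1)

def latent_frame_count(pixel_frames):
    """Latent frames needed for a 4k + 1 pixel frame count."""
    assert (pixel_frames - 1) % 4 == 0, "pixel frames must be 4k + 1"
    return 1 + (pixel_frames - 1) // 4
```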

jovan2009 avatar Nov 05 '25 00:11 jovan2009