
Please add support for ChronoEdit

Open Jesssssss123 opened this issue 2 months ago • 12 comments

Feature Idea

https://huggingface.co/nvidia/ChronoEdit-14B-Diffusers

ChronoEdit-14B enables physics-aware image editing and action-conditioned world simulation through temporal reasoning. It distills priors from a 14B-parameter pretrained video generative model and separates inference into (i) a video reasoning stage for latent trajectory denoising, and (ii) an in-context editing stage for pruning trajectory tokens. ChronoEdit-14B was developed by NVIDIA as part of the ChronoEdit family of multimodal foundation models. This model is ready for commercial use.
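For anyone who wants to try the checkpoint directly with Hugging Face Diffusers before native support lands, a minimal loading sketch follows. The automatic pipeline resolution, call signature, and prompt/image arguments are assumptions to verify against the model card:

```python
# A minimal loading sketch, assuming the repo ships a standard
# model_index.json so DiffusionPipeline can resolve the pipeline class.
# The call signature and output attribute below are assumptions; check
# the model card for the exact usage.
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import load_image

pipe = DiffusionPipeline.from_pretrained(
    "nvidia/ChronoEdit-14B-Diffusers",
    torch_dtype=torch.bfloat16,
).to("cuda")

image = load_image("input.png")  # the frame to edit
# Hypothetical call: instruction-editing pipelines typically take a text
# prompt plus the source image.
out = pipe(prompt="make the cup red", image=image)
```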

Existing Solutions

No response

Other

No response

Jesssssss123 avatar Oct 31 '25 05:10 Jesssssss123


It's already working; I already tested it in Comfy. The only thing it needs is a way to set the length to 2 frames, since the standard workflow requires at least 5, but this model only needs 2.
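For context, the 5-frame minimum likely comes from the temporal compression of the video VAE. A minimal sketch of the arithmetic, assuming ChronoEdit reuses the Wan-style causal VAE with a temporal stride of 4:

```python
# Sketch of the frame-count arithmetic, assuming ChronoEdit reuses the
# Wan-style causal video VAE: the first pixel frame is encoded on its own
# and each additional latent frame covers a stride of 4 pixel frames.
def latent_frames(pixel_frames: int, temporal_stride: int = 4) -> int:
    return (pixel_frames - 1) // temporal_stride + 1

for n in (1, 2, 5, 9):
    print(f"{n} pixel frame(s) -> {latent_frames(n)} latent frame(s)")
# 5 pixel frames -> 2 latent frames (input image + edited result), which is
# why 5 is the smallest length that yields the two frames this model needs.
```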

I2V is the same as Wan 2.1, but I2I is not; I also want to know how.

zwukong avatar Oct 31 '25 08:10 zwukong

Model: https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/blob/main/split_files/diffusion_models/chrono_edit_14B_fp16.safetensors

Workflow: (image attached)

comfyanonymous avatar Oct 31 '25 23:10 comfyanonymous

Model: https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/blob/main/split_files/diffusion_models/chrono_edit_14B_fp16.safetensors

Hi, the workflow uses a ScaleRope node that I can't find via the custom node manager. Is it an unpublished custom node?

rzgarespo avatar Nov 01 '25 00:11 rzgarespo

It is only available in the git version of ComfyUI; it will be in the stable release next week.

comfyanonymous avatar Nov 01 '25 04:11 comfyanonymous

Model: https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/blob/main/split_files/diffusion_models/chrono_edit_14B_fp16.safetensors

Workflow: (image attached)

Is there an fp8 model?

yamatazen avatar Nov 01 '25 05:11 yamatazen

Model: https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/blob/main/split_files/diffusion_models/chrono_edit_14B_fp16.safetensors Workflow: (image attached)

Is there an fp8 model?

https://huggingface.co/Kijai/WanVideo_comfy_fp8_scaled/tree/main/ChronoEdit
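For reference, "fp8 scaled" checkpoints generally store weights as float8_e4m3fn plus per-tensor scale factors used to dequantize at load time. A rough sketch of that kind of conversion, assuming simple per-tensor absmax scaling and an illustrative `.scale` key naming (not necessarily the exact recipe used for that repo):

```python
# Rough sketch of an "fp8 scaled" conversion: per-tensor absmax scaling
# into float8_e4m3fn (largest finite value ~448), keeping a scale tensor
# for dequantization (w ~= fp8_weight * scale). The ".scale" key naming is
# illustrative, not necessarily what the linked repo uses.
import torch
from safetensors.torch import load_file, save_file

F8_MAX = 448.0

def convert_fp8_scaled(path_in: str, path_out: str) -> None:
    state = load_file(path_in)
    out = {}
    for name, w in state.items():
        if name.endswith(".weight") and w.dtype in (torch.float16, torch.bfloat16, torch.float32):
            scale = (w.abs().max().float() / F8_MAX).clamp(min=1e-12)
            out[name] = (w.float() / scale).clamp(-F8_MAX, F8_MAX).to(torch.float8_e4m3fn)
            out[name + ".scale"] = scale.reshape(1)
        else:
            out[name] = w  # leave biases, norms, etc. at original precision
    save_file(out, path_out)
```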

aliabougazia avatar Nov 01 '25 13:11 aliabougazia

Is it possible to visualize the editing trajectory? If I display all 5 generated images, apart from the first one, which is the original input, the other 4 are all just the resulting image.

nistvan86 avatar Nov 05 '25 13:11 nistvan86

Is it possible to visualize the editing trajectory? If I display all 5 generated images, apart from the first one, which is the original input, the other 4 are all just the resulting image.

I use the 'Video Combine' node to save the frames and see how it arrives at the last image. Sometimes the frames it skips are much better than the final result, especially if you use a WAN T2V LoRA with it; some LoRAs affect the image in intermediate frames that are skipped in the final output. I extract the frame I want and upscale it.
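The same frame-inspection idea can be reproduced outside ComfyUI. A minimal sketch, assuming the decoded frames are already available as HxWx3 uint8 numpy arrays and that imageio plus the imageio-ffmpeg backend are installed:

```python
# Minimal sketch: save the decoded trajectory as an MP4 and pull out one
# intermediate frame. Assumes `frames` is a list of HxWx3 uint8 numpy
# arrays (e.g. VAE-decoded output scaled to 0-255) and that imageio plus
# imageio-ffmpeg are installed.
import imageio.v2 as imageio

def save_trajectory(frames, video_path="trajectory.mp4", pick=None, fps=8):
    imageio.mimwrite(video_path, frames, fps=fps)
    if pick is not None:  # keep a frame that looks better than the final one
        imageio.imwrite(f"frame_{pick:03d}.png", frames[pick])

# Example: save_trajectory(frames, pick=2)
```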

rzgarespo avatar Nov 07 '25 20:11 rzgarespo

@rzgarespo I don't think we are talking about the same thing. On the NVIDIA presentation page for ChronoEdit they show that it's possible to recover the "thinking" process of the model to see how it arrived at its conclusion. Scroll down to the "Temporal Reasoning Visualization" section: it is a video whose first frame is the input and whose last frame is the output.

nistvan86 avatar Nov 08 '25 09:11 nistvan86

@rzgarespo I don't think we are talking about the same thing. On the NVIDIA presentation page for ChronoEdit they show that it's possible to recover the "thinking" process of the model to see how it arrived at its conclusion. Scroll down to the "Temporal Reasoning Visualization" section: it is a video whose first frame is the input and whose last frame is the output.

Thank you for the link. That is exactly what the Video Combine node achieves: it makes a video clip out of the images rendered for each step. So, at the end, there is one image as the final result and an MP4 showing the thinking process. It's quite interesting, especially when a cat is involved, e.g. prompt: "a cat sitting on the couch", with steps set to 65 :)

rzgarespo avatar Nov 09 '25 03:11 rzgarespo

@rzgarespo then please share the workflow showing how you managed to render that video. As I said, I only get a video with multiple repeated frames of the final image if I save the result of VAE Decode with Video Combine (from the VHS nodes).

nistvan86 avatar Nov 09 '25 09:11 nistvan86