Yongqi Chen
If you're using our dataset, you should set it to 20. This won't affect quality, but it will generate videos at the optimal length of 77 frames.
> Thank you. I noticed that the VSA tuning scripts use different settings for latent size. Could you explain why the scripts set it to 16?
>
> [FastVideo/scripts/finetune/finetune_v1_VSA.sh](https://github.com/hao-ai-lab/FastVideo/blob/50da62e722165a8847895a551aa56bc5ee2bb08c/scripts/finetune/finetune_v1_VSA.sh#L27)
> ...
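For reference, the 20 above just comes from the frame count. A minimal sanity check, assuming the usual 4x temporal VAE compression with one extra latent for the first frame (my assumption here, not something stated in the script):

```python
def frames_to_num_latent_t(num_frames: int) -> int:
    # one latent for the first frame, then one per 4 subsequent frames
    return (num_frames - 1) // 4 + 1

print(frames_to_num_latent_t(77))  # 20, matching num_latent_t = 20 for our 77-frame dataset
```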
The training part is used [here](https://github.com/hao-ai-lab/FastVideo/blob/9ce7c8039e3fec4c632b8d29a2e41e418a9b56d6/fastvideo/training/training_pipeline.py#L613). The validation part could be removed.
Could you set `--validation_sampling_steps` to 50? It specifies the number of inference steps used for validation.
Did you mean pre-training?
It’s not supported yet, but we plan to add it in the future.
We just use a small portion (1-5%) of the training dataset as a validation set and compute the loss in the same way as in training. This has not yet been...
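Roughly, it amounts to something like the sketch below (hypothetical names, not the actual FastVideo code): hold out a small slice of the training samples and compute the same training loss on it without gradient updates.

```python
import random
import torch

def split_train_val(samples, val_fraction=0.02, seed=0):
    """Hold out ~1-5% of the training samples as a validation set."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_val = max(1, int(len(samples) * val_fraction))
    val_ids = set(indices[:n_val])
    train = [s for i, s in enumerate(samples) if i not in val_ids]
    val = [s for i, s in enumerate(samples) if i in val_ids]
    return train, val

@torch.no_grad()
def validation_loss(model, val_batches, loss_fn):
    """Same loss computation as training, just averaged without gradient updates."""
    losses = [loss_fn(model, batch).item() for batch in val_batches]
    return sum(losses) / max(len(losses), 1)
```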
Could you try adding `--enable_gradient_checkpointing_type "full"`? (You may need to pull first to make sure it supports gradient checkpointing.)
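For context, "full" gradient checkpointing generally means re-computing every transformer block's activations during the backward pass to save memory. A rough sketch of the idea (not the FastVideo implementation):

```python
import torch
from torch.utils.checkpoint import checkpoint

def forward_blocks_with_full_checkpointing(blocks, hidden_states):
    # `blocks` is assumed to be an nn.ModuleList of transformer blocks;
    # each block's activations are recomputed in backward instead of being stored.
    for block in blocks:
        hidden_states = checkpoint(block, hidden_states, use_reentrant=False)
    return hidden_states
```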
It supports Wan2.1 T2V 14B; we haven't implemented it for Hunyuan yet.
For now, `num_latent_t` should be divisible by `sp_size`. If you want to use sp=8, you may need to use 61 frames, which results in `num_latent_t = 16`.
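A quick check of that constraint (just illustrative arithmetic, not FastVideo code):

```python
sp_size = 8
for num_frames, num_latent_t in [(77, 20), (61, 16)]:
    ok = num_latent_t % sp_size == 0
    print(f"{num_frames} frames -> num_latent_t={num_latent_t}, works with sp={sp_size}: {ok}")
# 77 frames (num_latent_t=20) does not split evenly across sp=8; 61 frames (num_latent_t=16) does.
```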