Yongqi Chen

Results 18 comments of Yongqi Chen

If you’re using our dataset, you need to set it to 20. This won’t affect quality, but it will generate videos at the optimal length of 77 frames.

> Thank you. I noticed that the VSA tuning scripts use different settings for latent size. Could you explain why the scripts set it to 16?
>
> [FastVideo/scripts/finetune/finetune_v1_VSA.sh](https://github.com/hao-ai-lab/FastVideo/blob/50da62e722165a8847895a551aa56bc5ee2bb08c/scripts/finetune/finetune_v1_VSA.sh#L27)
> ...

The training part is used [here](https://github.com/hao-ai-lab/FastVideo/blob/9ce7c8039e3fec4c632b8d29a2e41e418a9b56d6/fastvideo/training/training_pipeline.py#L613), right? The validation part could be removed.

Did you mean pre-training?

It’s not supported yet, but we plan to add it in the future.

We just use a small portion (1-5%) of the training dataset as a validation set and compute the loss in the same way as in training. This has not yet been...
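
For reference, here is a minimal sketch of that setup, assuming a plain PyTorch loop; `split_train_val`, `validation_loss`, and `compute_loss` are illustrative names, not FastVideo's actual API:

```python
import random
import torch

def split_train_val(dataset, val_fraction=0.02, seed=0):
    """Hold out a small slice (1-5%) of the training samples for validation."""
    indices = list(range(len(dataset)))
    random.Random(seed).shuffle(indices)
    num_val = max(1, int(val_fraction * len(indices)))
    return indices[num_val:], indices[:num_val]  # train indices, val indices

@torch.no_grad()
def validation_loss(model, dataset, val_indices, compute_loss):
    """Same loss as in training, just averaged over held-out samples without gradients."""
    model.eval()
    losses = [float(compute_loss(model, dataset[i])) for i in val_indices]
    model.train()
    return sum(losses) / len(losses)
```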

Could you try adding ```--enable_gradient_checkpointing_type "full"```? (You may need to pull first to make sure your version supports gradient checkpointing.)

It supports Wan2.1 T2V 14B. We haven't implemented it for Hunyuan yet.

For now, ```num_latent_t``` should be divisible by ```sp_size```. If you want to use sp=8, you may need to use 61 frames, which results in ```num_latent_t = 16```.
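
In case it helps, here is a small sketch of the frame-count to latent-length relationship implied by the numbers above (61 frames → 16, 77 frames → 20), assuming the usual 4× temporal compression with the first frame encoded separately; the helper names are illustrative, not FastVideo's actual API:

```python
def get_num_latent_t(num_frames: int) -> int:
    """Latent temporal length for a video of `num_frames` frames,
    assuming 4x temporal compression with the first frame kept separately
    (consistent with 61 -> 16 and 77 -> 20)."""
    assert (num_frames - 1) % 4 == 0, "num_frames should be of the form 4k + 1"
    return (num_frames - 1) // 4 + 1

def fits_sp(num_frames: int, sp_size: int) -> bool:
    """num_latent_t must be divisible by sp_size for sequence parallelism."""
    return get_num_latent_t(num_frames) % sp_size == 0

print(get_num_latent_t(61), fits_sp(61, 8))  # 16 True  -> works with sp=8
print(get_num_latent_t(77), fits_sp(77, 8))  # 20 False -> 20 is not divisible by 8
```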