Yongqi Chen
If you're using our dataset, you should set it to 20. This won't affect quality, but it will generate videos at the optimal length of 77 frames.
> Thank you. I noticed that the VSA tuning scripts use different settings for latent size. Could you explain why the scripts set it to 16?
>
> [FastVideo/scripts/finetune/finetune_v1_VSA.sh](https://github.com/hao-ai-lab/FastVideo/blob/50da62e722165a8847895a551aa56bc5ee2bb08c/scripts/finetune/finetune_v1_VSA.sh#L27)
> ...
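For reference, the 20 above just comes from the frame count. A minimal sanity check, assuming the usual 4x temporal VAE compression with one extra latent for the first frame (my assumption here, not something stated in the script):

```python
def frames_to_num_latent_t(num_frames: int) -> int:
    # one latent for the first frame, then one per 4 subsequent frames
    return (num_frames - 1) // 4 + 1

print(frames_to_num_latent_t(77))  # 20, matching num_latent_t = 20 for our 77-frame dataset
```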
The training part is used [here](https://github.com/hao-ai-lab/FastVideo/blob/9ce7c8039e3fec4c632b8d29a2e41e418a9b56d6/fastvideo/training/training_pipeline.py#L613). The validation part could be removed.
Could you set `--validation_sampling_steps` to 50? It specifies the number of inference steps used for validation.
Did you mean pre-training?
It’s not supported yet, but we plan to add it in the future.
We just use a small portion (1-5%) of the training dataset as a validation set and compute the loss in the same way as in training. This has not yet been...
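Roughly, it amounts to something like the sketch below (hypothetical names, not the actual FastVideo code): hold out a small slice of the training samples and compute the same training loss on it without gradient updates.

```python
import random
import torch

def split_train_val(samples, val_fraction=0.02, seed=0):
    """Hold out ~1-5% of the training samples as a validation set."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_val = max(1, int(len(samples) * val_fraction))
    val_ids = set(indices[:n_val])
    train = [s for i, s in enumerate(samples) if i not in val_ids]
    val = [s for i, s in enumerate(samples) if i in val_ids]
    return train, val

@torch.no_grad()
def validation_loss(model, val_batches, loss_fn):
    """Same loss computation as training, just averaged without gradient updates."""
    losses = [loss_fn(model, batch).item() for batch in val_batches]
    return sum(losses) / max(len(losses), 1)
```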
Could you try adding `--enable_gradient_checkpointing_type "full"`? (You may need to pull first to make sure it supports gradient checkpointing.)
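For context, "full" gradient checkpointing generally means re-computing every transformer block's activations during the backward pass to save memory. A rough sketch of the idea (not the FastVideo implementation):

```python
import torch
from torch.utils.checkpoint import checkpoint

def forward_blocks_with_full_checkpointing(blocks, hidden_states):
    # `blocks` is assumed to be an nn.ModuleList of transformer blocks;
    # each block's activations are recomputed in backward instead of being stored.
    for block in blocks:
        hidden_states = checkpoint(block, hidden_states, use_reentrant=False)
    return hidden_states
```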
It supports Wan2.1 T2V 14B; we haven't implemented it for Hunyuan yet.
For now, `num_latent_t` should be divisible by `sp_size`. If you want to use sp=8, you may need to use 61 frames, which results in `num_latent_t = 16`.
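A quick check of that constraint (just illustrative arithmetic, not FastVideo code):

```python
sp_size = 8
for num_frames, num_latent_t in [(77, 20), (61, 16)]:
    ok = num_latent_t % sp_size == 0
    print(f"{num_frames} frames -> num_latent_t={num_latent_t}, works with sp={sp_size}: {ok}")
# 77 frames (num_latent_t=20) does not split evenly across sp=8; 61 frames (num_latent_t=16) does.
```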