Hyeonho Jeong
@Zebraslive I am actually working on that and will share the open source code when it's done.
UNet3DConditionModel from _showone/models/unet_3d_condition.py_ is used in "TextToVideoIFPipeline", which is locally defined in _showone/pipelines/pipeline_t2v_base_pixel.py_ and serves as the initial keyframe-generation model of the Show-1 cascaded VDM. Or did I understand...
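For reference, a minimal sketch of how those two pieces fit together, assuming the repo layout referenced above and the `showlab/show-1-base` checkpoint id (the checkpoint id and call arguments are assumptions; substitute whatever you actually use):

```python
import torch

# Local Show-1 modules, at the paths referenced above
from showone.models.unet_3d_condition import UNet3DConditionModel
from showone.pipelines.pipeline_t2v_base_pixel import TextToVideoIFPipeline

# Keyframe (base) stage of the Show-1 cascade.
# "showlab/show-1-base" is an assumed checkpoint id -- replace with yours.
unet = UNet3DConditionModel.from_pretrained(
    "showlab/show-1-base", subfolder="unet", torch_dtype=torch.float16
)
pipe = TextToVideoIFPipeline.from_pretrained(
    "showlab/show-1-base", unet=unet, torch_dtype=torch.float16
).to("cuda")

# Generate low-resolution keyframes; the later cascade stages
# (interpolation, super-resolution) would consume this output.
out = pipe(prompt="a man is snowboarding")
```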
This is not something included in the original paper, right? Could you explain this modification, please? Thank you.
Thank you for clarifying. I will check out the DreamBooth paper. I have one more question: I think "train_temporal_conv" was not in the original Tune-A-Video paper. Is...
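For context, here is a hedged sketch of what a `train_temporal_conv` flag typically gates in this kind of codebase: unfreezing the temporal convolution blocks of the 3D UNet in addition to the attention layers that Tune-A-Video tunes. The module-name substrings (`attn1`, `attn2`, `temp_convs`) follow diffusers-style UNet3DConditionModel naming and are assumptions, not confirmed against this repo:

```python
from torch import nn

def set_trainable_params(unet: nn.Module, train_temporal_conv: bool) -> None:
    """Freeze the UNet, then unfreeze the layers selected for fine-tuning."""
    unet.requires_grad_(False)
    for name, param in unet.named_parameters():
        # Attention projections -- roughly the layers Tune-A-Video fine-tunes.
        if "attn1" in name or "attn2" in name:
            param.requires_grad = True
        # Extra knob: also train the temporal convolution blocks
        # ("temp_convs" is the diffusers-style name -- an assumption here).
        if train_temporal_conv and "temp_convs" in name:
            param.requires_grad = True
```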
Hi @gigoug - For **skiing**, the prompts used are these:
- training prompt: "a man is snowboarding"
- target prompts: "a spider-man is snowboarding", "an astronaut is snowboarding underwater in...
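If it helps, this is how such a training/target prompt pair might be laid out in a config (a hypothetical structure, not the repo's actual config schema); the second target prompt is truncated in the comment above and is left incomplete here:

```python
# Hypothetical prompt config for the skiing example (structure assumed,
# not taken from the repo's actual config files).
prompt_config = {
    "training_prompt": "a man is snowboarding",
    "target_prompts": [
        "a spider-man is snowboarding",
        # the original comment is truncated after "an astronaut is
        # snowboarding underwater in..." -- remaining prompts unknown
    ],
}
```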
Hi @CJ416, inference of the whole cascaded video diffusion pipeline takes some time. In the case of training with a single video of 8 frames, training should be done...