huyduong7101
### Describe the bug

As far as I know, UNetMotionModel is adopted for AnimateDiff. Hence, I looked into the original implementation of AnimateDiff and noticed that they use cross-attention...
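For context on the mechanism in question, a single-head cross-attention layer can be sketched in plain NumPy: queries come from one stream (e.g. spatial/video latents) while keys and values come from another (e.g. text embeddings). All dimensions below are illustrative assumptions, not the sizes AnimateDiff or UNetMotionModel actually use.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_tokens, context_tokens, d_head=64, seed=0):
    """Minimal single-head cross-attention sketch:
    queries from one stream, keys/values from another."""
    rng = np.random.default_rng(seed)
    d_q = query_tokens.shape[-1]
    d_c = context_tokens.shape[-1]
    # Random stand-ins for learned projection matrices.
    W_q = rng.standard_normal((d_q, d_head)) / np.sqrt(d_q)
    W_k = rng.standard_normal((d_c, d_head)) / np.sqrt(d_c)
    W_v = rng.standard_normal((d_c, d_head)) / np.sqrt(d_c)
    Q = query_tokens @ W_q            # (N_q, d_head)
    K = context_tokens @ W_k          # (N_c, d_head)
    V = context_tokens @ W_v          # (N_c, d_head)
    attn = softmax(Q @ K.T / np.sqrt(d_head))  # (N_q, N_c)
    return attn @ V                   # (N_q, d_head)

latents = np.random.default_rng(1).standard_normal((16, 320))  # e.g. spatial tokens
text = np.random.default_rng(2).standard_normal((77, 768))     # e.g. text embeddings
out = cross_attention(latents, text)
print(out.shape)  # (16, 64)
```

The key point is that the attention map has shape (N_q, N_c): each query token attends over the *other* modality's tokens, which is what distinguishes cross-attention from self-attention.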
May I ask why the authors adopted the pretrained VAE weights from https://huggingface.co/stabilityai/sd-vae-ft-mse instead of those from https://huggingface.co/runwayml/stable-diffusion-v1-5/tree/main/vae?
In the scope of human-related video generation, there are two main emerging problems: Talking Face Generation (TFG) and Human Animation Generation (HAG). The discrepancy between these problems is...
In this work, the authors adopted Whisper-tiny (d_model=384) to extract audio features while training the UNet from scratch. I guess the reason for training from scratch instead of loading pretrained SDv1.4...
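One practical detail this raises: Whisper-tiny's 384-dim features do not match the cross-attention context width of an SD-style UNet, so some projection is needed. A minimal NumPy sketch of that step, where only d_model=384 comes from the post and the target width of 768 (SD v1.x's text-embedding dim) plus the frame count are assumptions:

```python
import numpy as np

D_AUDIO = 384   # Whisper-tiny hidden size, per the post
D_CROSS = 768   # assumed cross-attention context dim (SD v1.x convention)

rng = np.random.default_rng(0)
# Hypothetical learned linear projection (random stand-in here).
proj = rng.standard_normal((D_AUDIO, D_CROSS)) / np.sqrt(D_AUDIO)

# Suppose the audio encoder yields one 384-d vector per audio frame.
audio_features = rng.standard_normal((50, D_AUDIO))   # 50 frames, illustrative
audio_context = audio_features @ proj                 # (50, 768)
print(audio_context.shape)  # (50, 768)
```

The projected sequence can then be fed as the cross-attention context in place of (or alongside) text embeddings.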
"Bbox shift" has a significant impact on the output. Hence, does anyone try to use "bbox shift" as augmentation in training?