PVDM
Official PyTorch implementation of Video Probabilistic Diffusion Models in Projected Latent Space (CVPR 2023).
Q1: Since latent diffusion uses a VAE, why did you change the structure to a plain autoencoder? Is it because of poor VAE performance? Q2: Why design a bottleneck structure here? https://github.com/sihyun-yu/PVDM/blob/17699659148423469c0d1ccdca5e466933b943e1/models/autoencoder/autoencoder_vit.py#L180C1-L190C34
The repo has a few hardcoded things that make it difficult to use with a different setting, like a different resolution or number of timesteps. I think I managed the resolution problem also...
There may be a memory non-reclamation issue in first_stage_train, resulting in gradual memory growth
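A common cause of gradual memory growth in PyTorch training loops (not confirmed to be the cause here) is storing loss tensors for logging without detaching them, which keeps each step's entire autograd graph alive. A minimal sketch of the leaky pattern and its fix, using a hypothetical `train_step` helper rather than the repo's actual code:

```python
import torch

def train_step_leaky(model, x, loss_log):
    """Each appended loss tensor retains its full computation graph,
    so memory grows every step until the list is cleared."""
    loss = ((model(x) - x) ** 2).mean()
    loss_log.append(loss)  # BUG: keeps the graph alive
    return loss

def train_step_fixed(model, x, loss_log):
    """Store a plain Python float instead; the graph is freed
    as soon as backward() runs (or the tensor goes out of scope)."""
    loss = ((model(x) - x) ** 2).mean()
    loss_log.append(loss.detach().item())  # no graph retained
    return loss
```

Inspecting `torch.cuda.memory_allocated()` across steps is one way to confirm whether allocated memory keeps climbing.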
such as: Normalize, BasicTransformerBlock, convert_module_to_f16, etc.
Excellent work! : ) But I hit a bug. When I use multiple GPUs to run the first_stage code, it blocks at this line. I found the issue...
Excuse me ~ How can we run inference with the checkpoints?
For training the autoencoder: is it really possible to train it with a batch size of 7-8? As you mentioned in the paper, how can we train the autoencoder with...
Thanks for open-sourcing this great work. However, I trained the VAE on the SkyTimelapse dataset for 150K steps, but the R-FVD only reaches 66.79 while the reported number in the...