PVDM
Official PyTorch implementation of Video Probabilistic Diffusion Models in Projected Latent Space (CVPR 2023).
Q1: Since latent diffusion uses a VAE, why did you change the structure to a plain autoencoder? Is it because of poor VAE performance? Q2: Why design a bottleneck structure here? https://github.com/sihyun-yu/PVDM/blob/17699659148423469c0d1ccdca5e466933b943e1/models/autoencoder/autoencoder_vit.py#L180C1-L190C34
The repo has a few hardcoded things that make it difficult to use with a different setting, like a different resolution or number of timesteps. I think I managed the resolution problem also...
There may be a memory non-reclamation issue in first_stage_train, resulting in gradual memory growth
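A common cause of gradual memory growth in PyTorch training loops (not confirmed to be the cause here) is storing loss tensors for logging without detaching them, which keeps each step's entire autograd graph alive. A minimal sketch of the leaky pattern and its fix, using a hypothetical `train_step` helper rather than the repo's actual code:

```python
import torch

def train_step_leaky(model, x, loss_log):
    """Each appended loss tensor retains its full computation graph,
    so memory grows every step until the list is cleared."""
    loss = ((model(x) - x) ** 2).mean()
    loss_log.append(loss)  # BUG: keeps the graph alive
    return loss

def train_step_fixed(model, x, loss_log):
    """Store a plain Python float instead; the graph is freed
    as soon as backward() runs (or the tensor goes out of scope)."""
    loss = ((model(x) - x) ** 2).mean()
    loss_log.append(loss.detach().item())  # no graph retained
    return loss
```

Inspecting `torch.cuda.memory_allocated()` across steps is one way to confirm whether allocated memory keeps climbing.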
such as: Normalize, BasicTransformerBlock, convert_module_to_f16, etc.
Excellent work! : ) But I hit a bug. When I use multiple GPUs to run the first_stage code, it blocks at this line. I found the issue...
Excuse me ~ How can we run inference with the checkpoints?
For training the autoencoder: is it really possible to train it with a batch size of 7-8? As you mentioned in the paper, how can we train the autoencoder with...
Thanks for open-sourcing this great work. However, I trained the VAE on the SkyTimelapse dataset for 150K steps, but the R-FVD only reaches 66.79 while the reported number in the...