
missing VAE encoder with DWT

ink1 opened this issue · 2 comments

@shonenkov Great work everyone! As far as I can tell, there is only a VAE decoder with DWT and no corresponding encoder. Encoding with get_vae(dwt=True) produces the same number of tokens as get_vae(dwt=False) for the same image size, although the tokens themselves differ. The DWT decoder then doubles the original image size; the result is large but blurry, and I see quality loss even after downscaling back to the original size. An image decoded-encoded with the default VQGAN model still looks better than one from the DWT model.

@bes-dev Is this due to the end-to-end retraining you mentioned in #42? I would expect a compatible DWT VAE encoder to encode a 512x512 image into 1024 tokens and the decoder to restore it back to 512x512. As it stands, the DWT VAE needs 256x256 image prompts rather than 512x512, and the resulting quality is unfortunately not worth the effort. Looking forward to seeing DALL-E trained end-to-end on 512x512 images.
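The mismatch above can be sketched with some simple token arithmetic. This is a minimal illustration, assuming the VQ encoder downsamples each spatial dimension by a factor of 8 (the reported behaviour: 256x256 in, 1024 tokens out) and that the DWT decoder adds an extra 2x upsample; the helper function is hypothetical, not part of the ru-dalle API.

```python
def num_tokens(image_size: int, downsample: int = 8) -> int:
    """Tokens a VQ encoder emits for a square image, given its downsampling factor."""
    side = image_size // downsample
    return side * side

# Current behaviour: both encoders downsample by 8, so a 256x256 input
# yields a 32x32 grid of 1024 tokens either way (the codes differ, not the count).
tokens_default = num_tokens(256)  # 1024
tokens_dwt = num_tokens(256)      # also 1024

# The DWT decoder upsamples by an extra factor of 2, so those 1024 tokens
# decode to a 512x512 image rather than the original 256x256.
decoded_side = (256 // 8) * 8 * 2  # 512

# A matching DWT encoder would need an effective downsampling factor of 16:
# 512x512 in -> the same 1024 tokens -> 512x512 restored.
tokens_expected = num_tokens(512, downsample=16)  # 1024

print(tokens_default, decoded_side, tokens_expected)
```

This is why, without a retrained encoder, the DWT pipeline only round-trips cleanly from 256x256 prompts.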

ink1 avatar Dec 05 '21 01:12 ink1

@ink1 Yes, the available DWT VQVAE checkpoint was trained for only a few iterations on a small dataset as a proof of concept. To reach production quality, it should be trained longer on a larger dataset. At the moment I don't have enough resources to do it, but I think the Sber guys will do it on their side.

bes-dev avatar Dec 05 '21 10:12 bes-dev

@ink1 Same thing @bes-dev and I were talking about over here: https://github.com/bes-dev/vqvae_dwt_distiller.pytorch/issues/1

Awaiting the retraining here as well.

RyPoints avatar Dec 31 '21 20:12 RyPoints