transformer_latent_diffusion
Low image quality even after 300k steps?
Hello,
first - thank you for sharing this repo, I've been looking for something like this for a long time. I ran your code for 300k steps, but the quality is still noticeably different from what you shared at step 250k (see the attached picture from step 300k). I am using the GRIT dataset, all 20 parquet files, preprocessed with your script. I run with the same params you shared, but the loss more or less stagnates. I use a 12-layer transformer, embed_dim 768, batch size 256, and lr 3e-4; the rest are the default params from the repo. I assume something is wrong with either the dataset or the hyperparams.
Could you please share more details about how you trained your model for 250k steps? Which dataset did you use?
Thank you very much, really appreciate it.
Hey @aabzaliev - interesting results. Agreed the results aren't great (but kind of interesting too). I think it could be a few things.
1. The noise distribution - the default beta_a and beta_b aren't the best; I just changed them - https://github.com/apapiu/transformer_latent_diffusion/commit/2352f0051c40340bacd67920a56c7af91bfab93d - beta_a=1 and beta_b=2.5 give a more skewed distribution, so the model sees less heavily noised data, which led to better results in my training.
2. The data - this is a big one - the full GRIT data might contain a lot of low-quality images and/or prompts. Most of the data I used was either synthetic or filtered by CLIP aesthetic score. Try the `mj_latents.npy` and `mj_text_emb.npy` from here: https://huggingface.co/apapiu/small_ldt/tree/main - this is higher-quality synthetic data, about 600k examples if I remember correctly.
3. Model architecture - can you point to the commit you are using, or describe what the architecture looks like? I have made some small changes since training the model that could affect training dynamics. I also used an lr warmup for the first 1k iterations or so that is not in the current training code.
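For point 1, a minimal sketch of what the skewed noise-level sampling could look like, assuming the per-example noise level is drawn in [0, 1] from a Beta distribution (the exact parameterization in the repo may differ):

```python
import torch

def sample_noise_levels(batch_size: int, beta_a: float = 1.0, beta_b: float = 2.5) -> torch.Tensor:
    """Draw per-example noise levels in [0, 1] from Beta(beta_a, beta_b).

    With beta_a=1 and beta_b=2.5 the density is highest near 0, so most
    training examples are only lightly noised.
    """
    dist = torch.distributions.Beta(beta_a, beta_b)
    return dist.sample((batch_size,))

levels = sample_noise_levels(256)
# The Beta(1, 2.5) mean is 1 / (1 + 2.5) ≈ 0.29, i.e. skewed toward low noise.
```

With the old defaults the distribution would be closer to uniform, so the model spends more capacity on very noisy inputs.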
Feel free to try either and let me know what results you get. You can apply both 1 and 2 on top of an already trained model that's no issue.
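Since the warmup from point 3 is not in the current training code, here is a hedged sketch of a linear lr warmup over the first 1k steps using PyTorch's `LambdaLR`; the stand-in model and optimizer settings are only illustrative, not the repo's actual setup:

```python
import torch

model = torch.nn.Linear(768, 768)  # stand-in for the actual transformer
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)

warmup_steps = 1000  # "first 1k iterations or so"
# Multiply the base lr by a factor that ramps linearly from 1/1000 to 1.
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda step: min(1.0, (step + 1) / warmup_steps)
)

# In the training loop: optimizer.step() then scheduler.step() each iteration.
for _ in range(10):
    optimizer.step()
    scheduler.step()
# After 10 steps the lr is 3e-4 * 11/1000.
```

A warmup like this mainly helps stabilize the first few hundred updates, when the transformer's attention layers are far from initialization-friendly scales.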
awesome, thank you so much for such a detailed response! I am going to try training with the clean data and the better noise schedule now. I ran the experiments last week and cannot check which commit exactly it was, but I will just pull the latest from main. I will let you know how it goes!