Robin Rombach

Results: 15 comments of Robin Rombach

Hi, thanks for checking out our code! What you describe is most likely triggered by another error that occurs during the initialization of the script. Please check the full stack...

Which version of pytorch-lightning are you using? This code still targets `pl==0.9` and is not compatible with lightning versions >= 1.0. Additionally, you can try setting `save_top_k=0`,...
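A minimal sketch of the suggested fix: pin the dependency to the 0.9 series the repo was written against (the exact patch release used here, 0.9.0, is an assumption).

```shell
# Pin pytorch-lightning to the 0.9 series this code expects;
# the specific patch version (0.9.0) is an assumption.
pip install "pytorch-lightning==0.9.0"
```

With a compatible version installed, checkpoint saving can additionally be disabled by passing `save_top_k=0` to lightning's `ModelCheckpoint` callback.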

Thanks! The VQGAN benefits greatly from being trained as long as possible (provided the dataset is large enough and overfitting is a secondary concern), and tuning in the discriminator...

Hi, thanks for your interest in our work. We just released scripts for text-to-image and class-conditional synthesis (and corresponding checkpoints) with #27.

Hey @zhihongp, thanks for catching this! I have just added the VQGAN loss in f13bf9bf463d95b5a16aeadd2b02abde31f769f8. It is the same as in the taming-transformers repo, but provides some additional information about...

Hi, sorry for the late reply. I will take a guess and suggest running the training with the following command: `python main.py --base configs/latent-diffusion/lsun_churches-ldm-kl-8.yaml -t --gpus --scale_lr False` This...

For the different conditioning tasks (semantic synthesis, depth-to-image etc) we train different transformer models. The VQGAN on ImageNet should be fairly general and we re-use it across some tasks, but...

Hi. Yes, one way is to cache the already computed attention keys and values when generating a sequence. See for example https://huggingface.co/transformers/quickstart.html#using-the-past. Note that this is not currently implemented for our models as...
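To make the caching idea concrete, here is a minimal pure-Python sketch of key/value caching for single-head attention during autoregressive decoding. It is not the Hugging Face `past` API itself; all names and the toy numbers are illustrative. At each step, the new token's key and value are appended to a cache, so only one new attention row is computed instead of re-running attention over the whole prefix.

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of scores
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attend(q, keys, values):
    # scaled dot-product attention for a single query vector
    # against a list of key/value vectors
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    w = softmax(scores)
    dim = len(values[0])
    return [sum(wi * v[j] for wi, v in zip(w, values)) for j in range(dim)]

# Incremental decoding: append each step's key/value to the cache,
# then attend the current query against the cached prefix only.
kv_cache = {"k": [], "v": []}
outputs = []
steps = [([1.0, 0.0], [0.5, 0.5], [1.0, 2.0]),   # (query, key, value) per step
         ([0.0, 1.0], [0.2, 0.8], [3.0, 4.0])]
for q, k, v in steps:
    kv_cache["k"].append(k)
    kv_cache["v"].append(v)
    outputs.append(attend(q, kv_cache["k"], kv_cache["v"]))

# Sanity check: the cached result at step 2 matches a full recomputation
# of attention over the entire prefix from scratch.
full = attend(steps[1][0], [k for _, k, _ in steps], [v for _, _, v in steps])
assert full == outputs[1]
```

The saving is that the cached variant computes one new attention row per step, while recomputing from scratch costs time quadratic in the sequence length overall.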

Hi, which embeddings are you referring to exactly? Do you mean "internal" transformer representations?