taming-transformers
How to sample high-resolution images?
Hi,
this paper claims to be able to produce high-resolution images, yet there is no configuration in the code that can learn images bigger than 256x256. Increasing the resolution to 512x512 in the VQGAN leads to out-of-memory errors.
I guess it should then be possible to generate higher-resolution images from smaller generators, but does the codebase include an implementation that achieves this?
Thank you for this wonderful work,
The out-of-memory errors are most likely not an issue with the codebase, but with the system you are running it on. I have not run the code myself, but I would assume that if your system had more memory you would not get these errors.
You have not understood my question.
Training a GAN at 512x512 leads to OOM, which of course depends on my system.
However, the official paper claims that it can sample larger images (e.g. 512x512) from models trained at a smaller resolution (i.e. 256x256) by sliding a window over the latent grid and computing the transformer output within that region:
This is the whole selling point of the paper: taming transformers for high-resolution image synthesis:
In this example no model was trained on 1280x460 pixels; a smaller model was used with the method I describe above to sample a high-resolution image. This makes the approach more lightweight, since training (the memory-consuming part) can be done on smaller systems.
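To make concrete what I mean, here is a rough sketch (not the authors' code; all names, the codebook size, and the window placement are my assumptions) of how such sliding-window sampling could work: a transformer trained on 16x16 latent windows fills an arbitrarily large token grid by always conditioning on a window around the current position, and the VQGAN decoder (not shown) would then map the tokens back to pixels.

```python
import numpy as np

VOCAB = 1024  # VQGAN codebook size (assumed)
WIN = 16      # window side the transformer was trained on
              # (16x16 latents ~ 256x256 pixels at 16x downsampling)

rng = np.random.default_rng(0)

def next_token_logits(context):
    """Stand-in for the trained transformer: given the already-sampled
    tokens of the current window (raster order), return logits over
    the codebook for the next token. Here: random logits."""
    return rng.normal(size=VOCAB)

def sample_grid(h, w):
    """Fill an h x w latent grid in raster order; each position is
    predicted from a WIN x WIN window whose lower-right corner tracks
    the current position (clamped at the grid borders)."""
    grid = np.zeros((h, w), dtype=np.int64)
    for i in range(h):
        for j in range(w):
            top = max(0, i - WIN + 1)
            left = max(0, j - WIN + 1)
            # tokens already sampled inside the window, raster order
            context = [grid[r, c]
                       for r in range(top, i + 1)
                       for c in range(left, min(left + WIN, w))
                       if (r, c) < (i, j)]
            logits = next_token_logits(context)
            p = np.exp(logits - logits.max())
            grid[i, j] = rng.choice(VOCAB, p=p / p.sum())
    return grid

# A 1280x460 image at 16x downsampling is roughly an 80x29 latent grid.
codes = sample_grid(29, 80)
```

The point is that memory stays bounded by the WIN x WIN window regardless of the final image size, which is what makes sampling beyond the training resolution feasible.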
I haven't found any code for this particular type of sampling in the codebase, though.
https://colab.research.google.com/github/CompVis/taming-transformers/blob/master/scripts/taming-transformers.ipynb#scrollTo=5rVRrUOwbEH0