
Replication of the upscalers

Open rom1504 opened this issue 2 years ago • 5 comments

Hey, so we got decent versions of the prior and the basic decoder now.

I think the current code is already able to train upscalers, but we need more documentation for it.

Let's have an upscaler.md explaining:

  • What it is
  • How to prepare the dataset
  • What hyperparameters to use
  • The command to run the training
  • The expected GPU-hours cost

And then train it!
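To make the "command to run the training" part concrete, here is a sketch of what an upsampler-only training config could look like. Field names and values below are illustrative assumptions, not the repository's actual config schema; the real examples live in the repository's configs directory.

```json
{
  "decoder": {
    "unets": [
      {"dim": 320, "image_embed_dim": 768, "dim_mults": [1, 2, 3, 4]},
      {"dim": 128, "image_embed_dim": 768, "dim_mults": [1, 2, 3, 4], "lowres_cond": true}
    ],
    "image_sizes": [64, 256],
    "timesteps": 1000
  },
  "train": {
    "unet_number": 2,
    "batch_size": 64
  }
}
```

The key idea is the second unet is conditioned on the low-resolution output of the first (`lowres_cond`), and `unet_number` selects which unet of the cascade this run trains.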

We can also discuss what the right dataset is, but I figure the LAION-5B subset we call "laion high resolution" could do the trick (it's 170M images at 1024x1024 or bigger).

I understand only the image (and CLIP image embedding) is needed, and no text?

rom1504 avatar Jun 19 '22 19:06 rom1504

Here are some relevant sections of the paper for reference while in this thread:


[screenshots of the relevant paper sections]

nousr avatar Jun 19 '22 20:06 nousr

they are also using the BSR degradation used by Rombach et al. (https://github.com/CompVis/latent-diffusion/tree/e66308c7f2e64cb581c6d27ab6fbeb846828253b/ldm/modules/image_degradation, https://github.com/cszn/BSRGAN/blob/main/utils/utils_blindsr.py), which I don't have in the repository yet

tempted to just go with Imagen's noising procedure (on top of the blur) and call it a day; it would be a lot simpler
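A minimal sketch of that simpler Imagen-style conditioning augmentation: blur the low-resolution conditioning image, then add Gaussian noise at some level, instead of running the full BSR degradation pipeline. The box filter below is a naive stand-in for a proper Gaussian blur, and the function names are illustrative, not the repository's API.

```python
import numpy as np

def box_blur(img, k=3):
    """Tiny box blur over a 2D array (edge rows/cols left untouched)."""
    out = img.copy()
    r = k // 2
    for i in range(r, img.shape[0] - r):
        for j in range(r, img.shape[1] - r):
            out[i, j] = img[i - r:i + r + 1, j - r:j + r + 1].mean()
    return out

def degrade_for_conditioning(img, noise_level=0.1, rng=None):
    """Blur then add Gaussian noise -- the Imagen-style augmentation."""
    rng = rng or np.random.default_rng(0)
    return box_blur(img) + noise_level * rng.standard_normal(img.shape)

img = np.ones((8, 8))
cond = degrade_for_conditioning(img)
print(cond.shape)  # (8, 8)
```

At training time the noise level would be sampled per example (and, as in Imagen, also fed to the upsampler as conditioning), so the model learns to be robust to imperfect low-resolution inputs.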

lucidrains avatar Jun 20 '22 15:06 lucidrains

ok, 0.11.0 should allow for the different noise schedules across different unets, as in the paper

after adding the BSR image degradation (or some alternative), i think i'm comfortable giving the repository a 1.0

lucidrains avatar Jun 20 '22 16:06 lucidrains

I understand only the image (and CLIP image embedding) is needed, and no text?

@rom1504 yup, no text conditioning needed, i think it should all be in the image embedding!

lucidrains avatar Jun 20 '22 16:06 lucidrains

Hi all, I am aiming to train the decoder and the upsampler. Because they have too many parameters, I have decided to train them separately. The readme says the upsampler and the decoder net can be trained separately, but from reading the code, my understanding is that even then I would need to load the parameters of both unet 0 and unet 1 and set the unet number to 1 in order to train only unet 1. I don't know if I am right. If so, I couldn't train unet 0 and unet 1 on two separate machines. I am wondering how I could train the decoder net and the upsamplers separately? Best,
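For context on the question above, here is a toy sketch of the pattern the readme describes, under the assumption that the loss for a given unet number only ever touches that unet's weights. The classes below are illustrative stand-ins, not the dalle2_pytorch API; the point is that the forward pass routes to exactly one unet, which is why training them on separate machines is at least structurally possible.

```python
# Toy stand-ins for the real Unet / Decoder classes (names are illustrative).
class ToyUnet:
    def __init__(self, name):
        self.name = name

    def loss(self, images):
        # A real unet would return a denoising loss; the string makes
        # the routing visible.
        return f"loss from {self.name}"

class ToyDecoder:
    def __init__(self, unets):
        self.unets = unets

    def forward(self, images, unet_number):
        # 1-indexed, matching the convention in this thread:
        # only the selected unet is touched.
        return self.unets[unet_number - 1].loss(images)

decoder = ToyDecoder([ToyUnet("base 64px unet"), ToyUnet("256px upsampler")])
print(decoder.forward("batch", unet_number=2))  # -> loss from 256px upsampler
```

Whether the real Decoder can be constructed with only one unet's weights loaded (rather than both, as the commenter observed) is exactly the open question here.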

YUHANG-Ma avatar Jun 26 '22 05:06 YUHANG-Ma