
Details about training super resolution model

Open GioFic95 opened this issue 3 years ago • 11 comments

Hi @rromb, @ablattmann, @pesser, and thank you for making your great work publicly available.

Could you please supply the code for the class ldm.data.openimages.SuperresOpenImagesAdvancedTrain/Validation to train your model for super-resolution, as required in bsr_sr/config.yaml (see this line)? Otherwise, some more information about how to train the SR model with datasets not included in your repository would be very helpful.

Thank you very much!
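For anyone hitting the same wall: since the trainer only needs a map-style dataset yielding example dicts, a minimal stand-in can unblock experiments on a custom dataset. This is a hypothetical sketch, not the missing `SuperresOpenImagesAdvancedTrain` class; the dict keys (`image`, `LR_image`) and the [-1, 1] value range are assumptions based on typical configs in the repo, and random arrays stand in for real image loading:

```python
import numpy as np

class SuperresFolderDataset:
    """Hypothetical stand-in for ldm.data.openimages.SuperresOpenImagesAdvancedTrain.

    Yields dicts with an HR image and its low-resolution counterpart.
    Key names ('image', 'LR_image') are assumptions, not taken from the repo.
    """

    def __init__(self, size=256, downscale_f=4, length=10):
        self.size = size
        self.f = downscale_f
        self.length = length

    def __len__(self):
        return self.length

    def __getitem__(self, i):
        # Random data in place of real images, scaled to [-1, 1].
        hr = np.random.rand(self.size, self.size, 3).astype(np.float32) * 2 - 1
        # Naive strided downscale as a placeholder for proper resizing.
        lr = hr[:: self.f, :: self.f]
        return {"image": hr, "LR_image": lr}
```

A real implementation would read image files and use proper interpolation (e.g. PIL) instead of strided slicing, but the returned structure is what the data config plugs into.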

GioFic95 avatar Jan 26 '22 17:01 GioFic95

Hey @GioFic95, did you happen to find whether they posted a pre-trained model of their own? I can't find it.

roimulia2 avatar Feb 07 '22 12:02 roimulia2

Hi @roimulia2, yes, the link of the pre-trained LDM for super-resolution is this one: https://ommer-lab.com/files/latent-diffusion/sr_bsr.zip. You can find it in this table in the readme, looking for the task "Super-resolution": https://github.com/CompVis/latent-diffusion#pretrained-ldms.

GioFic95 avatar Feb 07 '22 14:02 GioFic95

> Hi @roimulia2, yes, the link of the pre-trained LDM for super-resolution is this one: https://ommer-lab.com/files/latent-diffusion/sr_bsr.zip.
>
> You can find it in this table in the readme, looking for the task "Super-resolution": https://github.com/CompVis/latent-diffusion#pretrained-ldms.

Sorry! I meant the inpainting pre-trained models. Are they available as well?

roimulia2 avatar Feb 07 '22 14:02 roimulia2

@GioFic95 replied above

roimulia2 avatar Feb 07 '22 14:02 roimulia2

@roimulia2 In the "Inpainting" section of the readme they provide a command and the link to the pretrained models for inpainting too.

GioFic95 avatar Feb 07 '22 14:02 GioFic95

@GioFic95 Does it make sense that the weights are 3.1 GB?

roimulia2 avatar Feb 07 '22 16:02 roimulia2

@GioFic95 Refer to this line. The `second_stage_model` of the SR DDPM has no `encode` function, so the `cond_stage_key` image is still in image space, not latent space. Hence this line, `elif self.conditioning_key == 'concat': xc = torch.cat([x] + c_concat, dim=1)`, will throw `Sizes of tensors must match except in dimension 2`.

Any chance bsr_sr/config.yaml is wrong?

kaihe avatar Mar 09 '22 01:03 kaihe

> @GioFic95 Refer to this line. The `second_stage_model` of the SR DDPM has no `encode` function, so the `cond_stage_key` image is still in image space, not latent space. Hence this line, `elif self.conditioning_key == 'concat': xc = torch.cat([x] + c_concat, dim=1)`, will throw `Sizes of tensors must match except in dimension 2`.
>
> Any chance bsr_sr/config.yaml is wrong?

I finally figured it out: the config is right. According to section 4.4 of the paper, they "simply concatenate the low-resolution conditioning y and the inputs to the UNet, i.e. τθ is the identity." The low-resolution image must be exactly the same size as the latent space: for example, a 64x64x3 KL encoder can only upscale a 64x64 image, and a 32x32x4 KL encoder can only upscale a 32x32 image.

This is very different from SR3, which upscales any low-resolution image to the high resolution first and concatenates them in image space.

I did try to upscale the low-resolution image, encode it with the second-stage model, and concatenate them in latent space like SR3, but the result is always random noise like this: (image attachment)
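The size mismatch described above can be reproduced with plain array shapes. This is a minimal sketch using NumPy arrays to stand in for torch tensors (`np.concatenate` mirrors `torch.cat` here); the concrete sizes assume the default 256x256 output with an f=4 first-stage model:

```python
import numpy as np

f = 4  # downsampling factor of the first-stage autoencoder
# Latent of a 256x256 HR image: spatial size 256 // 4 = 64.
z = np.random.randn(1, 3, 256 // f, 256 // f)

# LR conditioning already at latent resolution concatenates cleanly
# along the channel axis, giving a (1, 6, 64, 64) UNet input.
lr_ok = np.random.randn(1, 3, 64, 64)
x = np.concatenate([z, lr_ok], axis=1)

# Conditioning left at a different spatial size fails, mirroring the
# "Sizes of tensors must match" error from torch.cat.
lr_bad = np.random.randn(1, 3, 128, 128)
try:
    np.concatenate([z, lr_bad], axis=1)
except ValueError as e:
    print("shape mismatch:", e)
```

This is why the LR image has to live at latent resolution already: channel-wise concatenation only works when the spatial dimensions agree.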

kaihe avatar Mar 09 '22 07:03 kaihe

@GioFic95 Hi~ Did you ever figure out where ldm.data.openimages.SuperresOpenImagesAdvancedTrain/Validation is, and how to train on other datasets? I read through the code pipeline and found it a bit complicated to train on my own dataset.

IceClear avatar Jun 19 '22 15:06 IceClear

@IceClear Hi, unfortunately I'm in the same situation.

GioFic95 avatar Jun 19 '22 16:06 GioFic95

> @GioFic95 Refer to this line. The `second_stage_model` of the SR DDPM has no `encode` function, so the `cond_stage_key` image is still in image space, not latent space. Hence this line, `elif self.conditioning_key == 'concat': xc = torch.cat([x] + c_concat, dim=1)`, will throw `Sizes of tensors must match except in dimension 2`. Any chance bsr_sr/config.yaml is wrong?
>
> I finally figured it out: the config is right. According to section 4.4 of the paper, they "simply concatenate the low-resolution conditioning y and the inputs to the UNet, i.e. τθ is the identity." The low-resolution image must be exactly the same size as the latent space: for example, a 64x64x3 KL encoder can only upscale a 64x64 image, and a 32x32x4 KL encoder can only upscale a 32x32 image.
>
> This is very different from SR3, which upscales any low-resolution image to the high resolution first and concatenates them in image space.
>
> I did try to upscale the low-resolution image, encode it with the second-stage model, and concatenate them in latent space like SR3, but the result is always random noise like this: (image attachment)

In my view, this may not be true, since I have successfully generated a 120x120 image using the pre-trained model, whose size is 64x64 in the default config. I think the basic idea is that the latent code is generated based on the low-resolution input. Thus, just change the image size in the config to the desired size (it must be a multiple of 8) and we can obtain SR images accordingly. But the pre-trained model can only be applied for 4x, because it uses f=4, VQ. I am not sure if I am right, but the generated image seems reasonable.
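The observation above can be turned into a small sanity check. Note the multiple-of-8 constraint and the fixed 4x factor are taken from this comment, not verified against the repo:

```python
def sr_output_size(lr_size: int, f: int = 4) -> int:
    """With the f=4 first-stage model, the LR conditioning lives at latent
    resolution, so the output side length is simply lr_size * f.
    The multiple-of-8 constraint is the one stated in the comment above."""
    if lr_size % 8 != 0:
        raise ValueError(f"{lr_size} is not a multiple of 8")
    return lr_size * f

print(sr_output_size(64))   # 256: the default config, 64 -> 256
print(sr_output_size(120))  # 480: the 120x120 input discussed above
```

So the model always does exactly 4x, and "changing the image size" in the config just moves where that 4x is applied.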

(bird image attachment)

IceClear avatar Jun 20 '22 17:06 IceClear

I need a SuperresOpenImagesAdvancedTrain too

jujaryu avatar Dec 15 '22 05:12 jujaryu

@GioFic95 @kaihe @IceClear Hi, can you share your inference script for the LDM-BSR model? I ran into some problems reproducing the BSR results shown in the paper.

YunjinChen avatar May 28 '23 14:05 YunjinChen