latent-diffusion
Reproduction problem while training inpainting model
Thanks for the good work. I am trying to reproduce the diffusion model on the image inpainting task. The configuration file I use is modified from models/ldm/inpainting_big/config.yaml. But the loss curve appears quite weird: it converges too fast, right after the warmup ends.
(Note that the warmup is 1000 steps, and the loss has already reached a pretty low value at step 1000.)
Also, the inpainting results are poor in quality. This is one of my own test images (trained on the FFHQ dataset).

Has anyone encountered the same problem? I feel like this might be caused by a learning rate issue. Please let me know if you have any idea how to fix this. Thank you very much!
Could you give me an example of your dataloader? I am using the same config file, passing the masked image as 3 channels and the image as 3 channels, but I am getting this error:
RuntimeError: Given groups=1, weight of size [256, 7, 3, 3], expected input[4, 6, 64, 64] to have 7 channels, but got 6 channels instead
You need to modify the corresponding part in ddpm.py. I solved the problem by concatenating mask, masked_image and image; then the input has 7 channels, as the configuration file specifies. However, it still seems hard to reproduce the official results. I have no idea how long the authors trained the model for. I have trained for 3 full days and the inpainted results are still blurry.
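For reference, here is a minimal sketch of the kind of dataset I mean. It mirrors the keys and value ranges that make_batch() in scripts/inpaint.py produces, but returns HxWxC arrays so that get_input() in ldm/models/diffusion/ddpm.py can rearrange them to BxCxHxW. The class name, paths and mask_fn are placeholders, not code from the repo:

```python
import numpy as np
from PIL import Image
from torch.utils.data import Dataset

class InpaintingDataset(Dataset):
    """Illustrative dataset returning the same keys and value ranges as
    make_batch() in scripts/inpaint.py, but as HxWxC arrays for training."""
    def __init__(self, image_paths, mask_fn, size=256):
        self.image_paths = image_paths
        self.mask_fn = mask_fn  # callable (h, w) -> HxWx1 float mask, 1 = hole
        self.size = size

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, i):
        image = Image.open(self.image_paths[i]).convert("RGB").resize((self.size, self.size))
        image01 = np.array(image).astype(np.float32) / 255.0          # HxWx3 in [0, 1]
        mask = self.mask_fn(self.size, self.size).astype(np.float32)  # HxWx1, values in {0, 1}
        masked01 = image01 * (1.0 - mask)                             # zero out the hole region
        # scripts/inpaint.py rescales image, masked_image AND mask to [-1, 1]
        return {"image": image01 * 2.0 - 1.0,
                "masked_image": masked01 * 2.0 - 1.0,
                "mask": mask * 2.0 - 1.0}
```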
Do you mean that in the dataloader, masked_image should contain masked_image, mask and image? Or do you mean in ddpm.py? Could you specify where, please?
I fixed my problem, and after training I got the same output as you, just noise in the masked parts.
Hi guys, same problem here.
Hi, could you tell me where to change? Thx a lot!
I suggest you spend some time understanding the whole codebase. It would be a lot easier if you understood how Stable Diffusion works and how this process is implemented here, although it might take a while.
Generally speaking, you could modify ldm/models/diffusion/ddpm.py according to the script scripts/inpaint.py, which is used at inference time. In inpaint.py, line 79, we can see that the mask in each batch is downsampled to the same spatial size as the masked image's encoding from the VQ model. This is implemented with torch.nn.functional.interpolate(), and the downsampled mask is then concatenated with the encoded masked image. We should keep the same way of adding the mask while training. So the total number of input channels of the U-Net should be 7 (image, 3 channels + masked image, 3 channels + mask, 1 channel = 7 channels), and the mask and masked image should be concatenated with the input image in the same way as at inference. Modifying the corresponding lines in ddpm.py in this manner should work.
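Concretely, the inference-time lines in scripts/inpaint.py look roughly like the snippet below; the training-side change just has to build the same 7-channel input. Where exactly to place it in ddpm.py (e.g. inside get_input()) is my own assumption, not the authors' code:

```python
import torch

# scripts/inpaint.py (around line 79): build the conditioning
c = model.cond_stage_model.encode(batch["masked_image"])   # encode masked image -> B x 3 x h x w
cc = torch.nn.functional.interpolate(batch["mask"],
                                     size=c.shape[-2:])    # downsample mask to latent size
c = torch.cat((c, cc), dim=1)                              # B x 4 x h x w conditioning

# At training time (e.g. inside get_input() in ldm/models/diffusion/ddpm.py, my assumption),
# this 4-channel conditioning gets concatenated with the 3-channel noisy image latent,
# giving the 7 input channels expected by models/ldm/inpainting_big/config.yaml.
```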
Thanks for your reply, I've solved the problem already.
Hi @AlonzoLeeeooo,
Any update on your progress? Were you able to achieve good inpainting results on your custom dataset? If so, it would be great if you could share your training pipeline/configurations.
Hi @zaryabmakram,
I didn't manage to re-train the diffusion model successfully. The results are always blurry, even after the model has been trained for 3 days; empirically this is caused by insufficient training. Afterwards, I noticed that the reported GPU requirement in the supplementary materials of Stable Diffusion is 8 V100 GPUs. Due to limited computational resources, I had to give up on reproducing it.
How about finetuning the provided inpainting_big checkpoint instead of training from scratch? Have you experimented with that? Do you think that might output good results on a custom dataset?
Also, are you aware of which dataset inpainting_big checkpoint has been trained on?
I haven't tried finetuning yet, but the idea should work, at least in theory. The reported training set is Places2 Standard. It is worth mentioning that the provided inpainting_big checkpoint is able to produce plausible results on most natural images. Maybe you could try it out.
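In case it helps, loading the released checkpoint follows the same pattern as scripts/inpaint.py (the paths below assume last.ckpt has been downloaded into models/ldm/inpainting_big/):

```python
import torch
from omegaconf import OmegaConf
from ldm.util import instantiate_from_config

# Build the model from the inpainting config and load the released weights
config = OmegaConf.load("models/ldm/inpainting_big/config.yaml")
model = instantiate_from_config(config.model)
state = torch.load("models/ldm/inpainting_big/last.ckpt", map_location="cpu")["state_dict"]
model.load_state_dict(state, strict=False)
model = model.cuda().eval()  # keep it trainable instead if you want to finetune from these weights
```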
I see, thanks! Well, I'll look into how I can try finetuning the inpainting checkpoint.
Can you kindly point me to the reference reporting that the Places2 Standard dataset was used for training the inpainting model? I'm unable to find it.
It is in Table 15 of their supplementary materials, which you can find at https://openaccess.thecvf.com/content/CVPR2022/supplemental/Rombach_High-Resolution_Image_Synthesis_CVPR_2022_supplemental.pdf.
Could you please tell me what encoder you use as cond_stage_config for training the inpainting model?
Hi! I'm very interested in your training details. You trained for 3 days; what batch size did you use, how much GPU memory did it take in total, and how many epochs did you train? I haven't tried to reproduce it yet, but I'm curious about the claim in the latent-diffusion paper that it "reduces memory overhead": can working in the latent space really reduce memory usage that much? Looking forward to your reply!
Hi @DongyangHuLi,
I set the batch size to 48, but to save GPU memory I reduced the model channels to 128. I trained for about 600k iterations on two 3090s. As for the "reduced memory overhead" claim of latent diffusion, it is relative to DDPM; in practice the GPU requirement is still considerable, and for the inpainting setting you still need 8 Tesla V100s for training. So my own compute is still not enough to reproduce the results of the original paper. Hope this helps.
Thanks! So without enough hardware resources, it's really hard to make diffusion models work 😔
Yes, it's basically research you can only do by burning money 🙍♂️
Could you share the inpainting training code? Thank you!
Bro, are you also at USTC? Want to add each other on WeChat to chat? My WeChat ID is Kiss_The_Rain8; please add me, I'd like to ask you a few questions.
Hi @AlonzoLeeeooo,
Do I need to finetune the autoencoder separately (stage 1) on my custom dataset and then finetune the inpainting_big model, modifying the input in ddpm.py as in inpaint.py (stage 2), on my custom dataset? Or would stage 2 alone work? Please help.
Hi @rayush7, as far as I am concerned, you don't need to tune the parameters of the VQ model (stage 1). Since the official one is trained on the Open Images dataset, it should be sufficient to encode most images. Finetuning only stage 2 should work.
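As a rough sketch of what "finetune only stage 2" means in code (attribute names follow ldm/models/diffusion/ddpm.py; the learning rate is a hypothetical value, and LatentDiffusion already keeps the first stage frozen, so the first loop is mostly a sanity check):

```python
import torch

# Freeze the VQ autoencoder (stage 1); it only encodes/decodes latents.
for p in model.first_stage_model.parameters():
    p.requires_grad = False

# Optimize only the diffusion model (stage 2), i.e. the U-Net wrapped in model.model.
trainable = [p for p in model.model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1.0e-6)  # hypothetical finetuning learning rate
```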
Thank you @AlonzoLeeeooo I will give it a try.
How do I prepare the data for inpainting?
The paper mentions that the data preparation step is the same as in LaMa: https://github.com/advimman/lama
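For illustration only, a thick-stroke mask generator loosely in the spirit of LaMa's training masks could look like the sketch below; the actual generation code and its parameters live in the LaMa repository:

```python
import numpy as np
import cv2

def random_stroke_mask(h=512, w=512, max_strokes=5, max_width=60, rng=None):
    """Illustrative irregular mask (HxWx1, 1 = hole), NOT the actual LaMa code."""
    rng = rng or np.random.default_rng()
    mask = np.zeros((h, w), np.float32)
    for _ in range(int(rng.integers(1, max_strokes + 1))):
        # start each stroke at a random point and draw a few connected thick segments
        x, y = int(rng.integers(0, w)), int(rng.integers(0, h))
        for _ in range(int(rng.integers(1, 10))):
            angle = rng.uniform(0, 2 * np.pi)
            length = int(rng.integers(10, 100))
            x2 = int(np.clip(x + length * np.cos(angle), 0, w - 1))
            y2 = int(np.clip(y + length * np.sin(angle), 0, h - 1))
            cv2.line(mask, (x, y), (x2, y2), 1.0, thickness=int(rng.integers(10, max_width)))
            x, y = x2, y2
    return mask[..., None]
```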
Could I have a look at your modified code for this part? Thank you very much if I could!
Hi @mumingerlai,
I really would like to help, but since I was working on another project based on the same codebase, I have made a huge number of modifications to it, and it would be quite difficult for me to retrieve the parts corresponding to inpainting. Sorry I can't do you the favor.
But if there are any other problems with the modification, please feel free to discuss them in this issue and I will try my best to recall the details and answer.
Regards, Chang
I'm very happy about your reply. I have modified inpainting.py and concatenated images, masked images, and masks. It seems to run normally! Anyway, thank you very much!
Hello, could I have a look at your modification and data config for inpainting training? These tasks are difficult for me. Thank you very much!