
The upsample model generates more regular noise points

1073521013 opened this issue 2 years ago • 11 comments

Hello everyone, I would like to ask whether you have encountered this problem when training the upsampler (64 --> 256). The generated images always show the following symptoms:

1. Within the same batch, some images are very good while others are noisy.
2. Increasing the number of sampling steps or changing the sampling strategy does not help.
3. The upsampler is trained with a corruption such as Gaussian blur applied to the conditioning input.
4. The training data contains more than 100 million images, so training is sufficient.
5. Which images come out noisy appears to be random.
6. The 64-resolution images themselves are noise free.

1073521013 avatar Sep 22 '22 08:09 1073521013

yes it's a known bug

rom1504 avatar Sep 24 '22 12:09 rom1504

@1073521013 what happens if you use the noise-conditioning approach that was used in Imagen? (I have it built into this repository as well)

lucidrains avatar Sep 24 '22 15:09 lucidrains
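(For context, the Imagen-style noise-conditioning augmentation mentioned above can be sketched as follows. This is a toy illustration on a flat list of pixel values, not the actual implementation in either repository; the linear alpha schedule and the `num_levels` parameter are assumptions made for the sketch.)

```python
import math, random

def noise_condition(lowres_pixels, num_levels=1000):
    """Imagen-style noise conditioning, sketched: corrupt the low-res
    conditioning input with Gaussian noise at a random level t, and
    return (noised, t) so the upsampler can also be conditioned on t.
    The linear alpha schedule here is illustrative only."""
    t = random.randrange(num_levels)
    alpha = 1.0 - t / num_levels
    noised = [math.sqrt(alpha) * p + math.sqrt(1.0 - alpha) * random.gauss(0.0, 1.0)
              for p in lowres_pixels]
    return noised, t
```

At sampling time the same corruption is applied at a fixed (usually low) noise level, and that level is fed to the model, which is what lets the upsampler tolerate imperfect low-res inputs.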

@1073521013 are you using upsampling unet with the memory efficient setting turned on?

lucidrains avatar Sep 24 '22 16:09 lucidrains

@1073521013 yea, so there's a couple possibilities

1. the memory-efficient unet is bugged, or does not mix well with the Gaussian blur augmentation
2. Gaussian blur augmentation has this inherent deficiency, and there are unpublished tricks needed to resolve it
3. there is a bug in my code

all are plausible, but i think the quickest resolution would be to switch to Imagen's way of doing things. it is highly suspicious that they removed the blur in favor of conditioned denoising, even with Jonathan Ho advising (he pioneered the blur technique in the cascading DDPM paper). besides, people are training quite successfully using that technique, and it works well together with the memory-efficient unet. in the worst case, if there is no solution for the blur augmentation, we can make the Imagen technique the default in this repository

lucidrains avatar Sep 24 '22 16:09 lucidrains
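(The Gaussian blur augmentation under discussion, from the cascaded DDPM paper, can be sketched in one dimension as below. The `sigma_range` and `radius` values are illustrative assumptions, not the values used in any particular repository.)

```python
import math, random

def gaussian_blur_augment(row, sigma_range=(0.4, 0.6), radius=2):
    """Blur augmentation sketched in 1-D: during upsampler training,
    the low-res conditioning signal is blurred with a Gaussian kernel
    of random sigma, so the model learns to be robust to imperfect
    low-res inputs. Edge pixels are replicated for padding."""
    sigma = random.uniform(*sigma_range)
    kernel = [math.exp(-(i * i) / (2 * sigma * sigma))
              for i in range(-radius, radius + 1)]
    norm = sum(kernel)
    kernel = [k / norm for k in kernel]          # normalize to sum 1
    padded = [row[0]] * radius + list(row) + [row[-1]] * radius
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(row))]
```

The failure mode debated in this thread is that the model may overfit to this specific blur and then behave badly on low-res inputs it generates itself, which is one motivation for noise conditioning instead.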

@1073521013 yea, so there's a couple possibilities

1. the memory-efficient unet is bugged, or does not mix well with the Gaussian blur augmentation
2. Gaussian blur augmentation has this inherent deficiency, and there are unpublished tricks needed to resolve it
3. there is a bug in my code

all are plausible, but i think the quickest resolution would be to switch to Imagen's way of doing things. it is highly suspicious that they removed the blur in favor of conditioned denoising, even with Jonathan Ho advising (he pioneered the blur technique in the cascading DDPM paper). besides, people are training quite successfully using that technique, and it works well together with the memory-efficient unet. in the worst case, if there is no solution for the blur augmentation, we can make the Imagen technique the default in this repository

Thank you for your reply. I used the noise-conditioning approach from Imagen, and the upsampling unet with the memory-efficient setting turned on. In addition, I found that this rarely happens with simple sentences, but occurs more often with complex, creative combination prompts. Moreover, fine-tuning the upsampler on in-domain (vertical) data significantly improves inference on that data, which is consistent with the observations above. So I suspect the training data is also an important direction for improving this problem.

1073521013 avatar Sep 25 '22 02:09 1073521013

So you have both gaussian blur as well as noise conditioning?

lucidrains avatar Sep 25 '22 02:09 lucidrains

Have you tried training with either the non-memory-efficient unet, or noise conditioning alone?

lucidrains avatar Sep 25 '22 02:09 lucidrains

So you have both gaussian blur as well as noise conditioning?

yes

1073521013 avatar Sep 25 '22 03:09 1073521013

Have you tried training with either the non-memory-efficient unet, or noise conditioning alone?

Not yet

1073521013 avatar Sep 25 '22 03:09 1073521013

Have you tried training with either the non-memory-efficient unet, or noise conditioning alone?

Not yet

i would try those two

the memory-efficient unet design was something i brought over from Imagen, but i regret doing so

lucidrains avatar Sep 25 '22 17:09 lucidrains

In the sample code, the input for the text is a random tensor: text = torch.randint(0, 49408, (4, 256)).cuda(). How should I obtain this tensor from a piece of text in a given dataset?

QinSY123 avatar Nov 02 '22 15:11 QinSY123
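(For reference: the random tensor in the sample code stands in for CLIP BPE token ids (vocab size 49408, context length 256). DALLE2-pytorch ships a CLIP-style tokenizer for this, roughly `from dalle2_pytorch.tokenizer import tokenizer; text = tokenizer.tokenize(texts)`, though the exact import path may differ between versions. The sketch below is a self-contained toy stand-in showing what such a tokenizer produces; the whitespace splitting and tiny vocabulary are illustrative assumptions, not the real BPE scheme.)

```python
# Toy stand-in for a CLIP-style tokenizer. A real tokenizer applies
# byte-pair encoding over a 49408-entry vocabulary; this sketch only
# shows the output shape: fixed-length lists of integer token ids.
VOCAB = {"<pad>": 0, "<start>": 1, "<end>": 2}

def tokenize(text, context_length=256):
    """Map a sentence to a fixed-length list of token ids: lowercase,
    split on whitespace, assign ids on first sight, wrap in
    <start>/<end>, then pad with <pad> up to context_length."""
    ids = [VOCAB["<start>"]]
    for word in text.lower().split():
        ids.append(VOCAB.setdefault(word, len(VOCAB)))
    ids.append(VOCAB["<end>"])
    ids += [VOCAB["<pad>"]] * (context_length - len(ids))
    return ids[:context_length]
```

A batch of such lists (one per caption) converted with `torch.tensor(...)` has the same shape `(batch, 256)` as the `torch.randint` placeholder in the sample code.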