DALLE2-pytorch
The upsampler model regularly generates images with noise artifacts
Hello everyone, I would like to ask whether anyone has encountered the following when training the upsampler (64 --> 256); the generated images consistently show these issues:
1. Within the same batch, some generated images look very good while others are noisy.
2. Increasing the number of sampling steps or changing the sampling strategy does not help.
3. When training the upsampler, the low-resolution conditioning image is augmented with noise such as Gaussian blur.
4. The training set contains more than 100 million images, so the amount of training is sufficient.
5. This result occurs at random.
6. The 64-resolution images are noise free.
yes it's a known bug
@1073521013 what happens if you use the conditioned noise approach as was used in Imagen (I have it built in this repository as well)
@1073521013 are you using upsampling unet with the memory efficient setting turned on?
@1073521013 yea, so there are a couple of possibilities:
(1) the memory efficient unet is bugged or does not mix well with the gaussian blur augmentation, (2) gaussian blur has this deficiency but there are unpublished tricks to resolve it, (3) there is a bug in my code
they are all plausible, but i think the quickest resolution would be to switch to Imagen's way of doing things. it is highly sus that they would remove the blur in favor of the conditioned denoising, even with Jonathan Ho advising (he pioneered the blur technique in the cascading ddpm paper). besides, people are training quite successfully using that technique and it works well together with the memory efficient unet. in the worst case, if there is no fix for the blur augmentation, we can default this repository to the technique used there
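For anyone following along, here is a minimal sketch in plain PyTorch of what the Imagen-style noise conditioning augmentation does. This is a conceptual illustration rather than the exact API of this repository; the function name and the simple linear level-to-alpha mapping are assumptions. The idea is that the low-res conditioning image is corrupted with Gaussian noise at a sampled level, and that level is fed to the upsampler unet as an extra conditioning signal, replacing the blur augmentation.

```python
import torch

def noise_augment_lowres(lowres_cond_img, aug_level=None):
    # Imagen-style noise conditioning augmentation (conceptual sketch):
    # corrupt the low-res conditioning image with Gaussian noise at a random
    # level, and return the level so the unet can be conditioned on it.
    batch = lowres_cond_img.shape[0]

    if aug_level is None:
        # during training, sample a random augmentation level per image
        aug_level = torch.rand(batch, device=lowres_cond_img.device)

    # assumed simple linear mapping from augmentation level to signal strength
    alpha = (1. - aug_level).view(batch, 1, 1, 1)
    noise = torch.randn_like(lowres_cond_img)

    noised = alpha.sqrt() * lowres_cond_img + (1. - alpha).sqrt() * noise
    return noised, aug_level

lowres = torch.randn(4, 3, 64, 64)

# training: a random augmentation level per sample
noised_lowres, levels = noise_augment_lowres(lowres)

# sampling: a fixed small augmentation level is typically used
sampled_lowres, _ = noise_augment_lowres(lowres, aug_level=torch.full((4,), 0.2))
```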
Thank you for your reply. I used the noise conditioning approach from Imagen, and the upsampling unet was trained with the memory efficient setting turned on. In addition, I found that this rarely happens with simple prompts, but shows up more with complex, creative combinations of prompts. Moreover, fine-tuning the upsampler on vertical (domain-specific) data significantly improves inference on that vertical data, which could also explain the problem above. So I suspect the training data is also an important direction for improving this.
So you have both gaussian blur as well as noise conditioning?
Have you tried training with either the non memory efficient unet or noise conditioning alone?
Yes, both gaussian blur and noise conditioning.
Not yet.
i would try those two
the memory-efficient unet design was something i brought over from Imagen, but regret doing so
In the sample code, the input for the text is a random tensor. How should I get this tensor from a piece of text in a given dataset?
text = torch.randint(0, 49408, (4, 256)).cuda()
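In case it helps: the 49408 in that random tensor is the size of CLIP's BPE vocabulary, so the model expects a batch of BPE token ids. Below is a minimal sketch for turning captions into that tensor, assuming your version of the repository ships the dalle2_pytorch.tokenizer module; the exact import path and tokenize signature may differ, so check dalle2_pytorch/tokenizer.py in your installed version.

```python
import torch
# assumed import -- the bundled BPE tokenizer adapted from CLIP; verify the
# path and signature in your installed version of the repository
from dalle2_pytorch.tokenizer import tokenizer

captions = [
    'a cat sitting on a windowsill',
    'an oil painting of a mountain lake at sunrise',
    'a robot playing chess',
    'a bowl of ramen, studio lighting'
]

# tokenize returns a LongTensor of padded BPE token ids, one row per caption,
# matching the shape of the random placeholder (e.g. (4, 256))
text = tokenizer.tokenize(captions).cuda()
```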