
Do you use attention in the upsampling and downsampling blocks when training?

Open · DanBigioi opened this issue 2 years ago · 1 comment

I notice that in the config files for all the experiments, channel_mults is set to [1,2,4,8] while attn_res is set to 16. Does this mean that attention is not used within the upsampling and downsampling blocks? According to the documentation:

:param attn_res: a collection of downsample rates at which attention will take place. May be a set, list, or tuple. For example, if this contains 4, then at 4x downsampling, attention will be used.

Is this an intentional design choice?

Also, you mention in the README that "We used the attention mechanism in low-resolution features (16×16) like vanilla DDPM." Do you mean 32×32? The images you train on are 256×256, so the feature size is 32×32 by the time you reach the middle block where attention is used.
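To make the question concrete, here is a rough sketch (my own illustration, not code from this repo) of how the two possible readings of attn_res play out for a 256×256 input with channel_mults = [1,2,4,8]:

```python
# Which levels would get attention under the two readings of attn_res,
# for a 256x256 input and channel_mults [1,2,4,8]? (Illustration only.)

image_size = 256
channel_mults = [1, 2, 4, 8]
attn_res = {16}

ds = 1  # current downsample rate, as in the quoted docstring
for level, mult in enumerate(channel_mults):
    feature_size = image_size // ds
    # Reading 1 (docstring): attention when the *downsample rate* is in attn_res.
    attn_by_rate = ds in attn_res
    # Reading 2: attention when the *feature-map resolution* is in attn_res.
    attn_by_size = feature_size in attn_res
    print(f"level {level}: {feature_size}x{feature_size} (ds={ds}) "
          f"rate-based attention={attn_by_rate}, size-based attention={attn_by_size}")
    if level != len(channel_mults) - 1:
        ds *= 2  # downsample between levels
```

Under either reading, none of the four levels here matches attn_res = 16 (the deepest level is 32×32 at 8× downsampling), which is why I am asking whether attention only ever happens in the middle block.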

Thank you for the great repo!

DanBigioi avatar Oct 26 '22 10:10 DanBigioi

There may be a misunderstanding here. The attention setting refers to the corresponding feature-map size: 16 means the image size after downsampling. For example, 64×64 is downsampled to 32×32, then 32×32 to 16×16 (this layer uses attention), then 16×16 to 8×8, and 8×8 to 4×4. You can see this in the U-Net code.
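A minimal sketch of that pattern (assuming a guided-diffusion-style U-Net builder; make_res_block and make_attn_block are hypothetical factory callables, not functions from this repo):

```python
# Sketch: attention blocks are inserted whenever the current feature-map
# resolution matches an entry in attn_res, as described above.

import torch.nn as nn

def build_down_blocks(image_size, channel_mults, attn_res,
                      base_channels, make_res_block, make_attn_block):
    """make_res_block / make_attn_block are hypothetical factories
    returning nn.Module instances for the given channel count."""
    blocks = nn.ModuleList()
    now_res = image_size
    for level, mult in enumerate(channel_mults):
        out_channels = base_channels * mult
        layers = [make_res_block(out_channels)]
        if now_res in attn_res:           # e.g. attn_res = [16] -> attention at 16x16
            layers.append(make_attn_block(out_channels))
        blocks.append(nn.Sequential(*layers))
        if level != len(channel_mults) - 1:
            now_res //= 2                 # downsample between levels
    return blocks
```

So with attn_res = [16], attention blocks are only created at levels whose feature maps are 16×16; whether such a level exists depends on the input size and the number of entries in channel_mults.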

codgodtao avatar Dec 10 '22 07:12 codgodtao