imagen-pytorch
About the input image size.
Hi, I am a little confused about the input size. In the Imagen paper, the 3 unets are trained separately, but in this repo unet1 and unet2 are trained together, and the inputs are resized to 256, not 64. Does that mean that if I want to train all 3 unets together, the input should be 1024?

I am also not sure about the 64×64 → 256×256 training process in Imagen. For example, if we use LAION-400M as the dataset: for training the base model, all images should be resized to 64×64. Then for training the 256×256 unet, all images should be resized to 256×256 as input and downsampled to 64×64 to serve as the condition; the 64×64 unet itself is not used in this process. Am I right?

And if we want to train the 1024×1024 unet, how can we get 1024×1024 images? The resolution in LAION is usually small. Should we resize images to 1024 directly, or do we need to use another high-resolution dataset? Thanks.
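For concreteness, a minimal sketch of the 64→256 conditioning step described above (the tensor shapes are placeholder assumptions, and imagen-pytorch performs this downsampling internally rather than requiring you to do it yourself):

```python
import torch
import torch.nn.functional as F

# a training batch for the 64->256 super-resolution unet:
# images are resized to the stage's output resolution (256x256) ...
hires = torch.randn(4, 3, 256, 256)

# ... and the low-res conditioning input is produced by downsampling
# those same images to 64x64 (no call to the base 64x64 unet is involved)
lowres_cond = F.interpolate(hires, size = (64, 64), mode = 'bilinear', align_corners = False)
```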
@zhaobingbingbing you can add as many unets as you like in the cascade
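As a sketch of that, following the training pattern in the repo's README (the `Unet` hyperparameters and tensor shapes here are illustrative assumptions, not recommended settings), a three-unet cascade targeting 64 → 256 → 1024 might look like:

```python
import torch
from imagen_pytorch import Unet, Imagen

# three unets: 64x64 base, 64->256 and 256->1024 super-resolution stages
unet1 = Unet(dim = 32, cond_dim = 512, dim_mults = (1, 2, 4, 8),
             layer_attns = (False, True, True, True),
             layer_cross_attns = (False, True, True, True))
unet2 = Unet(dim = 32, cond_dim = 512, dim_mults = (1, 2, 4, 8),
             layer_attns = (False, False, False, True),
             layer_cross_attns = (False, False, False, True))
unet3 = Unet(dim = 32, cond_dim = 512, dim_mults = (1, 2, 4, 8),
             layer_attns = (False, False, False, True),
             layer_cross_attns = (False, False, False, True))

# image_sizes gives the output resolution of each stage in the cascade
imagen = Imagen(
    unets = (unet1, unet2, unet3),
    image_sizes = (64, 256, 1024),
    timesteps = 1000,
    cond_drop_prob = 0.1
).cuda()

# mock data; real images are passed at the final (largest) resolution,
# and the library resizes them to each stage's resolution internally
text_embeds = torch.randn(4, 256, 768).cuda()
images = torch.randn(4, 3, 1024, 1024).cuda()

# each unet in the cascade is still trained one at a time, selected
# by unet_number, so the unets need not be optimized jointly
for unet_number in (1, 2, 3):
    loss = imagen(images, text_embeds = text_embeds, unet_number = unet_number)
    loss.backward()
```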