k-diffusion
Scaling from 128x128, to 256x256, 512x512 and 1024x1024?
hey,
loved your paper and thanks a bunch for providing the code!
i have a quick question: how do you scale and train the network (HDiT) for increased resolutions? i saw you mentioned here: https://github.com/crowsonkb/k-diffusion/issues/14#issuecomment-1199475244 that you first need to build the entire network and then skip layers, but i'm not sure whether this also applies to the new architecture?
many thanks!
it looks like it's not meant for progressive scaling? i guess the best option would be to train at a lower resolution and then copy the relevant weights into a higher-res network
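something like this is what i had in mind — a rough sketch, not code from this repo (the checkpoint key name and helper are just placeholders), for copying every weight that still matches from a low-res checkpoint into a freshly built higher-res model:

```python
import torch

def copy_matching_weights(low_res_ckpt_path, high_res_model):
    """Copy every tensor whose name and shape match from a low-res checkpoint
    into a higher-res model; everything else keeps its fresh initialization."""
    low_sd = torch.load(low_res_ckpt_path, map_location="cpu")["model"]  # "model" key is an assumption
    high_sd = high_res_model.state_dict()
    copied, skipped = 0, 0
    for name, tensor in low_sd.items():
        if name in high_sd and high_sd[name].shape == tensor.shape:
            high_sd[name] = tensor
            copied += 1
        else:
            skipped += 1  # e.g. new outer levels / patch embeddings that only exist at the higher res
    high_res_model.load_state_dict(high_sd)
    print(f"copied {copied} tensors, left {skipped} at their fresh init")
```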
another thing i was curious about was the inputs:
def forward(self, x, sigma, aug_cond=None, class_cond=None, mapping_cond=None):
x, sigma, and class_cond are clear, but do you have any more details on aug_cond and mapping_cond?
@tin-sely I believe aug_cond is for non-leaky augmentations. When an input image is augmented during training, a description of how that image was augmented is also given to the generator (as aug_cond - augmentation conditioning), so that the generator eventually learns how to generate either augmented or non-augmented images depending on the value of the aug_cond input.

I believe mapping_cond is an older name for aug_cond which is used in the non-transformer model configs (the ones that use KarrasAugmentWrapper - which takes the aug_cond tensor and gives it to the model as mapping_cond).
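In other words, something like this toy example (my own schematic, not code from this repo — the real pipeline encodes far more transforms and uses a wider conditioning vector):

```python
import torch

def augment_with_cond(images):
    """Toy non-leaky augmentation: flip some images horizontally and record,
    per image, what was done in a conditioning vector the model also receives."""
    n = images.shape[0]
    flip = torch.rand(n) < 0.5
    images = images.clone()
    images[flip] = torch.flip(images[flip], dims=[-1])
    # aug_cond describes the applied augmentation (here just 1 dim: was-it-flipped);
    # the real pipeline also encodes shifts, rotations, scales, etc.
    aug_cond = flip.float().unsqueeze(1)
    return images, aug_cond

x = torch.randn(4, 3, 64, 64)
x_aug, aug_cond = augment_with_cond(x)
# training: the model sees both the (possibly augmented) image and aug_cond,
# so it learns augmented vs. non-augmented outputs instead of a leaky mixture
# sampling: pass an all-zeros aug_cond to ask for non-augmented images
```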
thanks a bunch @madebyollin! ✨
My understanding is that you use aug_cond when you wish to provide the model with information about the augmentations using Fourier Features:
https://github.com/crowsonkb/k-diffusion/blob/6ab5146d4a5ef63901326489f31f1d8e7dd36b48/k_diffusion/models/image_transformer_v2.py#L657
https://github.com/crowsonkb/k-diffusion/blob/6ab5146d4a5ef63901326489f31f1d8e7dd36b48/k_diffusion/models/image_transformer_v2.py#L658
https://github.com/crowsonkb/k-diffusion/blob/6ab5146d4a5ef63901326489f31f1d8e7dd36b48/k_diffusion/models/image_transformer_v2.py#L718
On the other hand, if you use mapping_cond, the condition is fed directly into a linear layer, as shown here:
https://github.com/crowsonkb/k-diffusion/blob/6ab5146d4a5ef63901326489f31f1d8e7dd36b48/k_diffusion/models/image_transformer_v2.py#L660
https://github.com/crowsonkb/k-diffusion/blob/6ab5146d4a5ef63901326489f31f1d8e7dd36b48/k_diffusion/models/image_transformer_v2.py#L720
Both embeddings are then fed into the MappingNetwork: https://github.com/crowsonkb/k-diffusion/blob/6ab5146d4a5ef63901326489f31f1d8e7dd36b48/k_diffusion/models/image_transformer_v2.py#L721
But getting more clarity on this would definitely help!
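For anyone else reading along, here is how I picture those two paths — a simplified sketch based on the lines linked above, with made-up dimensions and toy stand-ins rather than the actual FourierFeatures / MappingNetwork modules:

```python
import math
import torch
import torch.nn as nn

# invented dimensions, purely for illustration
d_aug, d_map_cond, d_ff, d_model = 9, 16, 64, 256

# aug_cond path: random Fourier features, then a linear projection
ff_weight = torch.randn(d_ff // 2, d_aug)  # fixed random frequencies (stand-in for FourierFeatures)
aug_linear = nn.Linear(d_ff, d_model, bias=False)

# mapping_cond path: fed straight into a linear layer
mapping_cond_linear = nn.Linear(d_map_cond, d_model, bias=False)

def embed_conditions(aug_cond, mapping_cond):
    # Fourier-feature embedding of the augmentation description
    f = 2 * math.pi * aug_cond @ ff_weight.T
    aug_emb = aug_linear(torch.cat([f.cos(), f.sin()], dim=-1))
    # direct linear embedding of the extra mapping condition
    map_emb = mapping_cond_linear(mapping_cond)
    # both embeddings are summed (together with the sigma/time embedding,
    # omitted here) before going into the mapping network
    return aug_emb + map_emb

cond = embed_conditions(torch.zeros(4, d_aug), torch.zeros(4, d_map_cond))
print(cond.shape)  # torch.Size([4, 256])
```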