k-diffusion
Scaling from 128x128, to 256x256, 512x512 and 1024x1024?
hey,
loved your paper and thanks a bunch for providing the code!
i have a quick question: how do you scale and train the network (HDiT) for increased resolutions? i saw you mentioned here: https://github.com/crowsonkb/k-diffusion/issues/14#issuecomment-1199475244 that you first need to build the entire network and then skip layers, but i'm not sure whether this also applies to the new architecture?
many thanks!
it looks like it's not meant for progressive scaling? i guess the best option would be to train at a lower resolution and then copy the relevant weights into a higher-res network
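something like this is what i had in mind — a rough sketch, not code from this repo (the checkpoint key name and helper are just placeholders), for copying every weight that still matches from a low-res checkpoint into a freshly built higher-res model:

```python
import torch

def copy_matching_weights(low_res_ckpt_path, high_res_model):
    """Copy every tensor whose name and shape match from a low-res checkpoint
    into a higher-res model; everything else keeps its fresh initialization."""
    low_sd = torch.load(low_res_ckpt_path, map_location="cpu")["model"]  # "model" key is an assumption
    high_sd = high_res_model.state_dict()
    copied, skipped = 0, 0
    for name, tensor in low_sd.items():
        if name in high_sd and high_sd[name].shape == tensor.shape:
            high_sd[name] = tensor
            copied += 1
        else:
            skipped += 1  # e.g. new outer levels / patch embeddings that only exist at the higher res
    high_res_model.load_state_dict(high_sd)
    print(f"copied {copied} tensors, left {skipped} at their fresh init")
```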
another thing i was curious about was the inputs:
def forward(self, x, sigma, aug_cond=None, class_cond=None, mapping_cond=None):
x, sigma, and class_cond are clear, but do you have any more details on aug_cond and mapping_cond?
@tin-sely I believe aug_cond is for non-leaky augmentations. When an input image is augmented during training, a description of how that image was augmented is also given to the generator (as aug_cond - augmentation conditioning), so that the generator eventually learns how to generate either augmented or non-augmented images depending on the value of the aug_cond input.

I believe mapping_cond is an older name for aug_cond which is used in the non-transformer model configs (the ones that use KarrasAugmentWrapper - which takes the aug_cond tensor and gives it to the model as mapping_cond).
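In other words, something like this toy example (my own schematic, not code from this repo — the real pipeline encodes far more transforms and uses a wider conditioning vector):

```python
import torch

def augment_with_cond(images):
    """Toy non-leaky augmentation: flip some images horizontally and record,
    per image, what was done in a conditioning vector the model also receives."""
    n = images.shape[0]
    flip = torch.rand(n) < 0.5
    images = images.clone()
    images[flip] = torch.flip(images[flip], dims=[-1])
    # aug_cond describes the applied augmentation (here just 1 dim: was-it-flipped);
    # the real pipeline also encodes shifts, rotations, scales, etc.
    aug_cond = flip.float().unsqueeze(1)
    return images, aug_cond

x = torch.randn(4, 3, 64, 64)
x_aug, aug_cond = augment_with_cond(x)
# training: the model sees both the (possibly augmented) image and aug_cond,
# so it learns augmented vs. non-augmented outputs instead of a leaky mixture
# sampling: pass an all-zeros aug_cond to ask for non-augmented images
```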
thanks a bunch @madebyollin! ✨
My understanding is that you use aug_cond when you wish to provide the model with information about the augmentations using Fourier Features:
https://github.com/crowsonkb/k-diffusion/blob/6ab5146d4a5ef63901326489f31f1d8e7dd36b48/k_diffusion/models/image_transformer_v2.py#L657
https://github.com/crowsonkb/k-diffusion/blob/6ab5146d4a5ef63901326489f31f1d8e7dd36b48/k_diffusion/models/image_transformer_v2.py#L658
https://github.com/crowsonkb/k-diffusion/blob/6ab5146d4a5ef63901326489f31f1d8e7dd36b48/k_diffusion/models/image_transformer_v2.py#L718
On the other hand, if you use mapping_cond, the condition is fed directly into a linear layer, as shown here:
https://github.com/crowsonkb/k-diffusion/blob/6ab5146d4a5ef63901326489f31f1d8e7dd36b48/k_diffusion/models/image_transformer_v2.py#L660
https://github.com/crowsonkb/k-diffusion/blob/6ab5146d4a5ef63901326489f31f1d8e7dd36b48/k_diffusion/models/image_transformer_v2.py#L720
Both embeddings are then fed into the MappingNetwork: https://github.com/crowsonkb/k-diffusion/blob/6ab5146d4a5ef63901326489f31f1d8e7dd36b48/k_diffusion/models/image_transformer_v2.py#L721
But getting more clarity on this would definitely help!
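For anyone else reading along, here is how I picture those two paths — a simplified sketch based on the lines linked above, with made-up dimensions and toy stand-ins rather than the actual FourierFeatures / MappingNetwork modules:

```python
import math
import torch
import torch.nn as nn

# invented dimensions, purely for illustration
d_aug, d_map_cond, d_ff, d_model = 9, 16, 64, 256

# aug_cond path: random Fourier features, then a linear projection
ff_weight = torch.randn(d_ff // 2, d_aug)  # fixed random frequencies (stand-in for FourierFeatures)
aug_linear = nn.Linear(d_ff, d_model, bias=False)

# mapping_cond path: fed straight into a linear layer
mapping_cond_linear = nn.Linear(d_map_cond, d_model, bias=False)

def embed_conditions(aug_cond, mapping_cond):
    # Fourier-feature embedding of the augmentation description
    f = 2 * math.pi * aug_cond @ ff_weight.T
    aug_emb = aug_linear(torch.cat([f.cos(), f.sin()], dim=-1))
    # direct linear embedding of the extra mapping condition
    map_emb = mapping_cond_linear(mapping_cond)
    # both embeddings are summed (together with the sigma/time embedding,
    # omitted here) before going into the mapping network
    return aug_emb + map_emb

cond = embed_conditions(torch.zeros(4, d_aug), torch.zeros(4, d_map_cond))
print(cond.shape)  # torch.Size([4, 256])
```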