stable-diffusion-webui

Hypernetwork training hardcoded to 512x512?

Open emoose opened this issue 3 years ago • 2 comments

Noticed my preview images for hypernet training were all 512x512, even though I selected 768x768. Started looking into it and found that the hypernet code is actually hardcoded to 512x512 and never reads the resolution slider value: https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/695377a8b9f7de28b880d96487a9ddf7230cff14/modules/hypernetworks/hypernetwork.py#L228

Haven't seen this mentioned or documented anywhere yet. Does it mean we should only be using 512x512 images to train hypernets? Not sure if this means higher-resolution images would actually be worse too, ~~since image data past 512x512 might be getting ignored entirely?~~ I've seen a lot of hypernets posted where people mentioned using higher resolutions... (Edit: looks like PersonalizedBase would resize/downsample the image to 512x512; unsure if that's any better or worse than splitting the original image into 512x512 pieces, though.)
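For a rough sense of the two options mentioned above (resizing to 512x512 versus splitting into 512x512 pieces), here is a small arithmetic sketch. The function names are made up for illustration, and this is not how PersonalizedBase is implemented; it only shows how much linear resolution a resize keeps and how many crops a split would need.

```python
import math


def downscale_factor(src: int, dst: int = 512) -> float:
    """Fraction of the original linear resolution kept when resizing src -> dst."""
    return dst / src


def tiles_needed(src_w: int, src_h: int, tile: int = 512) -> int:
    """Number of tile x tile crops needed to cover the image without resizing.

    Edge tiles would have to overlap or be padded when the size is not a
    multiple of the tile size (an assumption of this sketch).
    """
    return math.ceil(src_w / tile) * math.ceil(src_h / tile)


factor = downscale_factor(768)      # linear scale for 768 -> 512
pixels_kept = factor ** 2           # fraction of pixels that survive the resize
tiles = tiles_needed(768, 768)      # crops needed to cover a 768x768 image

print(f"linear scale: {factor:.3f}, pixels kept: {pixels_kept:.3f}, tiles: {tiles}")
```

For a 768x768 source, resizing keeps about two thirds of the linear resolution, while covering the image with 512x512 crops would take four tiles.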

Also, does anyone know why it's been set to 512x512? AFAIK models like NAI were originally trained at 768x768, so could using the resolution the model was trained at give better results?

emoose avatar Oct 17 '22 18:10 emoose

In my understanding, the 512x512 comes from the settings on the txt2img page, while the size you set on the train page should match your training dataset. They are not the same thing.

txtyb avatar Oct 18 '22 15:10 txtyb

Seems like the size you set on the train page only gets used by the embedding trainer. You can see that the code for training embeddings makes use of width=training_width, height=training_height:

https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/c1093b8051606f0ac90506b7114c4b55d0447c70/modules/textual_inversion/textual_inversion.py#L235

training_width / training_height seem to be the values from the train page, but the code for training hypernetwork doesn't seem to use them at all, and just hardcodes width=512, height=512:

https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/c1093b8051606f0ac90506b7114c4b55d0447c70/modules/hypernetworks/hypernetwork.py#L228
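To make the difference concrete, here is a minimal sketch of the two call patterns. `FakeDataset`, `train_embedding_like` and `train_hypernetwork_like` are hypothetical stand-ins, not the webui's actual classes or function signatures; only the keyword arguments `width=training_width, height=training_height` versus `width=512, height=512` are taken from the linked lines.

```python
from dataclasses import dataclass


@dataclass
class FakeDataset:
    """Hypothetical stand-in for the training dataset; it only records its target size."""
    width: int
    height: int


def train_embedding_like(training_width: int, training_height: int) -> FakeDataset:
    # Pattern seen in the embedding trainer: the train-page values are forwarded.
    return FakeDataset(width=training_width, height=training_height)


def train_hypernetwork_like(training_width: int, training_height: int) -> FakeDataset:
    # Pattern seen in the hypernetwork trainer at the linked commit:
    # the size is fixed, so whatever the caller passes has no effect.
    return FakeDataset(width=512, height=512)


emb = train_embedding_like(768, 768)
hyper = train_hypernetwork_like(768, 768)
print(emb)
print(hyper)
```

With the second pattern, changing the slider cannot change the dataset size, which would match the 512x512 previews.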

For me, embeds always give a memory error if I go above 512x512, but with hypernets I can set it to 2048x2048 without any error or warning, so the value doesn't really seem to be used...

I might be missing something though. It would be appreciated if a dev could clarify this somewhere. I've seen lots of people recommending 768x768 images for training hypernets, but it seems to me that would just result in the images being downscaled before training...

emoose avatar Oct 18 '22 23:10 emoose