ControlNet What if the controlnet/sd input image is of different sizes?

I have two image datasets: A, which is very small (mean width ~64 px), and B, which is 512x512. What happens if I feed both to ControlNet/SD at the same time?

Will the network only work well when the input size is 512x512? Is that because of regularization?

Apr 24 '23 15:04 doem97

Try using different images having dimensions of (512+1*64), (512+2*64), (512+3*64), ... (512-1*64), (512-2*64), (512-3*64), ...

May 23 '23 08:05 HassanBinHaroon

try using different images having dimensions of (512+1*64), (512+2*64), (512+3*64), ... (512-1*64), (512-2*64), (512-3*64), ...

Uhh, sorry I don't quite get what the numbers mean..

May 25 '23 09:05 doem97

try using different images having dimensions of (512+1*64), (512+2*64), (512+3*64), ... (512-1*64), (512-2*64), (512-3*64), ...

Uhh, sorry I don't quite get what the numbers mean..

It's fine!

Actually, your source and target image pair can have dimensions such as 512x512, 576x576, and so on.

I hope that clarifies it. Let me know if you need any further clarification.

Happy Learning!

May 25 '23 10:05 HassanBinHaroon

official answer from lllyasviel: https://github.com/lllyasviel/ControlNet/issues/224#issuecomment-1454997557

in your case you may need to try crop the training samples into rectangles like 512*1024 or 512*768 samples with humans occupying larger part in the image. note that SD can be trained with any resolution as long as the number can be mod by 64. Your batchsize and parameters look ok to me but consider rent better cloud machines.
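The "mod by 64" condition in the quote means that width and height should each be divisible by 64. A minimal sketch of snapping an arbitrary size to the nearest such resolution follows; the helper names `snap_to_multiple` and `snap_size` are hypothetical and not part of the ControlNet repository.

```python
def snap_to_multiple(value: int, base: int = 64) -> int:
    """Round value to the nearest multiple of base (halves round up).

    The result is never smaller than base, so tiny images still map
    to a usable size.
    """
    snapped = (value + base // 2) // base * base
    return max(base, snapped)


def snap_size(width: int, height: int, base: int = 64) -> tuple[int, int]:
    """Snap both dimensions independently (hypothetical helper)."""
    return snap_to_multiple(width, base), snap_to_multiple(height, base)


if __name__ == "__main__":
    # A 900x520 image maps to a resolution divisible by 64 on both axes.
    print(snap_size(900, 520))
```

Snapping each axis independently changes the aspect ratio slightly, so the image would still need to be cropped or resized to the snapped size.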

Sep 17 '23 10:09 geroldmeisinger

official answer from lllyasviel: #224 (comment)

in your case you may need to try crop the training samples into rectangles like 512*1024 or 512*768 samples with humans occupying larger part in the image. note that SD can be trained with any resolution as long as the number can be mod by 64. Your batchsize and parameters look ok to me but consider rent better cloud machines.

But how should I train on images that are not 1:1 in aspect ratio, such as 1:2 or 1:4? Do I have to force the scaling to 1:1, which will result in distortion?

Mar 14 '24 03:03 Ws-zqw

Do I have to force the scaling to 1:1, which will result in distortion?

You can crop it. There is no automatic way to tell where it is best to crop (or add borders), but you can default to a center crop. I wrote about this here: https://civitai.com/articles/2078/play-in-control-controlnet-training-setup-guide#heading-35441
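As a rough sketch of the center-crop default, the crop box for a target aspect ratio can be computed like this. The function name `center_crop_box` is hypothetical, and the code only computes coordinates; it does not load or modify images.

```python
def center_crop_box(width: int, height: int,
                    target_w: int, target_h: int) -> tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the largest centered region
    whose aspect ratio is target_w:target_h."""
    # Compare aspect ratios by cross-multiplication to stay in integers.
    if width * target_h > height * target_w:
        # Image is wider than the target: keep full height, trim the sides.
        crop_h = height
        crop_w = height * target_w // target_h
    else:
        # Image is taller than (or equal to) the target: keep full width.
        crop_w = width
        crop_h = width * target_h // target_w
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return left, top, left + crop_w, top + crop_h


if __name__ == "__main__":
    # Crop a 1024x512 image down to a centered 1:1 square.
    print(center_crop_box(1024, 512, 1, 1))
```

The returned tuple uses the (left, upper, right, lower) ordering that Pillow's `Image.crop` takes, so it could be passed there before resizing to the training resolution.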

Mar 15 '24 12:03 geroldmeisinger

So the model only accepts input images that are all the same size? Every image in my dataset has a width and height that are multiples of 64 (obtained from the canny preprocessing code in this repository), but an error is reported: RuntimeError: stack expects each tensor to be equal size, but got [512, 896, 3] at entry 0 and [512, 1152, 3] at entry 1
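For context, this message is raised when a batch is assembled by stacking sample tensors, which requires every sample in the same batch to have the same shape. One possible workaround, sketched below and not confirmed anywhere in this thread, is to group samples by size so that each batch holds a single resolution. The function `bucket_batches` and its inputs are hypothetical.

```python
from collections import defaultdict


def bucket_batches(sizes: list[tuple[int, int]],
                   batch_size: int) -> list[list[int]]:
    """Group sample indices so every batch contains one (height, width) only.

    sizes holds one (height, width) tuple per sample, in dataset order.
    """
    buckets: dict[tuple[int, int], list[int]] = defaultdict(list)
    for index, size in enumerate(sizes):
        buckets[size].append(index)

    batches = []
    for indices in buckets.values():
        # Split each bucket into chunks of at most batch_size indices.
        for start in range(0, len(indices), batch_size):
            batches.append(indices[start:start + batch_size])
    return batches


if __name__ == "__main__":
    # The two shapes from the error message never share a batch.
    sizes = [(512, 896), (512, 1152), (512, 896), (512, 896)]
    print(bucket_batches(sizes, batch_size=2))
```

A list of index lists like this could be supplied as the `batch_sampler` of a PyTorch `DataLoader`, assuming the training script lets the loader be configured that way. Setting the batch size to 1 would also avoid mixing shapes within a batch.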

Mar 30 '24 07:03 remember00000
