What if the ControlNet/SD input images are of different sizes?

Open • doem97 opened this issue 1 year ago • 7 comments

I have an image dataset A that is extra small (mean width ~64 px) and a dataset B that is 512x512. What happens if I feed the two to ControlNet/SD together?

Will the network only work well when the input size is 512x512? Because of regularization?

doem97 • Apr 24 '23 15:04

Try using images with dimensions of (512 + 1×64), (512 + 2×64), (512 + 3×64), ..., (512 - 1×64), (512 - 2×64), (512 - 3×64), ...

HassanBinHaroon • May 23 '23 08:05

Uhh, sorry, I don't quite get what the numbers mean.

doem97 • May 25 '23 09:05

It's fine!

Actually, your source and target image pair can be 512x512, 576x576, ...; that is, each side is 512 plus or minus a multiple of 64.

I hope that clarifies it. Let me know if you need further clarification.

Happy Learning!

HassanBinHaroon • May 25 '23 10:05
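The suggestion above boils down to picking side lengths of 512 plus or minus a whole number of 64-pixel steps. A minimal Python sketch that enumerates those candidates; the function name `candidate_sides` and the `max_steps` cutoff are illustrative choices, not anything from the ControlNet repository:

```python
def candidate_sides(base: int = 512, step: int = 64, max_steps: int = 3) -> list[int]:
    """Return side lengths base + k*step for k in [-max_steps, max_steps]."""
    sides = [base + k * step for k in range(-max_steps, max_steps + 1)]
    # Drop non-positive sizes (only relevant when max_steps * step >= base).
    return [s for s in sides if s > 0]


print(candidate_sides())
# prints [320, 384, 448, 512, 576, 640, 704]
```

Every value is 512 shifted by a multiple of 64, so each one is itself a multiple of 64.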

Official answer from lllyasviel: https://github.com/lllyasviel/ControlNet/issues/224#issuecomment-1454997557

In your case, you may need to try cropping the training samples into rectangles like 512x1024 or 512x768, with humans occupying a larger part of the image. Note that SD can be trained at any resolution as long as each dimension is divisible by 64. Your batch size and parameters look OK to me, but consider renting better cloud machines.

geroldmeisinger • Sep 17 '23 10:09
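The "divisible by 64" rule can be checked, and an arbitrary size rounded down to the nearest valid value, with plain integer arithmetic. A minimal sketch; `is_valid_size` and `snap_down` are hypothetical helper names, and rounding down (so the image is then cropped rather than padded) is only one possible choice:

```python
def is_valid_size(width: int, height: int, multiple: int = 64) -> bool:
    """True if both dimensions are positive multiples of `multiple`."""
    return all(d > 0 and d % multiple == 0 for d in (width, height))


def snap_down(width: int, height: int, multiple: int = 64) -> tuple[int, int]:
    """Round each dimension down to a multiple, but never below one multiple."""
    return (max(multiple, width // multiple * multiple),
            max(multiple, height // multiple * multiple))


print(is_valid_size(512, 768))  # True
print(is_valid_size(500, 750))  # False
print(snap_down(500, 750))      # (448, 704)
```

`snap_down` only computes the target size; the image itself still has to be cropped or resized to it. Note that very small inputs (such as the ~64 px images in the original question) are clamped up to 64, which means upscaling rather than cropping.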

But how should I train on images that do not have a 1:1 aspect ratio, such as 1:2 or 1:4? Do I have to force-scale them to 1:1, which will result in distortion?

Ws-zqw • Mar 14 '24 03:03

You can crop it. However, there is no automatic way to tell where best to crop (or add borders), but you can default to a center crop. I wrote about this here: https://civitai.com/articles/2078/play-in-control-controlnet-training-setup-guide#heading-35441

geroldmeisinger • Mar 15 '24 12:03
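A center crop, the default suggested above, only needs the crop box. This sketch computes the largest box of a given width:height aspect ratio centered in the image, returned as `(left, upper, right, lower)`, which is the box order Pillow's `Image.crop` takes. The helper `center_crop_box` and its integer aspect parameters are hypothetical:

```python
def center_crop_box(width: int, height: int,
                    aspect_w: int = 1, aspect_h: int = 1) -> tuple[int, int, int, int]:
    """Largest aspect_w:aspect_h box centered in a width x height image.

    Returns (left, upper, right, lower).
    """
    # Start from the full width; if the box would be too tall, use the full height.
    crop_w = width
    crop_h = width * aspect_h // aspect_w
    if crop_h > height:
        crop_h = height
        crop_w = height * aspect_w // aspect_h
    left = (width - crop_w) // 2
    upper = (height - crop_h) // 2
    return (left, upper, left + crop_w, upper + crop_h)


# A 512x1024 (1:2) image center-cropped to 1:1 keeps the middle 512x512.
print(center_crop_box(512, 1024))        # (0, 256, 512, 768)
# Cropping the same image to 1:2 keeps everything.
print(center_crop_box(512, 1024, 1, 2))  # (0, 0, 512, 1024)
```

With Pillow, `img.crop(center_crop_box(*img.size))` gives the square crop, which can then be resized to, for example, 512x512 without distortion. Passing a non-square aspect such as 1:2 instead keeps a rectangular sample like the 512x1024 crops mentioned earlier in the thread.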

So does the model only accept input images that are all the same size? The width and height of every image in my dataset are guaranteed to be multiples of 64 (the images were generated with the Canny preprocessing code in this repository), but I get this error:

RuntimeError: stack expects each tensor to be equal size, but got [512, 896, 3] at entry 0 and [512, 1152, 3] at entry 1

remember00000 • Mar 30 '24 07:03
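The error comes from batching rather than from the model. PyTorch's default collation stacks the samples of a batch into one tensor, and `torch.stack` requires identical shapes. So 512x896 and 512x1152 samples cannot share a batch, even though both are valid multiples of 64. Two common workarounds are a batch size of 1, or grouping ("bucketing") samples by size so that every batch contains a single shape. Below is a pure-Python sketch of the grouping step. It assumes you already know each sample's (height, width). `bucket_batches` is a hypothetical helper, not part of the ControlNet code:

```python
from collections import defaultdict


def bucket_batches(sizes: list[tuple[int, int]], batch_size: int) -> list[list[int]]:
    """Group sample indices so every batch holds samples of one (height, width).

    `sizes[i]` is the (height, width) of sample i. The last batch of each
    bucket may be smaller than `batch_size`.
    """
    buckets: dict[tuple[int, int], list[int]] = defaultdict(list)
    for index, size in enumerate(sizes):
        buckets[size].append(index)

    batches = []
    for indices in buckets.values():
        # Slice each bucket into chunks of at most batch_size indices.
        for start in range(0, len(indices), batch_size):
            batches.append(indices[start:start + batch_size])
    return batches


sizes = [(512, 896), (512, 1152), (512, 896), (512, 1152), (512, 896)]
print(bucket_batches(sizes, batch_size=2))
# prints [[0, 2], [4], [1, 3]]
```

The resulting list of index lists can be passed as the `batch_sampler` argument of `torch.utils.data.DataLoader`. In real training you would probably also shuffle the indices within each bucket and the order of the batches every epoch.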