ControlNet
What if the ControlNet/SD input images are of different sizes?
I have an image dataset A that is extra small (mean width ~64 px) and a dataset B that is 512x512. What happens if I feed the two to ControlNet/SD concurrently?
Will the network only work well when the input size is 512x512? Because of regularization?
try using different images having dimensions of (512+1*64), (512+2*64), (512+3*64) ..... (512-1*64), (512-2*64), (512-3*64)...
Uhh, sorry I don't quite get what the numbers mean..
It's fine!
Actually, your source and target image pair can be of dimensions (512x512), (576x576), ..... — any size that is a multiple of 64.
I hope that clarifies it. Let me know if you still need more clarification.
Happy Learning!
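The "multiples of 64" constraint above can be enforced automatically. A minimal sketch (the function names `round_to_64` and `snap_resolution` are illustrative, not from the ControlNet codebase):

```python
def round_to_64(x: int) -> int:
    """Round a dimension to the nearest multiple of 64 (minimum 64)."""
    return max(64, round(x / 64) * 64)

def snap_resolution(width: int, height: int) -> tuple[int, int]:
    """Snap an arbitrary image resolution to SD-friendly dimensions,
    i.e. both width and height divisible by 64."""
    return round_to_64(width), round_to_64(height)

print(snap_resolution(500, 900))  # -> (512, 896)
```

You would resize each training image to the snapped resolution before feeding it to the network.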
official answer from lllyasviel: https://github.com/lllyasviel/ControlNet/issues/224#issuecomment-1454997557
in your case you may need to try cropping the training samples into rectangles like 512x1024 or 512x768, with humans occupying a larger part of the image. Note that SD can be trained at any resolution as long as the dimensions are divisible by 64. Your batch size and parameters look OK to me, but consider renting better cloud machines.
But how should I train on images that are not 1:1 aspect ratio, such as 1:2 or 1:4? If I force scaling to 1:1, it will result in distortion.
You can crop it. However, there is no automatic way to tell where best to crop (or add borders), but you can default to a center crop. I wrote about this here: https://civitai.com/articles/2078/play-in-control-controlnet-training-setup-guide#heading-35441
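The default center crop mentioned above can be sketched as a pure coordinate computation (the helper name `center_crop_box` is an assumption, not part of any library):

```python
def center_crop_box(w: int, h: int) -> tuple[int, int, int, int]:
    """Return the (left, top, right, bottom) box of the largest centered
    square inside a w x h image, e.g. for use with PIL's Image.crop.
    Cropping to 1:1 avoids the distortion a direct 1:2 -> 1:1 rescale
    would introduce."""
    side = min(w, h)
    left = (w - side) // 2
    top = (h - side) // 2
    return (left, top, left + side, top + side)
```

With Pillow this would be used as `img.crop(center_crop_box(*img.size)).resize((512, 512))`; the trade-off is that content outside the centered square is simply discarded.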
So the model only accepts input images that are all the same size? My dataset is guaranteed to have width and height that are multiples of 64 (obtained from the canny preprocessing code in this repository), but an error is reported: RuntimeError: stack expects each tensor to be equal size, but got [512, 896, 3] at entry 0 and [512, 1152, 3] at entry 1
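That RuntimeError comes from the DataLoader's default collate function, which calls torch.stack and therefore needs every tensor in a batch to have identical dimensions; mixed resolutions are fine across batches, just not within one. A common workaround is to bucket samples by shape so each batch is homogeneous. A minimal sketch, assuming each dataset sample is a dict whose 'jpg' field holds the image array (as in the ControlNet tutorial dataset):

```python
from collections import defaultdict

def bucket_by_shape(samples, key="jpg"):
    """Group samples by image shape so that a batch never mixes sizes.
    torch.stack (used by the default collate_fn) requires all tensors
    in a batch to have identical dimensions, so batches must be drawn
    from a single bucket."""
    buckets = defaultdict(list)
    for s in samples:
        buckets[tuple(s[key].shape)].append(s)
    return dict(buckets)
```

In practice you would build batches from one bucket at a time (e.g. via a custom batch sampler), or simply resize/crop everything to one resolution as suggested earlier in the thread.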