
RuntimeError when training on multiple GPUs

Open LiuShiyu95 opened this issue 2 years ago • 2 comments

I tried to train on multiple GPUs, but data loading fails with RuntimeError: unable to open shared memory object, even when I set num_workers=0. I don't have root access, so I can't raise the open-file limit. The trainer is created with:

trainer = pl.Trainer(gpus=2, precision=32, callbacks=[logger])

As soon as I change gpus to 1, training works fine. Does anyone have ideas?

LiuShiyu95, Mar 02 '23 02:03

Same problem here.

qingfengmingyue, Mar 10 '23 09:03

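Try raising the open file descriptor limit (this is what ulimit -n controls, not the shared memory size) by running this in the terminal: ulimit -n 16384. Raising the soft limit does not need root as long as the value stays at or below your hard limit, which you can check with ulimit -Hn.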

  • Also check this issue, in particular the linked comment: https://github.com/lllyasviel/ControlNet/issues/165#issuecomment-1464726313

kernelguardian, Mar 11 '23 01:03
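
If you cannot change the shell environment (for example, when a scheduler launches the job), the same soft-limit increase as the ulimit -n command above can be requested from inside the training script. The following is a minimal sketch using Python's standard resource module (Unix only). The function name raise_open_file_limit and the default of 16384 are illustrative assumptions, not part of ControlNet or PyTorch Lightning, and whether this clears the shared memory error in this particular setup is untested.

```python
import resource


def raise_open_file_limit(requested=16384):
    """Raise the soft open-file limit toward `requested` without root.

    The soft limit can only be raised up to the hard limit, so the
    request is capped there. Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    unlimited = resource.RLIM_INFINITY

    # Nothing to do if the soft limit is already unlimited.
    if soft == unlimited:
        return soft

    # Cap the request at the hard limit (unprivileged users cannot exceed it).
    target = requested if hard == unlimited else min(requested, hard)

    # Never lower an existing limit that is already high enough.
    if target <= soft:
        return soft

    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    except (ValueError, OSError):
        # Some systems impose an extra ceiling; keep the current limit.
        return soft

    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


if __name__ == "__main__":
    print("soft open-file limit:", raise_open_file_limit())
```

Call it at the top of the training script, before the DataLoader and pl.Trainer are constructed, so that processes started afterwards inherit the raised limit.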