PyTorch-Encoding

How can I train EncNet on one GPU?

Open neverstoplearn opened this issue 4 years ago • 11 comments

neverstoplearn avatar Jan 27 '20 13:01 neverstoplearn

Similar issues: https://github.com/zhanghang1989/PyTorch-Encoding/issues/205 https://github.com/zhanghang1989/PyTorch-Encoding/issues/210#issuecomment-578752998

The code was written for multiple GPUs. If you would like to run it on a single GPU, the following parts need to change: 1) change SyncBatchNorm to the regular nn.BatchNorm; 2) the network output shape may need some work, since the output is a tuple of lists of tensors (ngpus x noutputs).
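For point 2, a minimal sketch of one way to unwrap the nested output on a single GPU might look like the following. It assumes the layout described above, outputs[gpu_index][output_index], and the helper name first_replica_outputs is hypothetical, not part of the library. Strings stand in for tensors so the sketch runs without PyTorch.

```python
def first_replica_outputs(outputs):
    """Return the outputs of the first (and only) GPU replica as a flat list.

    Hypothetical helper, not part of PyTorch-Encoding. It assumes the
    multi-GPU layout described in the thread: outputs[gpu_index][output_index].
    """
    is_sequence = isinstance(outputs, (tuple, list))
    if is_sequence and outputs and isinstance(outputs[0], (tuple, list)):
        # Nested case: one inner list per GPU, so keep only the first replica.
        return list(outputs[0])
    if is_sequence:
        # Already flat: a plain sequence of outputs from a single device.
        return list(outputs)
    # A single bare output: wrap it so callers can always iterate.
    return [outputs]


# Placeholder values stand in for tensors here.
nested = (["pred", "aux"],)  # one GPU, two outputs
print(first_replica_outputs(nested))
```

Whether the loss function in the training script expects the flat list or the nested form is not stated in this thread, so check that before using a helper like this.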

zhanghang1989 avatar Jan 28 '20 02:01 zhanghang1989

Thanks for your answer. I will try it.

neverstoplearn avatar Jan 28 '20 06:01 neverstoplearn

Hi there, did you solve the problem with the output shape? I've encountered the same error. Thank you!

SeanCho1996 avatar Apr 01 '20 06:04 SeanCho1996

I checked with the author, @zhanghang1989: the next version, which will support single-GPU training, should be released in 2 to 3 weeks.

StacyYang avatar Apr 08 '20 19:04 StacyYang

Using https://github.com/zhanghang1989/PyTorch-Encoding/blob/master/experiments/segmentation/train_dist.py should solve this problem. I am closing this issue. Feel free to reopen it if the error persists.

zhanghang1989 avatar Apr 21 '20 05:04 zhanghang1989

> Using /experiments/segmentation/train_dist.py@master should solve this problem. I am closing this issue. Feel free to reopen it if the error persists.

Can you give an example of train_dist.py?

wanghao9610 avatar Jun 01 '20 04:06 wanghao9610

The command is actually the same:

python train_dist.py --dataset ADE20K --model fcn --aux --backbone resnest50

zhanghang1989 avatar Jun 01 '20 04:06 zhanghang1989

> The command is actually the same:
>
> python train_dist.py --dataset ADE20K --model fcn --aux --backbone resnest50

I have tried it, but I got an error. How can I fix this?

_pickle.PicklingError: Can't pickle <function main_worker at 0x7f104ff1b700>: attribute lookup main_worker on __main__ failed

wanghao9610 avatar Jun 01 '20 06:06 wanghao9610

That's weird. Are you using PyTorch 1.4.0?

zhanghang1989 avatar Jun 01 '20 06:06 zhanghang1989

> That's weird. Are you using PyTorch 1.4.0?

Yes, the PyTorch version is 1.4.0 and the encoding version is 1.2.1b20200527. Is it not compatible with PyTorch 1.4.0?

wanghao9610 avatar Jun 01 '20 06:06 wanghao9610

Yes, it should be compatible with PyTorch 1.4.0. I am using that version.

zhanghang1989 avatar Jun 01 '20 06:06 zhanghang1989