pytorch-detect-rfcn

RuntimeError: arguments are located on different GPUs

Status: Open. KevinQian97 opened this issue 5 years ago (0 comments).

Hi, I ran into a strange problem when training with multiple GPUs (mGPUs=True). The error is:

RuntimeError: arguments are located on different GPUs at /opt/conda/conda-bld/pytorch_1513363039688/work/torch/lib/THC/generated/../generic/THCTensorMathPointwise.cu:269

The config is as follows:

Namespace(batch_size=4, checkepoch=1, checkpoint=0, checkpoint_interval=10000, checksession=1, class_agnostic=False, cuda=True, dataset='imagenet_vid+imagenet_det', disp_interval=100, large_scale=False, lr=0.001, lr_decay_gamma=0.1, lr_decay_step=5, mGPUs=True, max_epochs=20, net='res101', num_workers=0, optimizer='sgd', resume=False, save_dir='output/models', session=1, start_epoch=1, use_tfboard=False)

In total, I have four RTX 2080 Ti GPUs for training. The error occurs with the config above (batch_size=4). When I set the batch size to 8 and use all four GPUs, the problem disappears.
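One thing that differs between batch size 4 and batch size 8 on four GPUs is how many images each GPU receives. The sketch below assumes the training script wraps the model in `nn.DataParallel` when `mGPUs=True`; the issue does not confirm this. It also assumes the batch is split along dim 0 following `torch.chunk`'s documented rule. The helper `chunk_sizes` is hypothetical. It only reproduces that splitting arithmetic in plain Python and does not diagnose the error itself.

```python
import math
from typing import List


def chunk_sizes(batch_size: int, num_gpus: int) -> List[int]:
    """Hypothetical helper: per-chunk sizes when `batch_size` items are split
    into `num_gpus` chunks along dim 0, following torch.chunk's documented rule.
    Every chunk holds ceil(batch_size / num_gpus) items except possibly the
    last, and fewer than `num_gpus` chunks may be returned."""
    if batch_size <= 0 or num_gpus <= 0:
        raise ValueError("batch_size and num_gpus must be positive")
    split = math.ceil(batch_size / num_gpus)   # size of each full chunk
    full, rest = divmod(batch_size, split)     # number of full chunks, leftover items
    return [split] * full + ([rest] if rest else [])


if __name__ == "__main__":
    # Compare per-GPU batch sizes on the four-GPU setup described in the issue.
    for bs in (4, 6, 8):
        print(f"batch_size={bs}, 4 GPUs -> per-GPU sizes {chunk_sizes(bs, 4)}")
```

Under these assumptions, batch_size=4 gives each of the four replicas a single image, while batch_size=8 gives each replica two. A batch size such as 6 would produce only three chunks, so one GPU would sit idle.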

Thanks in advance for your help.

Best,

KevinQian97, Jun 16 '19 09:06