Detectron.pytorch



Index out of range on multi-GPU (8 GPUs) after first epoch

Open akshitac8 opened this issue 5 years ago • 0 comments

Expected results

Successful training

Actual results

Detailed steps to reproduce

After running the training script, I get an index out of range error (IndexError) when the first epoch completes, with drop_last = False, on this line:

mini_kwargs = dict([(k, v[i]) for k, v in kwargs.items()])

I tried to trace the cause of the error and found that, after the first epoch, it is tied to the last 3 device ids, i.e. 5, 6, 7, which is very weird behaviour. Example command:

CUDA_VISIBLE_DEVICES=4,5,6,7 python tools/train_net_step.py --dataset dota_patches --cfg configs/baselines/e2e_mask_rcnn_X-101-64x4d-FPN_2x.yaml --bs 8 --nw 8
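One plausible mechanism, which is an assumption and not confirmed in this thread: with drop_last = False the final batch of an epoch can hold fewer samples than there are GPUs. The inputs are then split into fewer chunks than devices, and indexing each per-device list by device id runs past the end. The sketch reproduces this in plain Python. split_batch, per_device_kwargs and per_device_kwargs_safe are hypothetical helpers, not Detectron.pytorch code, and the splitting rule assumes ceiling-sized chunks similar to torch.Tensor.chunk.

```python
# Hypothetical reproduction of the suspected failure mode.
# None of these helpers exist in Detectron.pytorch; they are illustrative only.
from typing import Dict, List


def split_batch(samples: List[int], num_devices: int) -> List[List[int]]:
    """Split a batch into ceiling-sized chunks, at most one per device.

    Assumption: like torch.Tensor.chunk, no empty chunks are produced,
    so a small final batch yields FEWER chunks than devices.
    """
    if not samples:
        return []
    size = -(-len(samples) // num_devices)  # ceiling division
    return [samples[i:i + size] for i in range(0, len(samples), size)]


def per_device_kwargs(kwargs: Dict[str, List], num_devices: int) -> List[Dict]:
    """Build one kwargs dict per device by indexing each list with the device id.

    Mirrors the failing line from the issue: raises IndexError when any
    value list is shorter than num_devices.
    """
    return [dict([(k, v[i]) for k, v in kwargs.items()]) for i in range(num_devices)]


def per_device_kwargs_safe(kwargs: Dict[str, List], num_devices: int) -> List[Dict]:
    """Guarded variant: use only as many devices as there are chunks."""
    n = min([num_devices] + [len(v) for v in kwargs.values()])
    return [dict([(k, v[i]) for k, v in kwargs.items()]) for i in range(n)]


if __name__ == "__main__":
    chunks = split_batch([0, 1, 2], 4)  # final batch of 3 samples on 4 GPUs
    kw = {"roidb": chunks, "im_info": chunks}
    print(len(chunks))  # 3 chunks for 4 devices
    print(len(per_device_kwargs_safe(kw, 4)))  # uses 3 devices
    per_device_kwargs(kw, 4)  # IndexError at device index 3
```

With 4 devices, a final batch of 3 (or even 5) samples produces only 3 chunks, so per_device_kwargs fails at index 3 while the guarded version simply runs on 3 devices. Setting drop_last = True on the DataLoader avoids the short final batch altogether, at the cost of skipping a few samples per epoch.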

System information

Operating system: Ubuntu 16.04
CUDA version: 9.0
cuDNN version: 7.0
GPU models (for all devices if they are not all the same): ?
Python version: 3.6
PyTorch version: 0.4.0

Feb 26 '19 20:02 akshitac8