Detectron.pytorch
Index out of range on multi-GPU (8 GPUs) after first epoch
Expected results
Successful Training
Actual results
Detailed steps to reproduce
After running the main training script, once the first epoch completes I get an index out of range error (with drop_last = False) on this line:
mini_kwargs = dict([(k, v[i]) for k, v in kwargs.items()])
I tried to trace the cause and found that, after the first epoch, the error is tied to the last 3 device ids, i.e. 5, 6, 7, which is very weird behaviour. The command I run looks like this, e.g.:
CUDA_VISIBLE_DEVICES=4,5,6,7 python tools/train_net_step.py --dataset dota_patches --cfg configs/baselines/e2e_mask_rcnn_X-101-64x4d-FPN_2x.yaml --bs 8 --nw 8
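To make the failing line easier to reason about, here is a minimal sketch of how indexing `v[i]` per device can go out of range. It assumes (this is a guess, not confirmed from the traceback) that the last batch of the epoch holds fewer samples than there are devices, e.g. 5 samples for 8 GPUs. The function names and the keys `data` and `im_info` are hypothetical stand-ins, not the repository's actual code.

```python
def split_per_device(kwargs, num_devices):
    """Build one kwargs dict per device by indexing each value with the device index."""
    minibatches = []
    for i in range(num_devices):
        # Same expression as the failing line in the report.
        mini_kwargs = dict([(k, v[i]) for k, v in kwargs.items()])
        minibatches.append(mini_kwargs)
    return minibatches


def first_failing_device(kwargs, num_devices):
    """Return the first device index that has no sample, or None if every device has one."""
    try:
        split_per_device(kwargs, num_devices)
    except IndexError:
        return min(len(v) for v in kwargs.values())
    return None


def split_per_available_sample(kwargs, num_devices):
    """Guarded variant: only use as many devices as there are samples in the batch."""
    usable = min(num_devices, min(len(v) for v in kwargs.values()))
    return split_per_device(kwargs, usable)


# Assumed scenario: the final batch has 5 samples but 8 devices are visible.
short_batch = {
    "data": ["img%d" % n for n in range(5)],      # hypothetical key
    "im_info": ["info%d" % n for n in range(5)],  # hypothetical key
}
print(first_failing_device(short_batch, 8))
print(len(split_per_available_sample(short_batch, 8)))
```

If this assumption holds, device indices 5, 6 and 7 are exactly the ones left without a sample, which would match the ids I see. Dropping the incomplete batch (drop_last = True) or limiting the split to the available samples, as in the guarded variant above, would be two possible workarounds.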
System information
- Operating system: Ubuntu 16.04
- CUDA version: 9.0
- cuDNN version: 7.0
- GPU models (for all devices if they are not all the same):?
- python version: 3.6
- pytorch version: 0.4.0