EfficientLPS

Training EfficientLPS throws the error "index out of bounds"

Open heijieke opened this issue 3 years ago • 3 comments

When I train the model, at epoch 128 it throws the error below:

/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [127,0,0], thread: [0,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
Traceback (most recent call last):
  File "./tools/train.py", line 145, in <module>
    main()
  File "./tools/train.py", line 141, in main
    meta=meta)
  File "/home/luyipeng/EfficientLPS/mmdet/apis/train.py", line 102, in train_detector
    meta=meta)
  File "/home/luyipeng/EfficientLPS/mmdet/apis/train.py", line 182, in _dist_train
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/home/luyipeng/miniconda3/envs/efficientLPS_env/lib/python3.7/site-packages/mmcv/runner/runner.py", line 384, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/luyipeng/miniconda3/envs/efficientLPS_env/lib/python3.7/site-packages/mmcv/runner/runner.py", line 283, in train
    self.model, data_batch, train_mode=True, **kwargs)
  File "/home/luyipeng/EfficientLPS/mmdet/apis/train.py", line 75, in batch_processor
    losses = model(**data)
  File "/home/luyipeng/miniconda3/envs/efficientLPS_env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/luyipeng/miniconda3/envs/efficientLPS_env/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 619, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/luyipeng/miniconda3/envs/efficientLPS_env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/luyipeng/EfficientLPS/mmdet/core/fp16/decorators.py", line 49, in new_func
    return old_func(*args, **kwargs)
  File "/home/luyipeng/EfficientLPS/mmdet/models/efficientlps/base.py", line 145, in forward
    return self.forward_train(img, img_metas, **kwargs)
  File "/home/luyipeng/EfficientLPS/mmdet/models/efficientlps/efficientLPS.py", line 205, in forward_train
    semantic_logits = self.semantic_head(x[:4], x_range[:4])
  File "/home/luyipeng/miniconda3/envs/efficientLPS_env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/luyipeng/EfficientLPS/mmdet/models/mask_heads/efficientlps_semantic_head.py", line 312, in forward
    feats[idx] = lateral_conv_ss(feats[idx], r_off)
  File "/home/luyipeng/miniconda3/envs/efficientLPS_env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/luyipeng/EfficientLPS/mmdet/models/mask_heads/efficientlps_semantic_head.py", line 224, in forward
    x_b = shift_x(x_u, x_range, DDPC_max, self.range_out_b)
  File "/home/luyipeng/EfficientLPS/mmdet/models/mask_heads/efficientlps_semantic_head.py", line 39, in shift_x
    x_off = torch.from_numpy(np.array([-1, 0, 1, -1, 0, 1, -1, 0, 1])).cuda(x.device)
RuntimeError: CUDA error: device-side assert triggered
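
A note on reading this traceback: CUDA kernels are launched asynchronously, so a device-side assert is often reported at a later, unrelated CUDA call (here the `.cuda(x.device)` copy in `shift_x`) rather than at the gather/scatter operation that actually failed. A minimal sketch of how the reported line can be made more trustworthy, assuming the snippet is placed at the very top of the training entry point before any CUDA work happens:

```python
import os

# Force synchronous kernel launches so the Python traceback points at the
# operation that really triggered the assert. This slows training down,
# so it is meant for debugging runs only.
# Assumption: this runs before the first CUDA call (safest: before importing torch).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```

The same effect can usually be had by setting the variable in the shell before launching the script.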

heijieke avatar Sep 17 '22 11:09 heijieke

I'm confused about the cause of this error and can't resolve it. Please help me.

heijieke avatar Sep 17 '22 11:09 heijieke

The error occurs in "offset_y = D_max * ((x_range_y - x_range_y.min()) / (x_range_y.max()-x_range_y.min()))" when x_range_y.max() equals x_range_y.min(), which makes the denominator zero, but I still don't know how to handle it.
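
One possible way to guard that normalization is to clamp the denominator so a constant input gives an offset of 0 instead of NaN (0/0). This is only a sketch, not the repository's fix: the function name `normalized_offset` and the `eps` parameter are hypothetical, and treating a constant range patch as "zero offset" is an assumption about the intended behavior.

```python
import torch


def normalized_offset(x_range_y: torch.Tensor, d_max: float, eps: float = 1e-6) -> torch.Tensor:
    """Scale x_range_y into [0, d_max], guarding against a constant input.

    Hypothetical helper: mirrors the offset_y expression from the thread,
    with the denominator clamped to at least eps.
    """
    lo = x_range_y.min()
    span = x_range_y.max() - lo
    # If max == min, span is 0 and the unguarded division would give NaN;
    # with the clamp the numerator (all zeros) is divided by eps, giving 0.
    return d_max * (x_range_y - lo) / span.clamp(min=eps)


# Usage: a constant input no longer produces NaN offsets.
constant = torch.full((2, 3), 5.0)
print(normalized_offset(constant, d_max=3.0))
```

Whether a zero offset is the right fallback depends on how the offsets are later turned into indices, so the result is worth checking against the rest of `shift_x`.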

heijieke avatar Sep 19 '22 12:09 heijieke

I have the same issue, and maybe this will help you: I trained three times on different computers with GPUs of different capacity. Every time I got the same error at the same iteration. I can restart training, but the result is the same. The reason for this is still a mystery to me.

  1. Nvidia Quadro K1200: ~Iteration 45.
  2. Nvidia GeForce RTX 2060: ~Iteration 90.
  3. 2x Nvidia GeForce RTX 3090: success with 160 Iterations.

I hope this gives you some more information, even though it doesn't explain the error.

MRemhof avatar Oct 07 '22 14:10 MRemhof