PyTorch-HITNet-Hierarchical-Iterative-Tile-Refinement-Network-for-Real-time-Stereo-Matching

CUDA error: device-side assert triggered

Open wbensvage opened this issue 3 years ago • 4 comments

When training runs, it always shows this error after some iterations:

```
/opt/conda/conda-bld/pytorch_1565287025495/work/aten/src/THC/THCTensorScatterGather.cu:100: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [0,0,0], thread: [6,0,0] Assertion `indexValue >= 0 && indexValue < src.sizes[dim]` failed.
Traceback (most recent call last):
  File "/home/workspace/HITNet/PyTorch-HITNet/main.py", line 297, in <module>
    train()
  File "/home/workspace/HITNet/PyTorch-HITNet/main.py", line 125, in train
    loss, scalar_outputs, image_outputs = train_sample(sample, compute_metrics=do_summary)
  File "/home/workspace/HITNet/PyTorch-HITNet/main.py", line 225, in train_sample
    disp_gt, dx_gt, dy_gt, args.maxdisp)
  File "/home/workspace/HITNet/PyTorch-HITNet/loss/total_loss.py", line 46, in global_loss
    lambda_init * init_loss(cv, d_gt_pyramid[i], maxdisp)[mask]
  File "/home/workspace/HITNet/PyTorch-HITNet/loss/initialization_loss.py", line 16, in init_loss
    cost_nm = torch.gather(pred_init_cost, 1, get_non_match_disp(pred_init_cost, d_gt))
  File "/home/workspace/HITNet/PyTorch-HITNet/loss/initialization_loss.py", line 50, in get_non_match_disp
    INF = torch.Tensor([float("Inf")]).view(1, 1, 1, 1).repeat(B, D, H, W).to(d_gt.device)
RuntimeError: CUDA error: device-side assert triggered
```
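One thing worth noting when reading this traceback: CUDA kernels launch asynchronously, so the line blamed here (the `INF` tensor construction in `get_non_match_disp`) is where the error surfaced, not necessarily where the bad index was produced. A minimal sketch of forcing synchronous launches to get an accurate traceback (this environment variable is standard PyTorch/CUDA behavior, not something specific to this repo):

```python
import os

# Must be set before the first CUDA kernel launch, ideally before importing
# torch, so each kernel reports its error at the Python call that launched it.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch
```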

Have you ever met this before? Or is there any way to solve it?

wbensvage avatar Mar 03 '21 18:03 wbensvage

Hi. It seems like an out-of-bounds index problem. Are you using my most recently uploaded code for the initialization loss? It limits the index range.
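The assert fires whenever `torch.gather` is given an index outside `[0, D-1]` along the gathered dimension. A minimal sketch of the kind of index clamping that prevents it (the tensor names and shapes here are illustrative, not the repository's actual variables):

```python
import torch

# pred_init_cost: (B, D, H, W) initial cost volume; gather indices along
# dim=1 must lie in [0, D-1] or the CUDA kernel asserts.
B, D, H, W = 2, 64, 32, 32
pred_init_cost = torch.randn(B, D, H, W)

# Hypothetical non-matching disparity indices that may stray out of range.
non_match_idx = torch.randint(-2, D + 2, (B, D, H, W))
non_match_idx = non_match_idx.clamp(0, D - 1)  # limit the index range

cost_nm = torch.gather(pred_init_cost, 1, non_match_idx)
```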

MJITG avatar Mar 05 '21 02:03 MJITG

I've reproduced this error on my machine and total_loss.py is updated now.

MJITG avatar Mar 06 '21 07:03 MJITG

> I've reproduced this error on my machine and total_loss.py is updated now.

Great update, and I'll test it as well. Here are two more questions:

  1. Do you have any plans to CUDA-ize the "tile feature" or "tile hypothesis" part? I timed that part at around 0.5 s (see the timing sketch below), which is much longer than the paper reports.
  2. There seems to be a small bug related to your disparity data, but I cannot reproduce it right now. I'll let you know once I run into it again.

Great work again. Appreciate that!
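For context on how the 0.5 s figure was obtained: GPU work is asynchronous, so naive wall-clock timing can misattribute a module's cost unless the device is synchronized around the measurement. A minimal sketch of such a measurement (the module/input arguments are placeholders, not the repository's actual API):

```python
import time
import torch

def time_module(module, inp, warmup=5, iters=20):
    """Average forward-pass time of `module` on `inp`, with CUDA synchronization."""
    with torch.no_grad():
        for _ in range(warmup):      # warm-up excludes one-time setup costs
            module(inp)
        torch.cuda.synchronize()     # wait for pending kernels before starting the clock
        start = time.perf_counter()
        for _ in range(iters):
            module(inp)
        torch.cuda.synchronize()     # make sure all timed kernels have finished
    return (time.perf_counter() - start) / iters
```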

wbensvage avatar Mar 08 '21 19:03 wbensvage

Thank you! Currently there is no plan to implement those parts in CUDA.

MJITG avatar Mar 09 '21 09:03 MJITG