An unreasonable return causes a GPU coredump (border_align_backward)

Open Wickyzheng opened this issue 3 years ago • 2 comments

Thanks for reporting the unexpected results and we appreciate it a lot.

Checklist

  1. I have searched related issues but cannot get the expected help.
  2. I have read the FAQ documentation but cannot get the expected help.
  3. The unexpected results still exist in the latest version.

Describe the Issue

A clear and concise description of what the bug is, including what results are expected and what the real results you got.

While reading mmcv/mmcv/ops/csrc/common/cuda/border_align_cuda_kernel.cuh, I found that the code around line 185 is unsafe. When the input parameter x or y is out of bounds, the function bilinear_interpolate_gradient sets x_low, x_high, y_low and y_high to -1. At line 189 of the same file, offset_grad_input is then indexed with a negative offset, which may cause a GPU coredump.

Reproduction

  1. What command, code, or script did you run?
A placeholder for the command.
  2. Did you make any modifications on the code? Did you understand what you have modified?

Environment

  1. Please run python -c "from mmcv.utils import collect_env; print(collect_env())" to collect necessary environment information and paste it here.
  2. You may add additional information that may be helpful for locating the problem, such as
    • How you installed PyTorch [e.g., pip, conda, source]
    • Other environment variables that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.)

Error traceback

If applicable, paste the error traceback here.

A placeholder for traceback.

Bug fix

If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated!

Wickyzheng, Jul 21 '22 10:07

Thanks for the notification! You are right, we should check the value of x_low ... before atomicAdd like this. We will fix it ASAP. Or would you like to send a PR to fix it?

grimoire, Jul 22 '22 07:07

Thanks, it will be ok if you fix it ASAP.

Wickyzheng, Aug 05 '22 02:08