mmdetection
Problems implementing Rotated YOLOX
I am implementing Rotated YOLOX for MMRotate in https://github.com/open-mmlab/mmrotate/pull/409, and the SimOTA assigner raises a CUDA error during training.
Compared with mmdet, only get_in_gt_and_in_center_info and bbox_overlaps are different, to support rotated detection. After setting CUDA_LAUNCH_BLOCKING=1, the error log suggests the error is caused by binary_cross_entropy. It's weird because there is no error when training with fp16. Are there any suggestions for debugging this?
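One generic way to find where NaN first appears is to register a forward hook on every submodule and stop at the first non-finite output. This is a minimal sketch assuming a plain PyTorch `nn.Module`; `add_nan_hooks` is a hypothetical helper, not an mmdet or mmrotate API. For NaN produced during the backward pass, `torch.autograd.set_detect_anomaly(True)` serves a similar purpose.

```python
import torch
import torch.nn as nn


def add_nan_hooks(model: nn.Module):
    """Register forward hooks that raise at the first module with a non-finite output.

    Child hooks fire before their parent's, so the error names the innermost
    module whose output first contained NaN/Inf during the forward pass.
    """
    def make_hook(name):
        def hook(module, inputs, output):
            # Only top-level tensor outputs are checked; tensors nested inside
            # dicts or nested lists are not inspected by this hook.
            outputs = output if isinstance(output, (tuple, list)) else (output,)
            for i, out in enumerate(outputs):
                bad = (torch.is_tensor(out) and out.is_floating_point()
                       and not torch.isfinite(out).all())
                if bad:
                    raise RuntimeError(
                        f"non-finite output from module '{name}' "
                        f"({type(module).__name__}), output index {i}")
        return hook

    return [m.register_forward_hook(make_hook(n)) for n, m in model.named_modules()]


model = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 1))
handles = add_nan_hooks(model)
x = torch.randn(2, 4)
x[0, 0] = float("nan")  # simulate a corrupted input
try:
    model(x)
except RuntimeError as err:
    print(err)  # non-finite output from module '0' (Linear), output index 0
finally:
    for h in handles:
        h.remove()  # hooks add overhead to every forward pass; remove when done
```

The hooks run on every forward pass, so they are meant for a short debugging run rather than full training.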
Error info:
./aten/src/ATen/native/cuda/Loss.cu:115: operator(): block: [58,0,0], thread: [63,0,0] Assertion `input_val >= zero && input_val <= one` failed.
Traceback (most recent call last):
File "/miniconda3/lib/python3.9/site-packages/mmdet/core/bbox/assigners/sim_ota_assigner.py", line 67, in assign
assign_result = self._assign(pred_scores, priors, decoded_bboxes,
File "/workspace/mmrotate/mmrotate/core/bbox/assigners/r_sim_ota_assinger.py", line 85, in _assign
F.binary_cross_entropy(
File "/miniconda3/lib/python3.9/site-packages/torch/nn/functional.py", line 3065, in binary_cross_entropy
return torch._C._nn.binary_cross_entropy(input, target, weight, reduction_enum)
RuntimeError: CUDA error: device-side assert triggered
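The assertion in Loss.cu fires when any element of the BCE input lies outside [0, 1]. NaN fails both comparisons, so NaN scores trigger it as well, and on CUDA this only surfaces as an asynchronous device-side assert. A minimal sketch of a check that could be called on the scores right before the `F.binary_cross_entropy` call in `_assign`, so the failure is reported as a readable Python error; `check_bce_input` is a hypothetical helper.

```python
import torch
import torch.nn.functional as F


def check_bce_input(name: str, probs: torch.Tensor) -> None:
    """Raise a readable error if `probs` cannot be fed to F.binary_cross_entropy.

    BCE requires every element to lie in [0, 1]. NaN compares False against
    both bounds, so it is counted separately from out-of-range values.
    """
    n_nan = int(torch.isnan(probs).sum())
    # Counts +/-inf too; NaN is excluded because NaN < 0 and NaN > 1 are both False.
    n_out = int(((probs < 0) | (probs > 1)).sum())
    if n_nan or n_out:
        raise ValueError(
            f"{name}: {n_nan} NaN and {n_out} out-of-range value(s) "
            f"in tensor of shape {tuple(probs.shape)}")


# Valid scores pass the check and go to the loss unchanged.
scores = torch.tensor([0.2, 0.5, 0.9])
target = torch.tensor([0.0, 1.0, 1.0])
check_bce_input("pred_scores", scores)
loss = F.binary_cross_entropy(scores, target, reduction="none")

# A NaN score is reported on the host instead of as a device-side assert.
try:
    check_bce_input("pred_scores", torch.tensor([0.2, float("nan"), 0.9]))
except ValueError as err:
    print(err)  # pred_scores: 1 NaN and 0 out-of-range value(s) in tensor of shape (3,)
```

The `int(...)` conversions force a GPU synchronization, so this check is for debugging rather than for leaving in the training loop.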
I found that the network output becomes NaN, so the BCE loss in SimOTA receives NaN input and triggers the assertion. Lowering the learning rate or adding gradient clipping might fix it; I'll run some experiments to find out.
The following error occurred when I used your YOLOX. I only changed img_scale and num_classes:
Traceback (most recent call last):
File "E:\lrk\trail\code\mmrotate-ryolox\tools\train.py", line 196, in