YOLOP

Received error related to performing NMS while training on Colab

Open sparshgarg23 opened this issue 3 years ago • 2 comments

I am training the BDD100K drivable area segmentation task on Colab.

However, after the first epoch of training completes, I receive the following error during validation:

 0% 0/209 [00:06<?, ?it/s]
Traceback (most recent call last):
  File "tools/train.py", line 395, in <module>
    main()
  File "tools/train.py", line 333, in main
    logger, device, rank
  File "/content/YOLOP/lib/core/function.py", line 250, in validate
    output = non_max_suppression(inf_out, conf_thres= config.TEST.NMS_CONF_THRESHOLD, iou_thres=config.TEST.NMS_IOU_THRESHOLD, labels=lb)
  File "/content/YOLOP/lib/core/general.py", line 169, in non_max_suppression
    i = torchvision.ops.nms(boxes, scores, iou_thres)  # NMS
  File "/usr/local/lib/python3.7/dist-packages/torchvision/ops/boxes.py", line 42, in nms
    return torch.ops.torchvision.nms(boxes, scores, iou_threshold)
RuntimeError: Could not run 'torchvision::nms' with arguments from the 'CUDA' backend. 'torchvision::nms' is only available for these backends: [CPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, Tracer, Autocast, Batched, VmapMode].

CPU: registered at /root/project/torchvision/csrc/vision.cpp:59 [kernel]
BackendSelect: fallthrough registered at /pytorch/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Named: registered at /pytorch/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
AutogradOther: fallthrough registered at /pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:35 [backend fallback]
AutogradCPU: fallthrough registered at /pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:39 [backend fallback]
AutogradCUDA: fallthrough registered at /pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:43 [backend fallback]
AutogradXLA: fallthrough registered at /pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:47 [backend fallback]
Tracer: fallthrough registered at /pytorch/torch/csrc/jit/frontend/tracer.cpp:967 [backend fallback]
Autocast: fallthrough registered at /pytorch/aten/src/ATen/autocast_mode.cpp:254 [backend fallback]
Batched: registered at /pytorch/aten/src/ATen/BatchingRegistrations.cpp:511 [backend fallback]
VmapMode: fallthrough registered at /pytorch/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]

The PyTorch and CUDA versions are shown below. Any idea why this is happening?

Pytorch version
1.7.0
CUDA Version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0
gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
torch version 1.7
torchvision version as shown in readme

Any idea why this error is happening and what can be done to resolve this in colab?

sparshgarg23 avatar Jul 26 '22 11:07 sparshgarg23

I have this issue too. Did you find a solution for it?

farhadi76m avatar Aug 23 '22 11:08 farhadi76m

I resolved this on Colab as follows: clone the YOLOP repository, cd into it, and then run the steps below.

!pip install torch==1.9.0+cu102 torchvision==0.10.0+cu102 torchaudio===0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
!pip install -r requirements.txt
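
To confirm that the reinstall took effect, a quick check along these lines may help. This is only a sketch and not part of YOLOP: the helper names `build_tag`, `same_build`, and `report` are made up for illustration. It assumes the wheels carry a `+cu102`-style suffix in their version strings, as the ones installed above do. A wheel with no suffix is treated as inconclusive rather than as a match. The usual reading of the error above is that the installed torchvision build has no CUDA kernels or does not match the installed torch, so a mismatch in the output is worth looking at first. On Colab the runtime probably needs a restart before the newly installed versions are picked up.

```python
import importlib


def build_tag(version: str) -> str:
    """Return the local build tag of a wheel version, e.g. 'cu102' or 'cpu'.

    Returns an empty string when the version has no '+tag' suffix.
    """
    _, _, tag = version.partition("+")
    return tag


def same_build(torch_version: str, torchvision_version: str) -> bool:
    """True only when both versions carry the same non-empty build tag.

    A missing tag is inconclusive, so it is reported as False (not confirmed).
    """
    tag = build_tag(torch_version)
    return tag != "" and tag == build_tag(torchvision_version)


def report() -> None:
    """Print installed versions, or a short note if torch cannot be imported."""
    try:
        torch = importlib.import_module("torch")
        torchvision = importlib.import_module("torchvision")
    except (ImportError, OSError) as exc:
        print(f"torch/torchvision not importable: {exc}")
        return
    print("torch:", torch.__version__, "| built for CUDA:", torch.version.cuda)
    print("torchvision:", torchvision.__version__)
    print("CUDA available:", torch.cuda.is_available())
    print("same build tag:", same_build(torch.__version__, torchvision.__version__))


if __name__ == "__main__":
    report()
```

If `same build tag` prints `False`, or torchvision shows a `+cpu` suffix, the two packages were likely installed from different builds and reinstalling them together, as in the commands above, is the thing to try.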

sparshgarg23 avatar Aug 23 '22 12:08 sparshgarg23