Soft-NMS
The GPU Soft-NMS is slower because NumPy is called in the middle of the Torch ops.
PyTorch executes GPU ops asynchronously, so when you call into NumPy, PyTorch has to synchronize all pending ops and transfer the data to the CPU, then transfer it back when you call Torch ops again, which is extremely slow. https://github.com/DocF/Soft-NMS/blob/95dab79eac5c786f61fef2f6d5cd633eec7ecfd6/softnms_pytorch.py#L51
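The sync cost described above can be sketched with a toy example (the function names here are hypothetical, not from the linked repo): the NumPy detour forces a device-to-host round trip, while the pure-Torch version stays in the async queue.

```python
import torch

def areas_via_numpy(boxes):
    # Anti-pattern: detouring through NumPy mid-pipeline. If `boxes` lived
    # on the GPU, .cpu() would force PyTorch to synchronize the async CUDA
    # queue and copy the data to host memory before continuing.
    b = boxes.detach().cpu().numpy()
    areas = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    # ...and this copies the result back to the original device.
    return torch.from_numpy(areas).to(boxes.device)

def areas_pure_torch(boxes):
    # Same computation as tensor ops: stays on-device and can be queued
    # asynchronously with the rest of the pipeline.
    return (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])

boxes = torch.tensor([[0., 0., 10., 10.], [5., 5., 20., 15.]])
assert torch.allclose(areas_via_numpy(boxes), areas_pure_torch(boxes))
```

Both functions compute the same result; the difference only matters on a CUDA device, where the first one stalls the stream on every call.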
@zylo117 Hi, I implemented a pure PyTorch version of the Soft-NMS function in this Colab notebook: https://colab.research.google.com/drive/1gzhXX-LyMdZ41qHv0rKzHpxKLYliEiPn?usp=sharing
What confuses me is that the GPU takes even longer to run the speed() function, as shown below and in the Colab notebook. Do you know what is going on?
PyTorch 1.5.1+cu101 CPU
Pure PyTorch, average run time: 24.799434 ms
With NumPy, average run time: 24.725247 ms
PyTorch 1.5.1+cu101 _CudaDeviceProperties(name='Tesla P100-PCIE-16GB', major=6, minor=0, total_memory=16280MB, multi_processor_count=56)
Pure PyTorch, average run time: 67.407458 ms
With NumPy, average run time: 80.926901 ms
In that case, I think it's inevitable that the GPU version is slower than the CPU one, because the current algorithm runs sequentially: each iteration depends on the results of the previous one. In any case, I find that Soft-NMS is not efficient at all, because it applies confidence thresholding after NMS, which greatly increases NMS processing time. Confidence thresholding plus vanilla NMS takes less than 1 ms, but Soft-NMS now takes 24 ms. I'm afraid it might not be worth it.
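To make the data dependency concrete, here is a minimal NumPy sketch of the Gaussian Soft-NMS loop (a hypothetical standalone function, not the code from the repo or notebook). Each iteration picks the current highest-scoring box and decays the scores of its overlapping neighbors, so iteration i+1 cannot start until iteration i has finished rescoring:

```python
import numpy as np

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.4):
    # Gaussian Soft-NMS sketch. `boxes` is (N, 4) as [x1, y1, x2, y2].
    # The while-loop is inherently sequential: the decayed scores from
    # this iteration determine which box the next iteration picks.
    boxes = boxes.astype(np.float64).copy()
    scores = scores.astype(np.float64).copy()
    idxs = np.arange(len(scores))
    keep = []
    while len(scores) > 0:
        top = scores.argmax()
        keep.append(int(idxs[top]))
        top_box = boxes[top]
        boxes = np.delete(boxes, top, axis=0)
        scores = np.delete(scores, top)
        idxs = np.delete(idxs, top)
        if len(scores) == 0:
            break
        # IoU of the picked box against the remaining boxes
        x1 = np.maximum(top_box[0], boxes[:, 0])
        y1 = np.maximum(top_box[1], boxes[:, 1])
        x2 = np.minimum(top_box[2], boxes[:, 2])
        y2 = np.minimum(top_box[3], boxes[:, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_top = (top_box[2] - top_box[0]) * (top_box[3] - top_box[1])
        area_rest = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
        iou = inter / (area_top + area_rest - inter)
        # Gaussian decay instead of hard suppression...
        scores *= np.exp(-iou ** 2 / sigma)
        # ...which is why Soft-NMS needs this extra thresholding pass.
        alive = scores > score_thresh
        boxes, scores, idxs = boxes[alive], scores[alive], idxs[alive]
    return keep
```

With two heavily overlapping boxes and one distant box, the overlapped low-score box is decayed below the threshold and dropped, while the distant box survives untouched.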
@zylo117 Thank you for your explanation. I think I will try to improve the algorithm in the future.
Thanks for the code! Using PyTorch ops may not be efficient for the GPU version of Soft-NMS. It would probably need a custom CUDA implementation where the work is partitioned into grids (one per class) and threads (one per loop), with each loop running for 50-100 iterations. With the right implementation, Soft-NMS should run within 1 ms on a GPU. I'll try to push this version in a few weeks.
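The partitioning idea above can be sketched on the host side (names here are illustrative, not an actual kernel): detections are grouped by class label, and since classes never interact during suppression, each group is an independent unit of work that a custom kernel could assign to its own CUDA block.

```python
import numpy as np

def per_class_partition(class_ids):
    # Group detection indices by class label. Each group is independent,
    # so a custom CUDA kernel could run one suppression loop per class
    # in parallel (one block per class, as suggested above).
    return {int(c): np.flatnonzero(class_ids == c) for c in np.unique(class_ids)}

class_ids = np.array([0, 1, 0, 2, 1])
groups = per_class_partition(class_ids)
# groups maps each class id to the indices of its detections;
# the sequential Soft-NMS loop would then run inside each group only.
```

This only sketches the partition, not the per-thread loop body; the sequential rescoring within each class still has to run as a loop inside its block.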