DCNv2

Why is CPU not supported?

Open azuryl opened this issue 6 years ago • 11 comments

Is it difficult to implement in code?

azuryl avatar Oct 25 '19 10:10 azuryl

I agree, a CPU version would be very useful for debugging purposes.

I'm trying to use the CenterNet code, which relies on your code (https://github.com/xingyizhou/CenterNet/tree/master/src/lib/models/networks/DCNv2). It fails to run the inference demo with most of the models because of memory issues (I have a 4 GB GPU).

yossibiton avatar Nov 12 '19 09:11 yossibiton

@azuryl @yossibiton @CharlesShang Do we have a CPU version of DCNv2? If not, when can we expect one?

abhigoku10 avatar Jan 04 '20 11:01 abhigoku10

@abhigoku10, @yossibiton, @azuryl, I have modified DCNv2 from this repository to add CPU functionality. I have submitted a pull request to Charles Shang, but so far there has been no response from him. Have a look and try my implementation: https://github.com/palver7/DCNv2

@CharlesShang Please have a look and review my pull request.

palver7 avatar Mar 17 '20 08:03 palver7

Your link https://github.com/palver7/DCNv2 returns a 404. Where can I get the CPU version of DCNv2? Thanks very much.

macqueen09 avatar May 20 '20 09:05 macqueen09

@palver7 Thanks for sharing it, but I am getting a 404 error. Can you share the link again?

abhigoku10 avatar May 20 '20 09:05 abhigoku10

Hi @macqueen09 @abhigoku10, Charles Shang has merged my changes into his repo, so DCNv2 in this repo now runs on either CPU or GPU. Because of this, I no longer need to maintain my repo, so I deleted it. That is why you get the 404 error. You can re-download DCNv2 and run python3 testcpu.py to check that it runs on your CPU.

palver7 avatar May 21 '20 13:05 palver7

@palver7 Can you share the location of your repo? I tried to find it but could not see it in your profile. Thanks for doing this.

abhigoku10 avatar May 21 '20 14:05 abhigoku10

@abhigoku10 I have deleted my DCNv2 repo. Check the readme at https://github.com/CharlesShang/DCNv2 again: it now has a line that says to run python testcpu.py to check whether it runs on CPU. That line came from my merged changes.

Also, if you check the files inside the src/cpu directory, you will see that they now contain actual code instead of the previous "not implemented on cpu" error message placeholders. You can now use Charles' DCNv2 on CPU as well as GPU.

palver7 avatar May 22 '20 08:05 palver7

@palver7 @CharlesShang Thanks a lot for the work you guys have done!

abhigoku10 avatar May 23 '20 14:05 abhigoku10

Great work! I have used your CPU version of DCNv2 in mmdetection for prediction and got correct results. But DCNv2 on CPU is really slow. In my case, one DCN operation costs 200-600 ms, whereas the GPU takes only 3 ms. For networks with multiple DCN layers, speed is a real concern. When I tried to speed it up, I read the code and thought "yeah, not much to do". Do you have any advice for a better implementation, or any other implementation we can refer to?

Update: I added OpenMP to im2col. It is a good tool for speeding up loop operations.

tabsun avatar Jun 19 '20 09:06 tabsun

@tabsun Hi, I am happy to hear the CPU implementation works for you. Thanks for sharing the OpenMP tip too. I was going to suggest that you try making a CPU version of the TH Cuda blas Sgemmbatched routine, since that is what Charles used (in the dcn_v2_cuda.cu file) to improve the CUDA version. I changed that to an ordinary TH float blas gemm because I could not find a CPU version of the CUDA batched gemm routine.

palver7 avatar Jun 23 '20 04:06 palver7