InternImage
Latency too high with DCNv3_pytorch op on cpu
Hi guys:
When using DCNv3_pytorch for inference, the latency is too high on a CPU-only device. I tested inference on both CUDA and CPU: on a CUDA device, one image takes 29 ms (20 ms with the DCNv3 CUDA op); on a CPU device, one image takes 550 ms (tested with the PyTorch implementation only).
Do you have any ideas for optimizing the CPU inference cost?
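One cheap thing to check before writing a custom op is the CPU threading setup, since oversubscribed intra-op threads often inflate per-image latency. Below is a minimal, hedged timing sketch; the real InternImage model with the DCNv3_pytorch op is replaced by a plain convolutional stand-in so the harness runs anywhere, and the thread count of 4 is just an example to tune to your physical core count.

```python
import time

import torch
import torch.nn as nn

# Stand-in for the real model: the DCNv3_pytorch block is replaced
# by ordinary convolutions purely so this snippet is self-contained.
model = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1),
).eval()

x = torch.randn(1, 3, 224, 224)

# Match this to your physical core count; too many intra-op
# threads can make CPU latency worse, not better.
torch.set_num_threads(4)

with torch.inference_mode():
    for _ in range(3):  # warm-up iterations, excluded from timing
        model(x)
    start = time.perf_counter()
    n_runs = 10
    for _ in range(n_runs):
        model(x)
    per_image_ms = (time.perf_counter() - start) / n_runs * 1000

print(f"per-image latency: {per_image_ms:.1f} ms")
```

Measuring the same model before and after adjusting `torch.set_num_threads` (or exporting with `torch.jit.trace`) gives a quick sense of how much headroom exists before resorting to a hand-written CPU operator.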
Thanks.
The answer, of course, is to write a dedicated CPU operator. Wait for me.
Sorry, I couldn't manage it.