InternImage
Latency too high with DCNv3_pytorch op on cpu
Hi guys:
When using DCNv3_pytorch for inference, the latency is too high on a CPU-only device. I tested inference on both CUDA and CPU: on a CUDA device, one image takes 29 ms (20 ms with the DCNv3 CUDA op); on a CPU device, one image takes 550 ms (tested with the PyTorch implementation only).
Do you have any ideas for optimizing the CPU inference cost?
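One cheap thing to check before writing a custom op is the CPU threading setup, since oversubscribed intra-op threads often inflate per-image latency. Below is a minimal, hedged timing sketch; the real InternImage model with the DCNv3_pytorch op is replaced by a plain convolutional stand-in so the harness runs anywhere, and the thread count of 4 is just an example to tune to your physical core count.

```python
import time

import torch
import torch.nn as nn

# Stand-in for the real model: the DCNv3_pytorch block is replaced
# by ordinary convolutions purely so this snippet is self-contained.
model = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1),
).eval()

x = torch.randn(1, 3, 224, 224)

# Match this to your physical core count; too many intra-op
# threads can make CPU latency worse, not better.
torch.set_num_threads(4)

with torch.inference_mode():
    for _ in range(3):  # warm-up iterations, excluded from timing
        model(x)
    start = time.perf_counter()
    n_runs = 10
    for _ in range(n_runs):
        model(x)
    per_image_ms = (time.perf_counter() - start) / n_runs * 1000

print(f"per-image latency: {per_image_ms:.1f} ms")
```

Measuring the same model before and after adjusting `torch.set_num_threads` (or exporting with `torch.jit.trace`) gives a quick sense of how much headroom exists before resorting to a hand-written CPU operator.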
Thanks.
The answer, of course, is to write a dedicated CPU operator. Wait for me.
Sorry, I couldn't manage it.