tensorrtx
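3D network inference is very slow
I defined a 3D network (SlowFast) myself, but inference is very slow. It takes 60 ms+ in PyTorch, but after defining the network with the C++ API it takes about 1 s, and about 350 ms in FP16 mode. I printed the timings, and the cost seems to come from conv3d. Does anyone know what the reason is?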
Env
- GPU, Xavier NX.
- OS, Ubuntu18.04.
- Cuda version 10.2
- TensorRT version 8.2.1
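Did you try running it in a loop? The GPU might need a warmup.
I ran 20 inferences first and only then measured the speed. I think the cause is probably that the Xavier NX does not support 3D convolution well.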
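For anyone comparing numbers, the warmup-then-measure procedure described above could look roughly like the sketch below. It is only an illustration, not code from tensorrtx: `TimingStats`, `benchmark`, and the `infer` callable are hypothetical names, and the sketch assumes `infer` blocks until the GPU work has finished (for example by synchronizing the CUDA stream after enqueueing), since otherwise only the launch overhead is timed.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical container for the measured latencies (milliseconds).
struct TimingStats {
    double mean_ms;
    double min_ms;
    double max_ms;
    std::size_t runs;
};

// Runs `infer` `warmup_runs` times without timing, then times `timed_runs` calls.
// Assumption: `infer` returns only after the inference has fully completed
// (e.g. it synchronizes the CUDA stream), so wall-clock time covers the GPU work.
inline TimingStats benchmark(const std::function<void()>& infer,
                             std::size_t warmup_runs,
                             std::size_t timed_runs) {
    // Warmup: lets clocks ramp up and one-time initialization finish.
    for (std::size_t i = 0; i < warmup_runs; ++i) {
        infer();
    }

    std::vector<double> samples;
    samples.reserve(timed_runs);
    for (std::size_t i = 0; i < timed_runs; ++i) {
        const auto start = std::chrono::steady_clock::now();
        infer();
        const auto end = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(end - start).count());
    }

    TimingStats stats{0.0, 0.0, 0.0, samples.size()};
    if (samples.empty()) {
        return stats;  // nothing was timed
    }
    stats.mean_ms = std::accumulate(samples.begin(), samples.end(), 0.0) /
                    static_cast<double>(samples.size());
    stats.min_ms = *std::min_element(samples.begin(), samples.end());
    stats.max_ms = *std::max_element(samples.begin(), samples.end());
    return stats;
}
```

Calling `benchmark(run_once, 20, 100)` with a `run_once` wrapper around the engine execution would reproduce the 20-iteration warmup mentioned above. Reporting min and max next to the mean makes it easier to see whether a slow result is steady or caused by a few outliers.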
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.