BS-RoFormer
Issues Related to TensorRT-Accelerated Inference
After separating the STFT and ISTFT from the BSRoformer class, I was able to export the model to ONNX, and trtexec converted the ONNX model to a TensorRT engine without errors. However, TensorRT did not accelerate inference; it was roughly twice as slow as the Torch implementation.
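For context, the export looked roughly like the sketch below. `DummyCore`, the tensor shape, and the opset are illustrative placeholders rather than the exact code from my fork; the point is that the graph handed to ONNX contains only the transformer body, with the STFT/ISTFT done in Torch outside it.

```python
import torch
import torch.nn as nn

class DummyCore(nn.Module):
    # Stand-in for the BSRoformer body after its STFT/ISTFT were factored
    # out (hypothetical; substitute the real module here).
    def forward(self, spec):
        return spec  # identity placeholder for the mask-estimation body

core = DummyCore().eval()
# Complex spectrogram packed as real/imag channels; the shape is a
# placeholder, not the model's actual input size.
dummy_spec = torch.randn(1, 4, 2049, 256)
torch.onnx.export(
    core,
    (dummy_spec,),
    "bs_roformer_core.onnx",
    input_names=["spec"],
    output_names=["masked_spec"],
    opset_version=17,
)
```

The engine can then be built with something like `trtexec --onnx=bs_roformer_core.onnx --saveEngine=core.plan --fp16`.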
Torch takes approximately 0.13 s to infer a slice, while TensorRT takes 0.27 s on the same slice (tested on an RTX 4090). Profiling with NVIDIA Nsight, my preliminary analysis suggests the slowdown is caused by the Tile operation. Is there any way to alleviate this issue in TensorRT without retraining the model?
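For reference, one workaround I have been considering, sketched below. It assumes the Tile nodes come from `Tensor.repeat`/`torch.tile` calls in the model (which the exporter lowers to ONNX `Tile`); `tile_free_repeat` is a name I made up. Rewriting such calls as unsqueeze → expand → reshape makes the exporter emit `Expand` + `Reshape` instead, and `Expand` is a broadcast that TensorRT may handle more cheaply than a materializing `Tile`. Whether this actually removes the hotspot would still need to be re-checked in Nsight.

```python
import torch

def tile_free_repeat(x: torch.Tensor, times: int, dim: int) -> torch.Tensor:
    """Repeat x `times` times along `dim` (same values as Tensor.repeat),
    via unsqueeze -> expand -> reshape so the ONNX export emits
    Expand + Reshape instead of a Tile node. `dim` must be non-negative
    here, for simplicity."""
    x = x.unsqueeze(dim)               # insert a size-1 axis to broadcast over
    sizes = [-1] * x.dim()
    sizes[dim] = times
    x = x.expand(*sizes)               # broadcast view, no data copy
    shape = list(x.shape)
    shape[dim:dim + 2] = [shape[dim] * shape[dim + 1]]  # merge the two axes
    return x.reshape(shape)

# Usage: produces values identical to Tensor.repeat along one dimension.
x = torch.randn(2, 64, 512)
assert torch.equal(x.repeat(1, 3, 1), tile_free_repeat(x, 3, 1))
```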
Nsight result: (profiler screenshot)
Modified source code:
https://github.com/bfloat16/Music-Source-Separation-Training