BS-RoFormer
Issues Related to TensorRT-Accelerated Inference
After separating the STFT and ISTFT from the BSRoformer class, I was able to export the model to ONNX, and trtexec converted the ONNX model to a TensorRT engine without errors. However, TensorRT did not accelerate inference; it was roughly twice as slow as the Torch implementation.
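For context, the export looked roughly like the sketch below. `DummyCore`, the tensor shape, and the opset are illustrative placeholders rather than the exact code from my fork; the point is that the graph handed to ONNX contains only the transformer body, with the STFT/ISTFT done in Torch outside it.

```python
import torch
import torch.nn as nn

class DummyCore(nn.Module):
    # Stand-in for the BSRoformer body after its STFT/ISTFT were factored
    # out (hypothetical; substitute the real module here).
    def forward(self, spec):
        return spec  # identity placeholder for the mask-estimation body

core = DummyCore().eval()
# Complex spectrogram packed as real/imag channels; the shape is a
# placeholder, not the model's actual input size.
dummy_spec = torch.randn(1, 4, 2049, 256)
torch.onnx.export(
    core,
    (dummy_spec,),
    "bs_roformer_core.onnx",
    input_names=["spec"],
    output_names=["masked_spec"],
    opset_version=17,
)
```

The engine can then be built with something like `trtexec --onnx=bs_roformer_core.onnx --saveEngine=core.plan --fp16`.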
Torch takes approximately 0.13 s to infer a slice, while TensorRT takes 0.27 s on the same slice (tested on an RTX 4090). Profiling with NVIDIA Nsight, my preliminary analysis suggests the slowdown is caused by the Tile operation. Is there any way to alleviate this issue in TensorRT without retraining the model?
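For reference, one workaround I have been considering, sketched below. It assumes the Tile nodes come from `Tensor.repeat`/`torch.tile` calls in the model (which the exporter lowers to ONNX `Tile`); `tile_free_repeat` is a name I made up. Rewriting such calls as unsqueeze → expand → reshape makes the exporter emit `Expand` + `Reshape` instead, and `Expand` is a broadcast that TensorRT may handle more cheaply than a materializing `Tile`. Whether this actually removes the hotspot would still need to be re-checked in Nsight.

```python
import torch

def tile_free_repeat(x: torch.Tensor, times: int, dim: int) -> torch.Tensor:
    """Repeat x `times` times along `dim` (same values as Tensor.repeat),
    via unsqueeze -> expand -> reshape so the ONNX export emits
    Expand + Reshape instead of a Tile node. `dim` must be non-negative
    here, for simplicity."""
    x = x.unsqueeze(dim)               # insert a size-1 axis to broadcast over
    sizes = [-1] * x.dim()
    sizes[dim] = times
    x = x.expand(*sizes)               # broadcast view, no data copy
    shape = list(x.shape)
    shape[dim:dim + 2] = [shape[dim] * shape[dim + 1]]  # merge the two axes
    return x.reshape(shape)

# Usage: produces values identical to Tensor.repeat along one dimension.
x = torch.randn(2, 64, 512)
assert torch.equal(x.repeat(1, 3, 1), tile_free_repeat(x, 3, 1))
```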
Nsight result: (profiler screenshot)
Modified source code:
https://github.com/bfloat16/Music-Source-Separation-Training