In sherpa/triton, joiner.onnx inference is very slow
I used the code at https://github.com/k2-fsa/sherpa/tree/master/triton/zipformer/model_repo_offline to start the Triton service and sent requests while measuring the time spent in the encoder, decoder, and joiner modules. I found that the joiner module accounts for 95% of the time, while the encoder and decoder together account for less than 5%, which seems very abnormal. Is there an error in the ONNX export of the model, or is there an error in https://github.com/k2-fsa/sherpa/blob/master/triton/zipformer/model_repo_offline/scorer/1/model.py?
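For reference, this is roughly how I accumulate the joiner time. It is only a sketch, not the actual scorer code: the model name "joiner", the output name "logit", and the helper function are illustrative assumptions, and triton_python_backend_utils is only available when the code runs inside the Triton Python backend.

```python
# Rough sketch: wrap each BLS call to the joiner with a wall-clock timer
# inside a Triton Python backend model (e.g. scorer/1/model.py).
# "joiner" and "logit" are assumed names, not taken from the real scorer.
import time
import triton_python_backend_utils as pb_utils

joiner_total_ms = 0.0  # wall-clock time accumulated over all joiner calls


def timed_joiner_call(inputs):
    """Send one BLS request to the joiner model and record its latency."""
    global joiner_total_ms
    request = pb_utils.InferenceRequest(
        model_name="joiner",
        requested_output_names=["logit"],
        inputs=inputs,
    )
    start = time.perf_counter()
    response = request.exec()
    joiner_total_ms += (time.perf_counter() - start) * 1000.0
    if response.has_error():
        raise RuntimeError(response.error().message())
    return pb_utils.get_output_tensor_by_name(response, "logit")
```

If the joiner is invoked once per decoding frame by the scorer, the accumulated time also includes the per-call BLS round-trip overhead, not only the ONNX Runtime compute time, so it can look much larger than a single encoder call.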
I have the same problem, maybe your test audio is too long.
@yuekaizhang Could you have a look at this issue?
Are there decoder settings that would affect this? (I assume it depends on the search method, beams, etc.)
> ... start the Triton service and sent requests while measuring the time spent in the encoder, decoder, and joiner modules. I found ...
How do you count the time for the Triton modules? A normal distribution looks like this: https://github.com/yuekaizhang/Triton-ASR-Client/blob/main/log/stats_summary.txt.
Also, what's your test audio length? (It should be okay if it is shorter than 30 seconds.)
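For comparison, here is a minimal sketch of pulling per-model statistics directly from the server with the Triton HTTP client. The endpoint and the model names in the list are assumptions based on the model_repo_offline layout and may differ in your repository:

```python
# Minimal sketch: query Triton's statistics extension for each model and
# print the average compute time per request. Assumes the HTTP endpoint is
# on localhost:8000 and that the listed model names exist in the repository.
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

for model in ["encoder", "decoder", "joiner", "scorer", "transducer"]:
    try:
        stats = client.get_inference_statistics(model_name=model)
    except Exception:
        continue  # skip models that are not loaded
    for m in stats.get("model_stats", []):
        infer = m["inference_stats"]["compute_infer"]
        if int(infer["count"]) == 0:
            continue
        avg_ms = int(infer["ns"]) / int(infer["count"]) / 1e6
        print(f'{m["name"]}: count={infer["count"]}, avg compute={avg_ms:.2f} ms')
```

The queue, compute_input, and compute_output fields in the same response give a breakdown similar to the linked stats_summary.txt, which makes it easier to see whether the joiner time is really compute or mostly queueing/overhead.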