In sherpa/triton, joiner.onnx inference is very slow

Open arbs-gpu opened this issue 1 year ago • 4 comments

I use the code at https://github.com/k2-fsa/sherpa/tree/master/triton/zipformer/model_repo_offline to start the Triton service, then send requests and measure the time spent in the encoder, decoder, and joiner modules. I found that the joiner module accounts for 95% of the time, while the encoder and decoder together take less than 5%. This seems very abnormal. Is there an error in how the model was exported to ONNX, or is there an error in the code at https://github.com/k2-fsa/sherpa/blob/master/triton/zipformer/model_repo_offline/scorer/1/model.py?

Aug 01 '23 13:08 arbs-gpu

I have the same problem. Maybe your test audio is too long.

Aug 10 '23 09:08 ziyu123

@yuekaizhang Could you have a look at this issue?

Aug 10 '23 09:08 csukuangfj

Are there decoder settings that would affect this? (I assume it depends on the search method, beam size, etc.)

Aug 10 '23 09:08 danpovey
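To make the link between audio length, search method, and joiner cost concrete, here is a rough back-of-the-envelope sketch. It assumes a 10 ms feature frame shift, an encoder subsampling factor of 4, and a greedy search that calls the joiner a fixed number of times per encoder output frame. None of these values are confirmed in this thread, and `estimate_joiner_calls` is a hypothetical helper, not part of sherpa.

```python
def estimate_joiner_calls(duration_s, frame_shift_ms=10, subsampling=4,
                          calls_per_frame=1):
    """Rough count of joiner invocations for one utterance.

    All defaults are assumptions for illustration:
    - frame_shift_ms: feature frame shift in milliseconds
    - subsampling: encoder downsampling factor
    - calls_per_frame: joiner calls per encoder frame (1 for a simple
      greedy search; a beam search may need more)
    Edge effects (padding, context frames) are ignored.
    """
    feature_frames = int(duration_s * 1000) // frame_shift_ms
    encoder_frames = feature_frames // subsampling
    return encoder_frames * calls_per_frame


if __name__ == "__main__":
    for seconds in (5, 30, 120):
        print(seconds, "s ->", estimate_joiner_calls(seconds), "joiner calls")
```

Under these assumptions the encoder runs once per utterance while the joiner call count grows linearly with audio length, so any fixed per-call overhead would be multiplied accordingly.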

> Triton service and send requests to count the time spent on the encoder, decoder, and joiner modules. I found

How do you measure the time for the Triton modules? A normal distribution would look like this: https://github.com/yuekaizhang/Triton-ASR-Client/blob/main/log/stats_summary.txt.

Also, what's your test audio length? (It should be okay if it is shorter than 30 seconds.)

Aug 10 '23 09:08 yuekaizhang
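One way to get per-module timings is from Triton's own statistics endpoint rather than from client-side timers. The sketch below is a minimal example, not the method used in this thread: it assumes the server's HTTP port is reachable at `http://localhost:8000` and that the models are named `encoder`, `decoder`, and `joiner`. The numbers in the sample payload are made up for illustration only.

```python
import json
from urllib.request import urlopen


def fetch_model_stats(base_url="http://localhost:8000"):
    """Fetch statistics for all loaded models from Triton's HTTP endpoint.

    base_url is an assumption; adjust it to your deployment.
    """
    with urlopen(f"{base_url}/v2/models/stats") as resp:
        return json.load(resp)


def compute_infer_share(stats, names=("encoder", "decoder", "joiner")):
    """Return each model's fraction of the summed compute_infer time.

    Only models listed in `names` are counted, so a wrapper model that
    calls the others (and whose time may include theirs) is not
    double-counted.
    """
    totals = {}
    for model in stats.get("model_stats", []):
        name = model["name"]
        if name not in names:
            continue
        ns = int(model["inference_stats"]["compute_infer"]["ns"])
        totals[name] = totals.get(name, 0) + ns
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {name: ns / grand_total for name, ns in totals.items()}


# Made-up sample payload in the shape of the statistics response.
SAMPLE_STATS = {
    "model_stats": [
        {"name": "encoder", "version": "1",
         "inference_stats": {"compute_infer": {"count": 10, "ns": 300}}},
        {"name": "decoder", "version": "1",
         "inference_stats": {"compute_infer": {"count": 50, "ns": 100}}},
        {"name": "joiner", "version": "1",
         "inference_stats": {"compute_infer": {"count": 50, "ns": 600}}},
        {"name": "scorer", "version": "1",
         "inference_stats": {"compute_infer": {"count": 10, "ns": 1200}}},
    ]
}

if __name__ == "__main__":
    for name, share in compute_infer_share(SAMPLE_STATS).items():
        print(f"{name}: {share:.0%}")
```

Against a live server, `compute_infer_share(fetch_model_stats())` would give the same breakdown. Comparing the `count` fields is also useful, since a module that is invoked far more often than the others will accumulate more time even if each call is cheap.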
