
Benchmarking on Android

gcervantes8 opened this issue 2 years ago · 1 comment

I've seen that for Icefall, the two ways to export models are ONNX (this package) and NCNN.

Has there been any benchmarking done for the 2 methods? I'm wondering which one would be faster.

I did find this page, https://github.com/k2-fsa/sherpa-ncnn/issues/44, which includes some NCNN run times.

gcervantes8 · Apr 20 '23 22:04

We have not benchmarked sherpa-onnx on Android. However, we have compared the RTF (real-time factor) of sherpa-ncnn and sherpa-onnx on macOS and a Raspberry Pi 4 Model B with a streaming zipformer model.

The following table compares the RTF for greedy search with 1 thread:

|                        | sherpa-ncnn | sherpa-onnx |
|------------------------|-------------|-------------|
| macOS                  | 0.159       | 0.125       |
| Raspberry Pi 4 Model B | 0.871       | 0.697       |
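
For reference, RTF here is decoding time divided by audio duration, so lower is better and anything below 1 is faster than real time. A minimal measurement sketch in Python, where `decode_file` is a hypothetical callable that runs whichever recognizer you are benchmarking:

```python
import time
import wave


def measure_rtf(wav_path, decode_file):
    """Rough RTF measurement: decoding time / audio duration.

    `decode_file` is a placeholder for the function that runs
    sherpa-onnx or sherpa-ncnn over the given wav file.
    """
    with wave.open(wav_path) as w:
        audio_seconds = w.getnframes() / w.getframerate()

    start = time.perf_counter()
    decode_file(wav_path)
    elapsed = time.perf_counter() - start

    return elapsed / audio_seconds  # RTF < 1 means faster than real time
```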

If speed is the only thing you care about, then I suggest that you choose sherpa-onnx.


However, onnxruntime is a pain to compile from source if you don't use the pre-compiled onnxruntime libs, and we have not managed to compile it for 32-bit ARM.

I don't know how easy it is to add a custom operator to onnxruntime.


The source code of ncnn is very readable and easy to extend. It also provides a tool, PNNX, to convert models from PyTorch; if an op cannot be converted, it is straightforward to change PNNX and ncnn to support it.
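
As a rough sketch of that route (the model and input shape below are just placeholders), you trace the PyTorch model to TorchScript and then hand the saved file to the `pnnx` converter:

```python
import torch
import torchvision

# Trace a PyTorch model to TorchScript; PNNX consumes the traced .pt file.
model = torchvision.models.mobilenet_v2().eval()
example = torch.rand(1, 3, 224, 224)
traced = torch.jit.trace(model, example)
traced.save("mobilenet_v2.pt")

# Then, outside Python (assumed pnnx invocation; check the PNNX docs):
#   pnnx mobilenet_v2.pt inputshape=[1,3,224,224]
# which should emit *.ncnn.param / *.ncnn.bin files for ncnn.
```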

One thing I want to mention is that the file size of libncnn.so for Android is less than 1.2 MB. If you customize it, you can get an even smaller lib. I don't know if there is any open-source inference framework that can produce such a small lib.

Also, ncnn supports non-NVIDIA GPUs (via Vulkan), e.g., the GPUs on mobile phones and the ARM GPUs on embedded boards. ncnn also supports RISC-V.
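
A minimal sketch of enabling the Vulkan backend, using the ncnn Python bindings and placeholder model file names (assuming the binding exposes the same options as the C++ API):

```python
import ncnn

net = ncnn.Net()
net.opt.use_vulkan_compute = True   # run supported layers on the Vulkan (GPU) backend
net.load_param("model.ncnn.param")  # placeholder param/bin files from the converter
net.load_model("model.ncnn.bin")
```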

csukuangfj · Apr 21 '23 03:04