sherpa-onnx
Benchmarking on Android
I've seen that for Icefall, there are two ways to export models: to ONNX (used by this package) or to NCNN.
Has there been any benchmarking of the two methods? I'm wondering which one is faster.
I did find this page, https://github.com/k2-fsa/sherpa-ncnn/issues/44, which includes some NCNN run times.
We have not benchmarked sherpa-onnx on Android. However, we have compared the RTF of sherpa-ncnn and sherpa-onnx on macOS and a Raspberry Pi 4 Model B with a streaming zipformer model.
The following table compares the RTF (real-time factor: processing time divided by audio duration, lower is better) for greedy search with 1 thread:
| Platform | sherpa-ncnn | sherpa-onnx |
|---|---|---|
| macOS | 0.159 | 0.125 |
| Raspberry Pi 4 Model B | 0.871 | 0.697 |
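
For reference, here is a minimal sketch of how an RTF number like the ones above can be measured. `run_decoder` is a hypothetical placeholder for whatever streaming decode call you are timing; it is not an API of sherpa-onnx or sherpa-ncnn.

```cpp
#include <chrono>
#include <cstdio>

// Hypothetical placeholder: decode a test wave of the given length through
// whichever streaming recognizer you want to benchmark.
static void run_decoder(float /*audio_seconds*/) {
  // ... feed the audio to sherpa-onnx or sherpa-ncnn here ...
}

int main() {
  const float audio_seconds = 60.0f;  // duration of the test wave

  auto start = std::chrono::steady_clock::now();
  run_decoder(audio_seconds);
  auto stop = std::chrono::steady_clock::now();

  float elapsed = std::chrono::duration<float>(stop - start).count();

  // RTF = processing time / audio duration; e.g., 0.125 means one second
  // of audio is decoded in 0.125 seconds.
  std::printf("RTF = %.3f\n", elapsed / audio_seconds);
  return 0;
}
```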
If speed is the only thing you care about, then I suggest that you choose sherpa-onnx.
However, onnxruntime is a pain to compile from source if you can't use the pre-compiled libs, and we have not managed to compile it for 32-bit ARM.
I don't know how easy it is to add a custom operator to onnxruntime.
The source code of ncnn is very readable and easy to extend. It also provides a tool, PNNX, for converting models from PyTorch. If an op cannot be converted, it is straightforward to change PNNX and ncnn to support it.
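
As a rough illustration of that extension point, here is a minimal sketch of registering a custom layer with ncnn. The op name `MySwish` is made up for the example; the `ncnn::Layer` interface, `DEFINE_LAYER_CREATOR`, and `Net::register_custom_layer` are part of ncnn's public API.

```cpp
#include <cmath>
#include "layer.h"
#include "net.h"

// Hypothetical op "MySwish": x * sigmoid(x), applied element-wise.
class MySwish : public ncnn::Layer {
 public:
  MySwish() {
    one_blob_only = true;    // single input blob, single output blob
    support_inplace = true;  // operate on the blob in place
  }

  virtual int forward_inplace(ncnn::Mat& bottom_top_blob,
                              const ncnn::Option& opt) const {
    const int size = bottom_top_blob.w * bottom_top_blob.h;
    #pragma omp parallel for num_threads(opt.num_threads)
    for (int q = 0; q < bottom_top_blob.c; q++) {
      float* ptr = bottom_top_blob.channel(q);
      for (int i = 0; i < size; i++) {
        ptr[i] = ptr[i] / (1.f + std::exp(-ptr[i]));
      }
    }
    return 0;
  }
};

DEFINE_LAYER_CREATOR(MySwish)

int main() {
  ncnn::Net net;
  // Tell ncnn how to build layers of type "MySwish" found in the param file.
  net.register_custom_layer("MySwish", MySwish_layer_creator);
  // net.load_param("model.param");
  // net.load_model("model.bin");
  return 0;
}
```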
One thing I want to mention is that the file size of libncnn.so for Android is less than 1.2 MB. If you customize it, you can get an even smaller lib. I don't know if there is any open-source inference framework that can produce such a small lib.
Also, ncnn supports non-NVIDIA GPUs via Vulkan, e.g., the GPUs on your mobile phones and ARM GPUs on your embedded boards. ncnn also supports RISC-V.
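
For context, a minimal sketch of how ncnn's GPU path is opted into, assuming an ncnn build with Vulkan enabled. `ncnn::get_gpu_count()` and `Option::use_vulkan_compute` are real ncnn APIs; the model file names are placeholders.

```cpp
#include <cstdio>
#include "gpu.h"
#include "net.h"

int main() {
  // get_gpu_count() is only available when ncnn is built with Vulkan
  // support (NCNN_VULKAN=ON).
  if (ncnn::get_gpu_count() == 0) {
    std::printf("no Vulkan-capable GPU found, falling back to CPU\n");
  }

  ncnn::Net net;
  net.opt.use_vulkan_compute = true;  // run supported layers on the GPU

  // Placeholder model files:
  // net.load_param("model.param");
  // net.load_model("model.bin");
  return 0;
}
```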