sherpa-onnx
[Help wanted] Customize onnxruntime
TODO
- [ ] Reduce the number of operator kernels when building onnxruntime to reduce the resulting file size of the generated lib
Please see https://onnxruntime.ai/docs/build/custom.html
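In brief, the approach in those docs is to generate an operator config from the target models and pass it to onnxruntime's build script. A minimal sketch of the key flags (combined with the usual platform flags; the full invocations used in this thread appear below):
python3 tools/ci_build/build.py \
  --config Release \
  --include_ops_by_config ops.config \
  --minimal_build extended \
  --skip_tests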
Hello, I'm interested in working on this task, and I've followed the ONNX Runtime docs to try to build a much smaller lib for iOS. I would love to get some feedback on this work.
I've started by forking csukuangfj/ios-onnxruntime and making some changes based on this guide; you can find my fork here. For this purpose, I will use csukuangfj/sherpa-onnx-streaming-zipformer-en-2023-06-26 as the model set the custom lib will be specialized for.
As such, I began by cloning the repo to download the ONNX models, then used the cloned models together with this script to create the reduced ops.config. You can find the bash script here. The resultant ops.config file looks like this:
# Generated from ONNX model/s:
# - /path/to/ios-onnxruntime/onnxruntime/sherpa-onnx-streaming-zipformer-en-2023-06-26/decoder-epoch-99-avg-1-chunk-16-left-128.int8.onnx
# - /path/to/ios-onnxruntime/onnxruntime/sherpa-onnx-streaming-zipformer-en-2023-06-26/decoder-epoch-99-avg-1-chunk-16-left-128.onnx
# - /path/to/ios-onnxruntime/onnxruntime/sherpa-onnx-streaming-zipformer-en-2023-06-26/decoder-epoch-99-avg-1-chunk-16-left-64.int8.onnx
# - /path/to/ios-onnxruntime/onnxruntime/sherpa-onnx-streaming-zipformer-en-2023-06-26/decoder-epoch-99-avg-1-chunk-16-left-64.onnx
# - /path/to/ios-onnxruntime/onnxruntime/sherpa-onnx-streaming-zipformer-en-2023-06-26/encoder-epoch-99-avg-1-chunk-16-left-128.int8.onnx
# - /path/to/ios-onnxruntime/onnxruntime/sherpa-onnx-streaming-zipformer-en-2023-06-26/encoder-epoch-99-avg-1-chunk-16-left-128.onnx
# - /path/to/ios-onnxruntime/onnxruntime/sherpa-onnx-streaming-zipformer-en-2023-06-26/encoder-epoch-99-avg-1-chunk-16-left-64.int8.onnx
# - /path/to/ios-onnxruntime/onnxruntime/sherpa-onnx-streaming-zipformer-en-2023-06-26/encoder-epoch-99-avg-1-chunk-16-left-64.onnx
# - /path/to/ios-onnxruntime/onnxruntime/sherpa-onnx-streaming-zipformer-en-2023-06-26/joiner-epoch-99-avg-1-chunk-16-left-128.int8.onnx
# - /path/to/ios-onnxruntime/onnxruntime/sherpa-onnx-streaming-zipformer-en-2023-06-26/joiner-epoch-99-avg-1-chunk-16-left-128.onnx
# - /path/to/ios-onnxruntime/onnxruntime/sherpa-onnx-streaming-zipformer-en-2023-06-26/joiner-epoch-99-avg-1-chunk-16-left-64.int8.onnx
# - /path/to/ios-onnxruntime/onnxruntime/sherpa-onnx-streaming-zipformer-en-2023-06-26/joiner-epoch-99-avg-1-chunk-16-left-64.onnx
ai.onnx;13;Add,Cast,Clip,Concat,Constant,ConstantOfShape,Conv,Div,DynamicQuantizeLinear,Equal,Exp,Expand,Gather,GatherElements,Gemm,GreaterOrEqual,Identity,If,LessOrEqual,Log,MatMul,MatMulInteger,Mul,Neg,Pow,Range,ReduceMean,ReduceSum,Relu,Reshape,Shape,Sigmoid,Slice,Softmax,Squeeze,Sub,Tanh,Tile,Transpose,Unsqueeze,Where
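For context, generating a config like the one above boils down to a call along these lines. This is a sketch rather than my exact script; the flags come from tools/python/create_reduced_build_config.py in the onnxruntime repo and may differ between checkouts, so check --help first:
# Sketch: scan every model in the directory and write the required ops.
# Adding --enable_type_reduction would also record per-operator types,
# which pairs with the --enable_reduced_operator_type_support build flag.
python3 tools/python/create_reduced_build_config.py \
  --format ONNX \
  ./sherpa-onnx-streaming-zipformer-en-2023-06-26 \
  ops.config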
I then specified this ops.config file when building the iOS libraries. Namely, I added these options:
python3 \
$DIR/tools/ci_build/build.py \
--build_dir $build_dir \
+ --include_ops_by_config ops.config \
+ --enable_reduced_operator_type_support \
--config Release \
--use_xcode \
--cmake_extra_defines onnxruntime_BUILD_UNIT_TESTS=OFF \
--cmake_extra_defines onnxruntime_BUILD_SHARED_LIB=OFF \
--cmake_extra_defines CMAKE_INSTALL_PREFIX=$build_dir/install/ \
--ios \
--ios_sysroot iphoneos \
--osx_arch arm64 \
--target install \
--parallel $num_jobs \
--skip_tests \
--build_apple_framework \
--apple_deploy_target 13.0 \
--use_coreml
in these three files: ios-arm64, ios-simulator-arm64, ios-simulator-x86_64.
And finally ran the script to create onnxruntime.xcframework:
$ tree -h onnxruntime.xcframework
[ 192] onnxruntime.xcframework
├── [ 224] Headers
│ ├── [1.5K] coreml_provider_factory.h
│ ├── [ 397] cpu_provider_factory.h
│ ├── [170K] onnxruntime_c_api.h
│ ├── [ 86K] onnxruntime_cxx_api.h
│ └── [ 67K] onnxruntime_cxx_inline.h
├── [1.0K] Info.plist
├── [ 96] ios-arm64
│ └── [ 55M] onnxruntime.a
└── [ 96] ios-arm64_x86_64-simulator
└── [113M] onnxruntime.a
3 directories, 8 files
and successfully created the onnxruntime.xcframework, which you can find here.
Unfortunately, I wasn't able to get this to run on the sample iOS sherpa-onnx app. I think the checkout of onnxruntime I used is too old and needs an upgrade. Nonetheless, I noticed that the resultant onnxruntime.a files are only slightly smaller than those found in the complete builds, so I'm wondering if I'm missing some additional steps to further reduce the library size. Looking forward to getting this task done, cheers!
EDIT: I've tried updating my onnx and onnxruntime Python versions to 1.14.0 and 1.15.1 respectively, and I'm still unable to run the sample iOS app. For the fp32 model, I'm getting this error:
libc++abi: terminating due to uncaught exception of type Ort::Exception: Could not find an implementation for Max(13) node with name '/conv/conv.3/Max'
and for the int8 model, I'm getting
libc++abi: terminating due to uncaught exception of type Ort::Exception: Failed to load model with error: /path/to/ios-onnxruntime/onnxruntime/onnxruntime/core/graph/model_load_utils.h:56 void onnxruntime::model_load_utils::ValidateOpsetForDomain(const std::unordered_map<std::string, int> &, const logging::Logger &, bool, const std::string &, int) ONNX Runtime only *guarantees* support for models stamped with official released onnx opset versions. Opset 19 is under development and support for this is limited. The operator schemas and or other functionality may change before next ONNX release and in this case ONNX Runtime will not guarantee backward compatibility. Current official support for domain com.ms.internal.nhwc is till opset 18.
For the fp32 model, it's unclear why the Max op isn't included in my custom ops.config even though all the models were specified during its creation.
EDIT 2: Fixed the latter issue with the int8 model by using a more recent commit of onnxruntime. Both models now fail with the same Max-op error as above.
Nonetheless, I noticed that the resultant onnxruntime.a files are just slightly smaller than that found in the complete builds, hence I'm wondering if I'm missing some additional steps to further reduce the library size
Thanks for the detailed description.
One approach is to convert the model to ONNX Runtime (ORT) format; when you invoke https://github.com/microsoft/onnxruntime/blob/main/tools/python/create_reduced_build_config.py, you can pass additional arguments to it.
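Concretely, the conversion can be done with the converter bundled in the onnxruntime Python package. A sketch, assuming a recent onnxruntime is installed; recent versions also emit a required-operators config next to the converted models, which can be passed to --include_ops_by_config:
# Sketch: convert every .onnx model in the directory to .ort format
python3 -m onnxruntime.tools.convert_onnx_models_to_ort \
  ./sherpa-onnx-streaming-zipformer-en-2023-06-26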
Thanks for the pointers, @csukuangfj!
My latest progress is that I'm finally able to build my own custom iOS lib that successfully ran converted models in .ort format, although the lib size is still the same. I am going to try several more flags (e.g. minimal_build) to see how small I can get the library to be. I will keep you posted.
Hi @csukuangfj.
I was able to build a minimal ONNX lib for iOS, and that got the lib to about half the size of the original full build. However, I am getting an error during the initialization of the recognizer, and it seems to be happening at the C level, which I'm unable to debug from the Swift side. I am guessing it's probably several unsupported ops, which is a common issue for mobile/minimal ONNX builds.
If you'd like to give it a shot, I have provided the onnxruntime.xcframework and its build scripts in my fork here. For the models, I was running ORT-converted models here. Cheers.
I was able to build a minimal ONNX lib for iOS
Great!
Are you able to build a version for macOS (x86_64)? That would make the debugging easier as we can use macOS to debug it.
Sure, I can try that out in a bit.
@csukuangfj I was able to build ONNX Runtime for macOS from source, but the resultant files don't match those in the official releases. Do you happen to know the exact build args? This is what I've been running so far:
python3 \
$DIR/tools/ci_build/build.py \
--build_dir $build_dir \
--include_ops_by_config ops.config \
--disable_ml_ops --disable_exceptions --disable_rtti \
--minimal_build extended \
--config Release \
--use_xcode \
--cmake_extra_defines onnxruntime_BUILD_UNIT_TESTS=OFF \
--cmake_extra_defines onnxruntime_BUILD_SHARED_LIB=ON \
--cmake_extra_defines CMAKE_INSTALL_PREFIX=$build_dir/install/ \
--osx_arch x86_64 \
--target install \
--parallel $num_jobs \
--skip_tests \
--build_apple_framework \
--apple_deploy_target 13.0 \
--use_coreml \
--path_to_protoc_exe /usr/local/Cellar/protobuf@21/21.12/bin/protoc-3.21.12.0
and this is what I'm getting
$ tree -L 4 -d build-macos
build-macos
└── x86_64
├── Release
│ ├── CMakeFiles
│ │ ├── 3.26.4
│ │ ├── CMakeScratch
│ │ ├── Export
│ │ ├── onnxruntime.dir
│ │ └── pkgRedirects
│ ├── CMakeScripts
│ ├── Debug
│ ├── MinSizeRel
│ ├── RelWithDebInfo
│ ├── Release
│ │ └── onnxruntime.framework
│ ├── Testing
│ │ └── Temporary
│ ├── _deps
│ │ ├── abseil_cpp-build
│ │ ├── abseil_cpp-src
│ │ ├── abseil_cpp-subbuild
│ │ ├── date-build
│ │ ├── date-src
│ │ ├── date-subbuild
│ │ ├── eigen-build
│ │ ├── eigen-src
│ │ ├── eigen-subbuild
│ │ ├── flatbuffers-build
│ │ ├── flatbuffers-src
│ │ ├── flatbuffers-subbuild
│ │ ├── google_nsync-build
│ │ ├── google_nsync-src
│ │ ├── google_nsync-subbuild
│ │ ├── gsl-build
│ │ ├── gsl-src
│ │ ├── gsl-subbuild
│ │ ├── microsoft_wil-build
│ │ ├── microsoft_wil-src
│ │ ├── microsoft_wil-subbuild
│ │ ├── mp11-build
│ │ ├── mp11-src
│ │ ├── mp11-subbuild
│ │ ├── nlohmann_json-build
│ │ ├── nlohmann_json-src
│ │ ├── nlohmann_json-subbuild
│ │ ├── onnx-build
│ │ ├── onnx-src
│ │ ├── onnx-subbuild
│ │ ├── protobuf-build
│ │ ├── protobuf-src
│ │ ├── protobuf-subbuild
│ │ ├── pytorch_cpuinfo-build
│ │ ├── pytorch_cpuinfo-src
│ │ ├── pytorch_cpuinfo-subbuild
│ │ ├── re2-build
│ │ ├── re2-src
│ │ ├── re2-subbuild
│ │ ├── safeint-build
│ │ ├── safeint-src
│ │ └── safeint-subbuild
│ ├── build
│ │ ├── EagerLinkingTBDs
│ │ ├── Release
│ │ ├── XCBuildData
│ │ ├── absl_bad_optional_access.build
│ │ ├── absl_bad_variant_access.build
│ │ ├── absl_base.build
│ │ ├── absl_city.build
│ │ ├── absl_civil_time.build
│ │ ├── absl_cord.build
│ │ ├── absl_cord_internal.build
│ │ ├── absl_cordz_functions.build
│ │ ├── absl_cordz_handle.build
│ │ ├── absl_cordz_info.build
│ │ ├── absl_debugging_internal.build
│ │ ├── absl_demangle_internal.build
│ │ ├── absl_exponential_biased.build
│ │ ├── absl_graphcycles_internal.build
│ │ ├── absl_hash.build
│ │ ├── absl_hashtablez_sampler.build
│ │ ├── absl_int128.build
│ │ ├── absl_log_severity.build
│ │ ├── absl_low_level_hash.build
│ │ ├── absl_malloc_internal.build
│ │ ├── absl_raw_hash_set.build
│ │ ├── absl_raw_logging_internal.build
│ │ ├── absl_spinlock_wait.build
│ │ ├── absl_stacktrace.build
│ │ ├── absl_strings.build
│ │ ├── absl_strings_internal.build
│ │ ├── absl_symbolize.build
│ │ ├── absl_synchronization.build
│ │ ├── absl_throw_delegate.build
│ │ ├── absl_time.build
│ │ ├── absl_time_zone.build
│ │ ├── clog.build
│ │ ├── cpuinfo.build
│ │ ├── flatbuffers.build
│ │ ├── flatc.build
│ │ ├── libprotobuf-lite.build
│ │ ├── nsync_cpp.build
│ │ ├── onnx.build
│ │ ├── onnx_proto.build
│ │ ├── onnxruntime.build
│ │ ├── onnxruntime_common.build
│ │ ├── onnxruntime_coreml_proto.build
│ │ ├── onnxruntime_flatbuffers.build
│ │ ├── onnxruntime_framework.build
│ │ ├── onnxruntime_graph.build
│ │ ├── onnxruntime_mlas.build
│ │ ├── onnxruntime_optimizer.build
│ │ ├── onnxruntime_providers.build
│ │ ├── onnxruntime_providers_coreml.build
│ │ ├── onnxruntime_session.build
│ │ ├── onnxruntime_util.build
│ │ └── re2.build
│ ├── coreml
│ ├── onnx
│ ├── onnxruntime.xcodeproj
│ │ └── project.xcworkspace
│ ├── op_reduction.generated
│ │ ├── onnxruntime
│ │ └── orttraining
│ ├── static_framework
│ │ └── onnxruntime.framework
│ └── static_libraries
└── install
├── bin
│ └── onnxruntime.framework
├── include
│ └── onnxruntime
└── lib
├── cmake
└── pkgconfig
@w11wo You may find the following workflow helpful.
https://github.com/csukuangfj/onnxruntime-libs/actions/runs/5581669504
By default, it builds a static library. If you want to build a shared library, please pass --build_shared_lib.
See also https://github.com/csukuangfj/onnxruntime-libs/tree/master/.github/workflows
I have been using github actions to build static libraries of onnxruntime.
This is very helpful, @csukuangfj. Many thanks, I'll test it out once I have some time.
Hi @csukuangfj, I was able to reproduce your workflow to create a static library with the minimal build. I've also tried to build the shared library version with the addition of --build_shared_lib; however, I'm confused about how to build the *.dylib files which are needed for sherpa-onnx. Could you guide me through it or give me some pointers? Thanks.
Are you able to find a folder named framework or xcframework? If there is a file named onnxruntime, please run
file onnxruntime
to check whether it is a shared lib or a static lib.
Hi @w11wo, in the end, how small did you manage to get the onnxruntime static lib (ios-arm64)? I tried to reduce it to 20M+; that seems to be the limit.
Hi everyone, sorry for the late response, got busy with work stuff.
@csukuangfj, yes I was able to find that file, looks like it's a shared lib.
$ file onnxruntime.framework/onnxruntime
onnxruntime.framework/onnxruntime: Mach-O 64-bit dynamically linked shared library x86_64
Now I just need to know how to use this within sherpa-onnx.
Hi @601222543. I'm getting similar sizes to your findings, about 20MB as you said. You can find my latest ios-arm64 build here. I haven't fully debugged why we can't use it.
Now I just need to know how to use this within sherpa-onnx.
Please create a symlink to it, e.g.,
cd /path/to/onnxruntime.framework
ln -s onnxruntime libonnxruntime.dylib
Then, please set the environment variable
export SHERPA_ONNXRUNTIME_LIB_DIR=/path/to/onnxruntime.framework
Also, please find the directory containing the header files. If it is /tmp/foo/bar, please set the following environment variable
export SHERPA_ONNXRUNTIME_INCLUDE_DIR=/tmp/foo/bar
Note: This assumes there is /tmp/foo/bar/onnxruntime_cxx_api.h.
After setting the above two environment variables, please remove the build directory of sherpa-onnx and rebuild it as usual.
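Putting the steps together, the whole sequence looks roughly like this. A sketch: the /path/to placeholders are illustrative, and the final commands assume sherpa-onnx's standard CMake build:
# Expose the custom onnxruntime as a dylib that sherpa-onnx can link against
cd /path/to/onnxruntime.framework
ln -s onnxruntime libonnxruntime.dylib

export SHERPA_ONNXRUNTIME_LIB_DIR=/path/to/onnxruntime.framework
# Directory that contains onnxruntime_cxx_api.h
export SHERPA_ONNXRUNTIME_INCLUDE_DIR=/path/to/headers

# Rebuild sherpa-onnx from scratch so the new paths are picked up
cd /path/to/sherpa-onnx
rm -rf build && mkdir build && cd build
cmake .. && make -j"$(sysctl -n hw.ncpu)"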
I was able to follow the instructions, but the build failed because it couldn't find CoreML-related header files:
0: fatal error: 'coreml_provider_factory.h' file not found
#include "coreml_provider_factory.h" // NOLINT
For some reason, this build did not include the CoreML-related headers, even though I had specified --use_coreml. This is my build script:
python3 \
$DIR/tools/ci_build/build.py \
--build_dir $build_dir \
--include_ops_by_config ops.config \
--disable_ml_ops --disable_exceptions --disable_rtti \
--minimal_build extended \
--config Release \
--update \
--build \
--use_xcode \
--compile_no_warning_as_error \
--build_shared_lib \
--cmake_extra_defines onnxruntime_BUILD_UNIT_TESTS=OFF \
--cmake_extra_defines CMAKE_INSTALL_PREFIX=$build_dir/install/ \
--osx_arch x86_64 \
--target install \
--parallel $num_jobs \
--skip_tests \
--build_apple_framework \
--apple_deploy_target 13.0 \
--use_coreml \
--path_to_protoc_exe /usr/local/Cellar/protobuf@21/21.12/bin/protoc-3.21.12.0
Manually copied coreml_provider_factory.h from the pre-release downloads, and it seems to work!
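For reference, the workaround amounts to something like this; the source path is illustrative and depends on where the downloaded release archive unpacks:
# Illustrative paths: copy the missing CoreML header next to the other headers
cp /path/to/official-onnxruntime-download/include/coreml_provider_factory.h \
   $SHERPA_ONNXRUNTIME_INCLUDE_DIR/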
I made sure that the build was pointing to the pre-built and minimized onnxruntime, and this is from the logs:
-- location_onnxruntime_header_dir: path/to/ios-onnxruntime/onnxruntime/build-macos/x86_64/Release/Release/onnxruntime.framework/Versions/A/Headers
-- location_onnxruntime_lib: path/to/ios-onnxruntime/onnxruntime/build-macos/x86_64/Release/Release/onnxruntime.framework/libonnxruntime.dylib
and its size is
$ du -ah onnxruntime
3.6M onnxruntime
It's only 3.6MB, compared to the official release's 23MB.
Now the remaining question is why the iOS minimal build didn't work.
EDIT: you can find the macOS x86_64 minimal build here.