sherpa-onnx
[Help wanted] Customize onnxruntime
TODO
- [ ] Reduce the number of operator kernels when building onnxruntime to reduce the resulting file size of the generated lib
Please see https://onnxruntime.ai/docs/build/custom.html
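In brief, the approach in those docs is to generate an operator config from the target models and pass it to onnxruntime's build script. A minimal sketch of the key flags (combined with the usual platform flags; the full invocations used in this thread appear below):
python3 tools/ci_build/build.py \
  --config Release \
  --include_ops_by_config ops.config \
  --minimal_build extended \
  --skip_tests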
Hello, I'm interested in working on this task, and I've followed the ONNX Runtime docs to try to build a much smaller lib for iOS. I would love to get some feedback on this work.
I've started by forking csukuangfj/ios-onnxruntime and making some changes based on this guide; you can find my fork here. For this purpose, I will use csukuangfj/sherpa-onnx-streaming-zipformer-en-2023-06-26 as the model set the custom lib will be specialized for.
As such, I began by cloning the repo to download the ONNX models, then used the cloned models together with this script to create the reduced ops.config. You can find the bash script here. The resultant ops.config file looks like this:
# Generated from ONNX model/s:
# - /path/to/ios-onnxruntime/onnxruntime/sherpa-onnx-streaming-zipformer-en-2023-06-26/decoder-epoch-99-avg-1-chunk-16-left-128.int8.onnx
# - /path/to/ios-onnxruntime/onnxruntime/sherpa-onnx-streaming-zipformer-en-2023-06-26/decoder-epoch-99-avg-1-chunk-16-left-128.onnx
# - /path/to/ios-onnxruntime/onnxruntime/sherpa-onnx-streaming-zipformer-en-2023-06-26/decoder-epoch-99-avg-1-chunk-16-left-64.int8.onnx
# - /path/to/ios-onnxruntime/onnxruntime/sherpa-onnx-streaming-zipformer-en-2023-06-26/decoder-epoch-99-avg-1-chunk-16-left-64.onnx
# - /path/to/ios-onnxruntime/onnxruntime/sherpa-onnx-streaming-zipformer-en-2023-06-26/encoder-epoch-99-avg-1-chunk-16-left-128.int8.onnx
# - /path/to/ios-onnxruntime/onnxruntime/sherpa-onnx-streaming-zipformer-en-2023-06-26/encoder-epoch-99-avg-1-chunk-16-left-128.onnx
# - /path/to/ios-onnxruntime/onnxruntime/sherpa-onnx-streaming-zipformer-en-2023-06-26/encoder-epoch-99-avg-1-chunk-16-left-64.int8.onnx
# - /path/to/ios-onnxruntime/onnxruntime/sherpa-onnx-streaming-zipformer-en-2023-06-26/encoder-epoch-99-avg-1-chunk-16-left-64.onnx
# - /path/to/ios-onnxruntime/onnxruntime/sherpa-onnx-streaming-zipformer-en-2023-06-26/joiner-epoch-99-avg-1-chunk-16-left-128.int8.onnx
# - /path/to/ios-onnxruntime/onnxruntime/sherpa-onnx-streaming-zipformer-en-2023-06-26/joiner-epoch-99-avg-1-chunk-16-left-128.onnx
# - /path/to/ios-onnxruntime/onnxruntime/sherpa-onnx-streaming-zipformer-en-2023-06-26/joiner-epoch-99-avg-1-chunk-16-left-64.int8.onnx
# - /path/to/ios-onnxruntime/onnxruntime/sherpa-onnx-streaming-zipformer-en-2023-06-26/joiner-epoch-99-avg-1-chunk-16-left-64.onnx
ai.onnx;13;Add,Cast,Clip,Concat,Constant,ConstantOfShape,Conv,Div,DynamicQuantizeLinear,Equal,Exp,Expand,Gather,GatherElements,Gemm,GreaterOrEqual,Identity,If,LessOrEqual,Log,MatMul,MatMulInteger,Mul,Neg,Pow,Range,ReduceMean,ReduceSum,Relu,Reshape,Shape,Sigmoid,Slice,Softmax,Squeeze,Sub,Tanh,Tile,Transpose,Unsqueeze,Where
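For context, generating a config like the one above boils down to a call along these lines. This is a sketch rather than my exact script; the flags come from tools/python/create_reduced_build_config.py in the onnxruntime repo and may differ between checkouts, so check --help first:
# Sketch: scan every model in the directory and write the required ops.
# Adding --enable_type_reduction would also record per-operator types,
# which pairs with the --enable_reduced_operator_type_support build flag.
python3 tools/python/create_reduced_build_config.py \
  --format ONNX \
  ./sherpa-onnx-streaming-zipformer-en-2023-06-26 \
  ops.config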
I then specified this ops.config file when building the iOS libraries. Namely, I added these options:
python3 \
$DIR/tools/ci_build/build.py \
--build_dir $build_dir \
+ --include_ops_by_config ops.config \
+ --enable_reduced_operator_type_support \
--config Release \
--use_xcode \
--cmake_extra_defines onnxruntime_BUILD_UNIT_TESTS=OFF \
--cmake_extra_defines onnxruntime_BUILD_SHARED_LIB=OFF \
--cmake_extra_defines CMAKE_INSTALL_PREFIX=$build_dir/install/ \
--ios \
--ios_sysroot iphoneos \
--osx_arch arm64 \
--target install \
--parallel $num_jobs \
--skip_tests \
--build_apple_framework \
--apple_deploy_target 13.0 \
--use_coreml
in these three files: ios-arm64, ios-simulator-arm64, ios-simulator-x86_64.
And finally ran the script to create onnxruntime.xcframework:
$ tree -h onnxruntime.xcframework
[ 192] onnxruntime.xcframework
├── [ 224] Headers
│ ├── [1.5K] coreml_provider_factory.h
│ ├── [ 397] cpu_provider_factory.h
│ ├── [170K] onnxruntime_c_api.h
│ ├── [ 86K] onnxruntime_cxx_api.h
│ └── [ 67K] onnxruntime_cxx_inline.h
├── [1.0K] Info.plist
├── [ 96] ios-arm64
│ └── [ 55M] onnxruntime.a
└── [ 96] ios-arm64_x86_64-simulator
└── [113M] onnxruntime.a
3 directories, 8 files
and successfully created the onnxruntime.xcframework, which you can find here.
Unfortunately, I wasn't able to get this to run on the sample iOS sherpa-onnx app. I think the checkout of onnxruntime I used is too old and needs an upgrade. Nonetheless, I noticed that the resultant onnxruntime.a files are only slightly smaller than those found in the complete builds, so I'm wondering if I'm missing some additional steps to further reduce the library size. Looking forward to getting this task done, cheers!
EDIT: I've tried updating my onnx and onnxruntime Python versions to 1.14.0 and 1.15.1 respectively, and I'm still unable to run the sample iOS app. For the fp32 model, I'm getting this error:
libc++abi: terminating due to uncaught exception of type Ort::Exception: Could not find an implementation for Max(13) node with name '/conv/conv.3/Max'
and for the int8 model, I'm getting
libc++abi: terminating due to uncaught exception of type Ort::Exception: Failed to load model with error: /path/to/ios-onnxruntime/onnxruntime/onnxruntime/core/graph/model_load_utils.h:56 void onnxruntime::model_load_utils::ValidateOpsetForDomain(const std::unordered_map<std::string, int> &, const logging::Logger &, bool, const std::string &, int) ONNX Runtime only *guarantees* support for models stamped with official released onnx opset versions. Opset 19 is under development and support for this is limited. The operator schemas and or other functionality may change before next ONNX release and in this case ONNX Runtime will not guarantee backward compatibility. Current official support for domain com.ms.internal.nhwc is till opset 18.
For the fp32 model, it's unclear why the Max op isn't included in my custom ops.config even though all the models were specified during its creation.
EDIT 2: Fixed the latter issue with the int8 model by using a more recent commit of onnxruntime. Both models now fail with the same Max-op error as above.
Nonetheless, I noticed that the resultant onnxruntime.a files are just slightly smaller than that found in the complete builds, hence I'm wondering if I'm missing some additional steps to further reduce the library size
Thanks for the detailed description.
One approach is to convert the model to ONNX Runtime (ORT) format; when you invoke https://github.com/microsoft/onnxruntime/blob/main/tools/python/create_reduced_build_config.py, you can pass additional arguments to it.
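Concretely, the conversion can be done with the converter bundled in the onnxruntime Python package. A sketch, assuming a recent onnxruntime is installed; recent versions also emit a required-operators config next to the converted models, which can be passed to --include_ops_by_config:
# Sketch: convert every .onnx model in the directory to .ort format
python3 -m onnxruntime.tools.convert_onnx_models_to_ort \
  ./sherpa-onnx-streaming-zipformer-en-2023-06-26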
Thanks for the pointers, @csukuangfj!
My latest progress is that I'm finally able to build my own custom iOS lib that successfully ran converted models in .ort format, although the lib size is still the same. I am going to try several more flags (e.g. minimal_build) to see how small I can get the library to be. I will keep you posted.
Hi @csukuangfj.
I was able to build a minimal ONNX lib for iOS, and that got the lib to about half the size of the original full build. However, I am getting an error during the initialization of the recognizer, and it seems to be happening at the C level, which I'm unable to debug from the Swift side. I am guessing it's probably several unsupported ops, which is a common issue for mobile/minimal ONNX builds.
If you'd like to give it a shot, I have provided the onnxruntime.xcframework and its build scripts in my fork here. For the models, I was running ORT-converted models here. Cheers.
I was able to build a minimal ONNX lib for iOS
Great!
Are you able to build a version for macOS (x86_64)? That would make the debugging easier as we can use macOS to debug it.
Sure, I can try that out in a bit.
@csukuangfj I was able to build ONNX Runtime for macOS from source, but the resultant files don't match those in the official releases. Do you happen to know the exact build args? This is what I've been running so far:
python3 \
$DIR/tools/ci_build/build.py \
--build_dir $build_dir \
--include_ops_by_config ops.config \
--disable_ml_ops --disable_exceptions --disable_rtti \
--minimal_build extended \
--config Release \
--use_xcode \
--cmake_extra_defines onnxruntime_BUILD_UNIT_TESTS=OFF \
--cmake_extra_defines onnxruntime_BUILD_SHARED_LIB=ON \
--cmake_extra_defines CMAKE_INSTALL_PREFIX=$build_dir/install/ \
--osx_arch x86_64 \
--target install \
--parallel $num_jobs \
--skip_tests \
--build_apple_framework \
--apple_deploy_target 13.0 \
--use_coreml \
--path_to_protoc_exe /usr/local/Cellar/protobuf@21/21.12/bin/protoc-3.21.12.0
and this is what I'm getting
$ tree -L 4 -d build-macos
build-macos
└── x86_64
├── Release
│ ├── CMakeFiles
│ │ ├── 3.26.4
│ │ ├── CMakeScratch
│ │ ├── Export
│ │ ├── onnxruntime.dir
│ │ └── pkgRedirects
│ ├── CMakeScripts
│ ├── Debug
│ ├── MinSizeRel
│ ├── RelWithDebInfo
│ ├── Release
│ │ └── onnxruntime.framework
│ ├── Testing
│ │ └── Temporary
│ ├── _deps
│ │ ├── abseil_cpp-build
│ │ ├── abseil_cpp-src
│ │ ├── abseil_cpp-subbuild
│ │ ├── date-build
│ │ ├── date-src
│ │ ├── date-subbuild
│ │ ├── eigen-build
│ │ ├── eigen-src
│ │ ├── eigen-subbuild
│ │ ├── flatbuffers-build
│ │ ├── flatbuffers-src
│ │ ├── flatbuffers-subbuild
│ │ ├── google_nsync-build
│ │ ├── google_nsync-src
│ │ ├── google_nsync-subbuild
│ │ ├── gsl-build
│ │ ├── gsl-src
│ │ ├── gsl-subbuild
│ │ ├── microsoft_wil-build
│ │ ├── microsoft_wil-src
│ │ ├── microsoft_wil-subbuild
│ │ ├── mp11-build
│ │ ├── mp11-src
│ │ ├── mp11-subbuild
│ │ ├── nlohmann_json-build
│ │ ├── nlohmann_json-src
│ │ ├── nlohmann_json-subbuild
│ │ ├── onnx-build
│ │ ├── onnx-src
│ │ ├── onnx-subbuild
│ │ ├── protobuf-build
│ │ ├── protobuf-src
│ │ ├── protobuf-subbuild
│ │ ├── pytorch_cpuinfo-build
│ │ ├── pytorch_cpuinfo-src
│ │ ├── pytorch_cpuinfo-subbuild
│ │ ├── re2-build
│ │ ├── re2-src
│ │ ├── re2-subbuild
│ │ ├── safeint-build
│ │ ├── safeint-src
│ │ └── safeint-subbuild
│ ├── build
│ │ ├── EagerLinkingTBDs
│ │ ├── Release
│ │ ├── XCBuildData
│ │ ├── absl_bad_optional_access.build
│ │ ├── absl_bad_variant_access.build
│ │ ├── absl_base.build
│ │ ├── absl_city.build
│ │ ├── absl_civil_time.build
│ │ ├── absl_cord.build
│ │ ├── absl_cord_internal.build
│ │ ├── absl_cordz_functions.build
│ │ ├── absl_cordz_handle.build
│ │ ├── absl_cordz_info.build
│ │ ├── absl_debugging_internal.build
│ │ ├── absl_demangle_internal.build
│ │ ├── absl_exponential_biased.build
│ │ ├── absl_graphcycles_internal.build
│ │ ├── absl_hash.build
│ │ ├── absl_hashtablez_sampler.build
│ │ ├── absl_int128.build
│ │ ├── absl_log_severity.build
│ │ ├── absl_low_level_hash.build
│ │ ├── absl_malloc_internal.build
│ │ ├── absl_raw_hash_set.build
│ │ ├── absl_raw_logging_internal.build
│ │ ├── absl_spinlock_wait.build
│ │ ├── absl_stacktrace.build
│ │ ├── absl_strings.build
│ │ ├── absl_strings_internal.build
│ │ ├── absl_symbolize.build
│ │ ├── absl_synchronization.build
│ │ ├── absl_throw_delegate.build
│ │ ├── absl_time.build
│ │ ├── absl_time_zone.build
│ │ ├── clog.build
│ │ ├── cpuinfo.build
│ │ ├── flatbuffers.build
│ │ ├── flatc.build
│ │ ├── libprotobuf-lite.build
│ │ ├── nsync_cpp.build
│ │ ├── onnx.build
│ │ ├── onnx_proto.build
│ │ ├── onnxruntime.build
│ │ ├── onnxruntime_common.build
│ │ ├── onnxruntime_coreml_proto.build
│ │ ├── onnxruntime_flatbuffers.build
│ │ ├── onnxruntime_framework.build
│ │ ├── onnxruntime_graph.build
│ │ ├── onnxruntime_mlas.build
│ │ ├── onnxruntime_optimizer.build
│ │ ├── onnxruntime_providers.build
│ │ ├── onnxruntime_providers_coreml.build
│ │ ├── onnxruntime_session.build
│ │ ├── onnxruntime_util.build
│ │ └── re2.build
│ ├── coreml
│ ├── onnx
│ ├── onnxruntime.xcodeproj
│ │ └── project.xcworkspace
│ ├── op_reduction.generated
│ │ ├── onnxruntime
│ │ └── orttraining
│ ├── static_framework
│ │ └── onnxruntime.framework
│ └── static_libraries
└── install
├── bin
│ └── onnxruntime.framework
├── include
│ └── onnxruntime
└── lib
├── cmake
└── pkgconfig
@w11wo You may find the following workflow helpful.
https://github.com/csukuangfj/onnxruntime-libs/actions/runs/5581669504
By default, it builds a static library. If you want to build a shared library, please pass --build_shared_lib.
See also https://github.com/csukuangfj/onnxruntime-libs/tree/master/.github/workflows
I have been using github actions to build static libraries of onnxruntime.
This is very helpful, @csukuangfj. Many thanks, I'll test it out once I have some time.
Hi @csukuangfj, I was able to reproduce your workflow to create a static library with the minimal build. I've also tried to build the shared library version with the addition of --build_shared_lib; however, I'm confused about how to build the *.dylib files which are needed for sherpa-onnx. Could you guide me through it or give me some pointers? Thanks.
Are you able to find a folder named framework or xcframework? If there is a file named onnxruntime, please run
file onnxruntime
to check whether it is a shared lib or a static lib.
Hi @w11wo, in the end, how small did you manage to get the onnxruntime static lib (ios-arm64)? I tried to reduce it to 20M+; that seems to be the limit.
Hi everyone, sorry for the late response, got busy with work stuff.
@csukuangfj, yes I was able to find that file, looks like it's a shared lib.
$ file onnxruntime.framework/onnxruntime
onnxruntime.framework/onnxruntime: Mach-O 64-bit dynamically linked shared library x86_64
Now I just need to know how to use this within sherpa-onnx.
Hi @601222543. I'm getting similar sizes to your findings, about 20MB as you said. You can find my latest ios-arm64 build here. I haven't fully debugged why we can't use it.
Now I just need to know how to use this within sherpa-onnx.
Please create a symlink to it, e.g.,
cd /path/to/onnxruntime.framework
ln -s onnxruntime libonnxruntime.dylib
Then, please set the environment variable
export SHERPA_ONNXRUNTIME_LIB_DIR=/path/to/onnxruntime.framework
Also, please find the directory containing the header files. If it is /tmp/foo/bar, please set the following environment variable
export SHERPA_ONNXRUNTIME_INCLUDE_DIR=/tmp/foo/bar
Note: This assumes there is /tmp/foo/bar/onnxruntime_cxx_api.h.
After setting the above two environment variables, please remove the build directory of sherpa-onnx and rebuild it as usual.
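Putting the steps together, the whole sequence looks roughly like this. A sketch: the /path/to placeholders are illustrative, and the final commands assume sherpa-onnx's standard CMake build:
# Expose the custom onnxruntime as a dylib that sherpa-onnx can link against
cd /path/to/onnxruntime.framework
ln -s onnxruntime libonnxruntime.dylib

export SHERPA_ONNXRUNTIME_LIB_DIR=/path/to/onnxruntime.framework
# Directory that contains onnxruntime_cxx_api.h
export SHERPA_ONNXRUNTIME_INCLUDE_DIR=/path/to/headers

# Rebuild sherpa-onnx from scratch so the new paths are picked up
cd /path/to/sherpa-onnx
rm -rf build && mkdir build && cd build
cmake .. && make -j"$(sysctl -n hw.ncpu)"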
I was able to follow the instructions, but the build failed because it couldn't find CoreML-related header files:
0: fatal error: 'coreml_provider_factory.h' file not found
#include "coreml_provider_factory.h" // NOLINT
For some reason, this build did not include the CoreML-related headers, even though I had specified --use_coreml. This is my build script:
python3 \
$DIR/tools/ci_build/build.py \
--build_dir $build_dir \
--include_ops_by_config ops.config \
--disable_ml_ops --disable_exceptions --disable_rtti \
--minimal_build extended \
--config Release \
--update \
--build \
--use_xcode \
--compile_no_warning_as_error \
--build_shared_lib \
--cmake_extra_defines onnxruntime_BUILD_UNIT_TESTS=OFF \
--cmake_extra_defines CMAKE_INSTALL_PREFIX=$build_dir/install/ \
--osx_arch x86_64 \
--target install \
--parallel $num_jobs \
--skip_tests \
--build_apple_framework \
--apple_deploy_target 13.0 \
--use_coreml \
--path_to_protoc_exe /usr/local/Cellar/protobuf@21/21.12/bin/protoc-3.21.12.0
Manually copied coreml_provider_factory.h from the pre-release downloads, and it seems to work!
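For reference, the workaround amounts to something like this; the source path is illustrative and depends on where the downloaded release archive unpacks:
# Illustrative paths: copy the missing CoreML header next to the other headers
cp /path/to/official-onnxruntime-download/include/coreml_provider_factory.h \
   $SHERPA_ONNXRUNTIME_INCLUDE_DIR/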
I made sure that the build was pointing to the pre-built and minimized onnxruntime, and this is from the logs:
-- location_onnxruntime_header_dir: path/to/ios-onnxruntime/onnxruntime/build-macos/x86_64/Release/Release/onnxruntime.framework/Versions/A/Headers
-- location_onnxruntime_lib: path/to/ios-onnxruntime/onnxruntime/build-macos/x86_64/Release/Release/onnxruntime.framework/libonnxruntime.dylib
and its size is
$ du -ah onnxruntime
3.6M onnxruntime
It's only 3.6MB, compared to the official release's 23MB.
Now the remaining question is why the iOS minimal build didn't work.
EDIT: you can find the macOS x86_64 minimal build here.