onnxruntime_backend
Add ONNXRuntime extensions support
Hi,
I was wondering if you planned, at some point, to support the ONNXRuntime extensions detailed in their repo: https://github.com/microsoft/onnxruntime-extensions
This would unlock a lot of possibilities, such as running pre- or post-processing directly in Triton model serving when using ONNX models.
How do you plan to use ORT extensions in Triton? What's your use case? cc @wenbingl
For my own case, I would love to have my data preprocessing, such as tokenization, directly available in Triton. I can already do this locally with ORT extensions, and it is very convenient because it means I am not tied to Python.
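Locally this works by registering the onnxruntime-extensions shared library with the ONNX Runtime session before loading the model. A minimal Python sketch (tokenizer.onnx and the input name "text" are placeholders for a model exported with onnxruntime-extensions custom ops):
import numpy as np
import onnxruntime as ort
from onnxruntime_extensions import get_library_path

so = ort.SessionOptions()
# Make the onnxruntime-extensions custom ops (e.g. tokenizers) visible to this session.
so.register_custom_ops_library(get_library_path())

# "tokenizer.onnx" is a placeholder for a model whose graph uses extension ops.
sess = ort.InferenceSession("tokenizer.onnx", so)

# Feed raw strings; the tokenization happens inside the graph, not in Python.
outputs = sess.run(None, {"text": np.array(["hello world"], dtype=object)})
print(outputs)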
onnxruntime-extensions can be built as either a static library or a shared library, and the former requires building onnxruntime from source. Can you share which you would prefer?
For me, a shared library would be more flexible, since it would let users bring their own ORT extensions.
@jplu as far as I can tell, the code for loading custom op libraries is already there: https://github.com/triton-inference-server/onnxruntime_backend/blob/main/src/onnxruntime.cc#L642
It seems you need to set op_library_filename in config.pbtxt to point to the shared library in order for the backend to load it.
@Jul1aK0wal1k Indeed, the code seems to be there. It would be nice to have an example of how to properly use it with onnxruntime-extensions.
Thanks.
@jplu You'll need to build onnxruntime-extensions and place the .so somewhere in your Triton Inference Server Docker image. Example Dockerfile (assuming an x86/x64 CPU):
ARG BASE_IMAGE=nvcr.io/nvidia/tritonserver
FROM ${BASE_IMAGE}
WORKDIR /opt
ARG CMAKE_BINARY_URL=https://github.com/Kitware/CMake/releases/download/v3.27.0-rc2/cmake-3.27.0-rc2-linux-x86_64.sh
ARG ORTEXTENSIONS_REPO=https://github.com/microsoft/onnxruntime-extensions
ARG ORTEXTENSIONS_BRANCH=v0.7.0
ENV DEBIAN_FRONTEND=noninteractive
ENV TZ=Europe/Warsaw
# Set up cmake
RUN wget ${CMAKE_BINARY_URL} -O cmake.sh && \
    chmod +x cmake.sh && \
    mkdir /opt/cmake && \
    ./cmake.sh --skip-license --prefix=/opt/cmake/
# Build ORT Extensions
RUN git clone --branch ${ORTEXTENSIONS_BRANCH} ${ORTEXTENSIONS_REPO} onnx-extensions && \
    cd onnx-extensions && \
    export PATH=$PATH:/opt/cmake/bin && \
    ./build.sh
Using this Dockerfile the path to the .so is /opt/onnx-extensions/out/Linux/RelWithDebInfo/lib/libortextensions.so.
Alternatively, you could probably use the shared library shipped with the onnxruntime-extensions Python package.
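If you go that route, the path to the bundled .so can be printed from Python inside the image (a small sketch, assuming onnxruntime-extensions is pip-installed):
from onnxruntime_extensions import get_library_path

# Path of the libortextensions shared library shipped with the pip package;
# this is the value to put into op_library_filename below.
print(get_library_path())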
To make the ORT backend load it, you'll need to add the following to your config.pbtxt:
model_operations {
  op_library_filename: "/opt/onnx-extensions/out/Linux/RelWithDebInfo/lib/libortextensions.so"
}
Also make sure the version of onnxruntime-extensions you use is supported by the version of onnxruntime in your Triton image.
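A quick local sanity check (a sketch, assuming both packages are pip-installed at the same versions you plan to ship in the image):
import onnxruntime as ort
import onnxruntime_extensions as ortext

print("onnxruntime:", ort.__version__)
print("onnxruntime-extensions:", ortext.__version__)

so = ort.SessionOptions()
# An incompatible pairing typically fails here or when a session using the ops is created.
so.register_custom_ops_library(ortext.get_library_path())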
Thanks very much @msyulia. Exactly what I expected :-)
I can safely close the issue now.