onnxruntime_backend

Add ONNXRuntime extensions support

Open jplu opened this issue 3 years ago • 4 comments

Hi,

I was wondering if you planned at some point to support the ONNXRuntime extensions detailed on their repo https://github.com/microsoft/onnxruntime-extensions

This would unlock a lot of possibilities, such as pre- or post-processing directly in Triton model serving when using ONNX models.

jplu avatar Oct 21 '22 08:10 jplu

How do you plan to use ORT extensions in Triton? What's your use case? cc @wenbingl

pranavsharma avatar Oct 21 '22 17:10 pranavsharma

For my own case, I would love to have my data preprocessing, such as tokenization, directly available in Triton. I can already do this locally with ORT extensions, and it is very convenient because it lets me avoid being tied to Python.
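For illustration, a minimal local sketch of this in Python (the model file and input tensor name are placeholders; it assumes a tokenizer model exported with onnxruntime-extensions custom ops):

import numpy as np
import onnxruntime as ort
from onnxruntime_extensions import get_library_path

# Register the onnxruntime-extensions shared library so its custom ops
# (e.g. tokenizers) can be resolved by the session.
so = ort.SessionOptions()
so.register_custom_ops_library(get_library_path())

# "tokenizer.onnx" is a placeholder for a model that uses extensions ops.
sess = ort.InferenceSession("tokenizer.onnx", so)
outputs = sess.run(None, {"text": np.array(["hello triton"])})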

jplu avatar Oct 21 '22 17:10 jplu


onnxruntime-extensions can be built as either a static library or a shared library; the former requires building onnxruntime from source. Can you share which you would prefer?

wenbingl avatar Oct 21 '22 18:10 wenbingl

For me, a shared library would be more flexible, as it would let users bring their own ORT extensions.

jplu avatar Oct 21 '22 18:10 jplu

@jplu As far as I can tell, the code for loading custom op libraries is already there: https://github.com/triton-inference-server/onnxruntime_backend/blob/main/src/onnxruntime.cc#L642. You just need to set op_library_filename in the config.pbtxt to point to the shared libraries you want the backend to load.
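For illustration, a hypothetical config.pbtxt using that field (the model name, tensor names, types, and shapes are placeholders, not part of this issue):

name: "my_tokenizer_model"
backend: "onnxruntime"
max_batch_size: 0
input [
  {
    name: "text"
    data_type: TYPE_STRING
    dims: [ -1 ]
  }
]
output [
  {
    name: "input_ids"
    data_type: TYPE_INT64
    dims: [ -1 ]
  }
]
model_operations {
  op_library_filename: "/path/to/libortextensions.so"
}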

msyulia avatar Jun 14 '23 13:06 msyulia

@Jul1aK0wal1k Indeed, the code seems to be there. It would be nice to have an example of how to properly use it with onnxruntime-extensions.

Thanks.

jplu avatar Jun 14 '23 14:06 jplu

@jplu You'll need to build onnxruntime-extensions and place the .so somewhere in your Triton Inference Server Docker image. Example Dockerfile (assuming an x86/x64 CPU):


ARG BASE_IMAGE=nvcr.io/nvidia/tritonserver

FROM ${BASE_IMAGE}

WORKDIR /opt

ARG CMAKE_BINARY_URL=https://github.com/Kitware/CMake/releases/download/v3.27.0-rc2/cmake-3.27.0-rc2-linux-x86_64.sh
ARG ORTEXTENSIONS_REPO=https://github.com/microsoft/onnxruntime-extensions
ARG ORTEXTENSIONS_BRANCH=v0.7.0

ENV DEBIAN_FRONTEND=noninteractive
ENV TZ=Europe/Warsaw

# Set up cmake
RUN wget ${CMAKE_BINARY_URL} -O cmake.sh && \
    chmod +x cmake.sh && \
    mkdir /opt/cmake && \
    ./cmake.sh --skip-license --prefix=/opt/cmake/ 

# Build ORT Extensions 
RUN git clone --branch ${ORTEXTENSIONS_BRANCH} ${ORTEXTENSIONS_REPO} onnx-extensions && \
    cd onnx-extensions && \
    export PATH=$PATH:/opt/cmake/bin && \
    ./build.sh
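
Assuming the above is saved as a Dockerfile, a build could look something like this (the Triton release tag is a placeholder to fill in):

docker build --build-arg BASE_IMAGE=nvcr.io/nvidia/tritonserver:<yy.mm>-py3 -t tritonserver-with-ortext .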

Using this Dockerfile, the path to the .so is /opt/onnx-extensions/out/Linux/RelWithDebInfo/lib/libortextensions.so. Alternatively, you could probably use the shared library shipped with the onnxruntime-extensions Python package. To make the ORT backend load it, add the following to your config.pbtxt:

model_operations {
  op_library_filename: "/opt/onnx-extensions/out/Linux/RelWithDebInfo/lib/libortextensions.so"
}

Also, make sure the version of onnxruntime-extensions is compatible with the version of onnxruntime used by the backend.
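
One way to sanity-check that combination before deploying is a quick local load of the built library (a sketch; the .so path is the one produced by the Dockerfile above):

import onnxruntime as ort

# If the onnxruntime-extensions build is incompatible with the installed
# onnxruntime, registering the library here typically fails with an error,
# which is cheaper to catch than a failed model load inside Triton.
so = ort.SessionOptions()
so.register_custom_ops_library(
    "/opt/onnx-extensions/out/Linux/RelWithDebInfo/lib/libortextensions.so")
print("custom op library registered successfully")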

msyulia avatar Jun 16 '23 13:06 msyulia

Thanks very much @msyulia. Exactly what I expected :-)

I can safely close the issue now.

jplu avatar Jun 16 '23 15:06 jplu