Deploy DeBERTa to Triton Inference Server
I followed the steps in the DeBERTa guide to create the modified ONNX file with the plugin node. When I try to serve this model with Triton Inference Server, it fails with:
```
Internal: onnx runtime error 9: Could not find an implementation for DisentangledAttention_TRT(1) node with name 'onnx_graphsurgeon_node_0'
```
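For context, the plugin node was inserted with onnx-graphsurgeon (hence the auto-generated name `onnx_graphsurgeon_node_0`), and `DisentangledAttention_TRT` is a custom op that only has a TensorRT plugin implementation, so ONNX Runtime's default CPU/CUDA providers have no kernel for it. A minimal sketch of the kind of rewrite involved is below; the matcher condition, attribute names/values, and file paths are my assumptions, not the guide's exact script:

```python
import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("deberta.onnx"))  # assumed path

# Hypothetical matcher: the guide's script pattern-matches the exported
# disentangled-attention subgraph; a name marker stands in for that here.
for node in [n for n in graph.nodes
             if n.op == "GatherElements" and "disentangled" in n.name]:
    plugin = gs.Node(
        op="DisentangledAttention_TRT",        # custom op, TensorRT plugin only
        name=node.name + "_plugin",
        attrs={"factor": 0.125, "span": 512},  # assumed attribute names/values
        inputs=list(node.inputs),
        outputs=list(node.outputs),
    )
    node.outputs = []                          # detach the replaced node
    graph.nodes.append(plugin)

graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "deberta_plugin.onnx")
```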
Is there a way to get this to work in Triton? I'm using Triton 24.09 with the ONNX Runtime backend.
I can confirm that the modified ONNX model runs fine when I use the onnxruntime package directly in a Python script. The unmodified model (without the plugin) also works fine in Triton.
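For reference, the standalone script works presumably because the TensorRT execution provider can be pointed at the plugin library. Roughly like this; the library path, option values, and input names/shapes are my assumptions:

```python
import numpy as np
import onnxruntime as ort

# Assumed path to the plugin library that registers DisentangledAttention_TRT
# (built per the DeBERTa guide).
plugin_lib = "/usr/lib/x86_64-linux-gnu/libnvinfer_plugin.so"

sess = ort.InferenceSession(
    "deberta_plugin.onnx",
    providers=[
        ("TensorrtExecutionProvider", {"trt_extra_plugin_lib_paths": plugin_lib}),
        "CUDAExecutionProvider",  # fallback for nodes TensorRT does not take
    ],
)

# Dummy inputs; the real input names and shapes depend on the export.
feeds = {
    "input_ids": np.zeros((1, 128), dtype=np.int64),
    "attention_mask": np.ones((1, 128), dtype=np.int64),
}
print(sess.run(None, feeds)[0].shape)
```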
A slightly separate problem that may deserve its own issue: the model's FP16 output is garbage, even with LayerNorm kept in FP32.
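For completeness, this is roughly how the FP16 copy with FP32 LayerNorm can be produced; a sketch using onnxconverter-common, which is my assumption about the conversion path, and the blocked op type may differ in this export:

```python
import onnx
from onnxconverter_common import float16

model = onnx.load("deberta_plugin.onnx")  # assumed path

# Convert weights/activations to fp16 but keep LayerNorm and graph I/O in fp32.
model_fp16 = float16.convert_float_to_float16(
    model,
    op_block_list=["LayerNormalization"],  # assumed op type in this export
    keep_io_types=True,
)

onnx.save(model_fp16, "deberta_plugin_fp16.onnx")
```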