Deploy DeBERTa to Triton Inference Server
I followed the steps in the DeBERTa guide to create the modified ONNX file with the plugin node. When I try to serve this model with Triton Inference Server, it fails with:
```
Internal: onnx runtime error 9: Could not find an implementation for DisentangledAttention_TRT(1) node with name 'onnx_graphsurgeon_node_0'
```
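For context, the plugin node was inserted with onnx-graphsurgeon (hence the auto-generated name `onnx_graphsurgeon_node_0`), and `DisentangledAttention_TRT` is a custom op that only has a TensorRT plugin implementation, so ONNX Runtime's default CPU/CUDA providers have no kernel for it. A minimal sketch of the kind of rewrite involved is below; the matcher condition, attribute names/values, and file paths are my assumptions, not the guide's exact script:

```python
import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("deberta.onnx"))  # assumed path

# Hypothetical matcher: the guide's script pattern-matches the exported
# disentangled-attention subgraph; a name marker stands in for that here.
for node in [n for n in graph.nodes
             if n.op == "GatherElements" and "disentangled" in n.name]:
    plugin = gs.Node(
        op="DisentangledAttention_TRT",        # custom op, TensorRT plugin only
        name=node.name + "_plugin",
        attrs={"factor": 0.125, "span": 512},  # assumed attribute names/values
        inputs=list(node.inputs),
        outputs=list(node.outputs),
    )
    node.outputs = []                          # detach the replaced node
    graph.nodes.append(plugin)

graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "deberta_plugin.onnx")
```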
Is there a way to get this to work in Triton? I'm using Triton 24.09 with the ONNX Runtime backend.
I can confirm that the modified ONNX model runs fine when I use the onnxruntime package directly in a Python script. The unmodified model (without the plugin) also works fine in Triton.
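For reference, the standalone script works presumably because the TensorRT execution provider can be pointed at the plugin library. Roughly like this; the library path, option values, and input names/shapes are my assumptions:

```python
import numpy as np
import onnxruntime as ort

# Assumed path to the plugin library that registers DisentangledAttention_TRT
# (built per the DeBERTa guide).
plugin_lib = "/usr/lib/x86_64-linux-gnu/libnvinfer_plugin.so"

sess = ort.InferenceSession(
    "deberta_plugin.onnx",
    providers=[
        ("TensorrtExecutionProvider", {"trt_extra_plugin_lib_paths": plugin_lib}),
        "CUDAExecutionProvider",  # fallback for nodes TensorRT does not take
    ],
)

# Dummy inputs; the real input names and shapes depend on the export.
feeds = {
    "input_ids": np.zeros((1, 128), dtype=np.int64),
    "attention_mask": np.ones((1, 128), dtype=np.int64),
}
print(sess.run(None, feeds)[0].shape)
```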
A slightly separate problem that may deserve its own issue: the model's FP16 output is garbage, even with LayerNorm kept in FP32.
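For completeness, this is roughly how the FP16 copy with FP32 LayerNorm can be produced; a sketch using onnxconverter-common, which is my assumption about the conversion path, and the blocked op type may differ in this export:

```python
import onnx
from onnxconverter_common import float16

model = onnx.load("deberta_plugin.onnx")  # assumed path

# Convert weights/activations to fp16 but keep LayerNorm and graph I/O in fp32.
model_fp16 = float16.convert_float_to_float16(
    model,
    op_block_list=["LayerNormalization"],  # assumed op type in this export
    keep_io_types=True,
)

onnx.save(model_fp16, "deberta_plugin_fp16.onnx")
```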