onnxruntime_backend
The Triton backend for the ONNX Runtime.
**Description** GPU memory leak under high load: GPU memory usage goes up and never comes down after the high-load requests stop coming (the memory is never released) **Triton Information** What version of...
**Is your feature request related to a problem? Please describe.** For some models (most notably, CNNs), the OpenVINO EP for ONNX Runtime produces significant memory leaks due to issues with memory...
I prepared a simple example: a simple summing model that has `input` and `length` inputs of shape = [-1]
```
import torch
import torch.nn as nn

class SummingModel(nn.Module):
    def forward(self, input, ...
```
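The snippet above is cut off mid-definition. A minimal, hypothetical completion of such a summing model (the actual `forward` body is an assumption; only the two 1-D dynamic-shape inputs are given in the snippet) might look like:

```python
import torch
import torch.nn as nn

class SummingModel(nn.Module):
    # Hypothetical body: the original forward is truncated in the issue.
    # Both inputs are 1-D tensors of dynamic length (shape [-1]).
    def forward(self, input, length):
        return input.sum() + length.sum()

model = SummingModel()
out = model(torch.tensor([1.0, 2.0, 3.0]), torch.tensor([4]))
print(float(out))  # 1 + 2 + 3 + 4 = 10.0
```

A quick eager-mode run like this is a useful sanity check before exporting the model to ONNX for the Triton backend.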
I used the re model in paddleocr for inference; the model can be loaded normally, but inference reports this error. What is the reason? Can you give me...
I have a Triton server running on Linux, but we need to deploy it on Windows. Thanks.
**Is your feature request related to a problem? Please describe.** We are currently unable to (properly) use TensorRT in Triton because there is [a bug](https://github.com/microsoft/onnxruntime/issues/14269) in the onnxruntime < 1.14.0...
ONNXRuntime has added support in "preview mode" for [CUDA Graphs](https://onnxruntime.ai/docs/performance/tune-performance.html#using-cuda-graphs-in-the-cuda-ep) in the CUDA Execution Provider. It would be useful to expose this option for the ONNX Runtime Triton backend as...
**Description** `memory.enable_memory_arena_shrinkage` in the ONNXRuntime backend does **not** release the entire arena after each run. After some research, I found `session.use_device_allocator_for_initializers` might need to be `1` in order to make arena shrinkage...
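A hedged sketch of how these two options might be combined in a model's `config.pbtxt` — this assumes the backend forwards both keys to the ONNX Runtime session/run options, and the value formats shown (`cpu:0;gpu:0` for shrinkage, `1` for the initializer allocator) follow ONNX Runtime's documented config entries; check the backend README for the exact key names it accepts:

```proto
parameters {
  key: "memory.enable_memory_arena_shrinkage"
  value: { string_value: "cpu:0;gpu:0" }
}
parameters {
  key: "session.use_device_allocator_for_initializers"
  value: { string_value: "1" }
}
```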
Context: I created a PyTorch model (nn.module) and exported to *.onnx using torch.onnx.export() function with dynamic batch dimension enabled on only some of the input tensors. The output tensor shape...
**Description** Incorrect values are reported as request durations when multiple requests are batched together. **Triton Information** What version of Triton are you using? 22.12 Are you using the Triton container...