
The Triton backend for the ONNX Runtime.

81 onnxruntime_backend issues

**Description** GPU memory leak under high load: GPU memory usage goes up and never comes down once the high-load requests stop coming (memory is never released). **Triton Information** What version of...

**Is your feature request related to a problem? Please describe.** For some models (most notably CNNs), the OpenVINO EP for ONNX Runtime produces significant memory leaks due to issues with memory...

I prepared a simple example. I created a simple summing model which has `input` and `length`, both with shape = [-1]:

```
import torch
import torch.nn as nn

class SummingModel(nn.Module):
    def forward(self, input,...
```
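The snippet above is cut off, but a minimal runnable sketch of such a model is shown below. The body of `forward` is an assumption (the model is taken to simply sum `input` along its dynamic axis; the role of `length` is not visible in the truncated issue):

```python
import torch
import torch.nn as nn

class SummingModel(nn.Module):
    # Assumed behavior: sum `input` along its single dynamic dimension.
    # `length` is only included to mirror the two-input signature above.
    def forward(self, input: torch.Tensor, length: torch.Tensor) -> torch.Tensor:
        return input.sum(dim=-1)

model = SummingModel()
out = model(torch.tensor([1.0, 2.0, 3.0]), torch.tensor([3]))
print(out.item())  # 6.0
```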

I used the re model in PaddleOCR for inference. The model loads normally, but inference reports this error. What is the reason? Can you give me...

I have a Triton server running on Linux, but we need to deploy it on Windows. Thanks.

**Is your feature request related to a problem? Please describe.** We are currently unable to (properly) use TensorRT in Triton because there is [a bug](https://github.com/microsoft/onnxruntime/issues/14269) in the onnxruntime < 1.14.0...

ONNX Runtime has added "preview mode" support for [CUDA Graphs](https://onnxruntime.ai/docs/performance/tune-performance.html#using-cuda-graphs-in-the-cuda-ep) in the CUDA Execution Provider. It would be useful to expose this option for the ONNX Runtime Triton backend as...
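If this were exposed, one way it might look is a per-model `config.pbtxt` parameter. Note this is purely a sketch of the feature request: ONNX Runtime's CUDA EP provider option is named `enable_cuda_graph`, but the Triton-side key spelling below is an assumption, not a documented option:

```
parameters {
  key: "enable_cuda_graph"          # hypothetical Triton-side key name
  value: { string_value: "1" }
}
```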

**Description** `memory.enable_memory_arena_shrinkage` in the ONNX Runtime backend does **not** release the entire arena after each run. After some research, I found `session.use_device_allocator_for_initializers` might need to be `1` in order to make arena shrinkage...
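The two session options mentioned can be set together in the model's `config.pbtxt`. This is a sketch using the parameter names from the issue text; scoping shrinkage to `gpu:0` is an assumption about the deployment:

```
parameters {
  key: "memory.enable_memory_arena_shrinkage"
  value: { string_value: "gpu:0" }     # assumed: shrink the arena on GPU 0 after each run
}
parameters {
  key: "session.use_device_allocator_for_initializers"
  value: { string_value: "1" }         # suggested in the issue to make shrinkage effective
}
```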

Context: I created a PyTorch model (nn.Module) and exported it to *.onnx using the torch.onnx.export() function with a dynamic batch dimension enabled on only some of the input tensors. The output tensor shape...

**Description** Incorrect values are reported as request durations when multiple requests are batched together. **Triton Information** What version of Triton are you using? 22.12 Are you using the Triton container...