
The Triton backend for the ONNX Runtime.

81 onnxruntime_backend issues

**Description** GPU memory leak under high load: GPU memory usage goes up and never comes down once the high-load requests stop coming (memory is never released). **Triton Information** What version of...

**Is your feature request related to a problem? Please describe.** For some models (most notably CNNs), the OpenVINO EP for ONNX Runtime produces significant memory leaks due to issues with memory...

I prepared a simple example. I created a simple summing model which has `input` and `length`, both with shape = [-1]:

```
import torch
import torch.nn as nn

class SummingModel(nn.Module):
    def forward(self, input,...
```
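The snippet above is cut off, but a minimal runnable sketch of such a model is shown below. The body of `forward` is an assumption (the model is taken to simply sum `input` along its dynamic axis; the role of `length` is not visible in the truncated issue):

```python
import torch
import torch.nn as nn

class SummingModel(nn.Module):
    # Assumed behavior: sum `input` along its single dynamic dimension.
    # `length` is only included to mirror the two-input signature above.
    def forward(self, input: torch.Tensor, length: torch.Tensor) -> torch.Tensor:
        return input.sum(dim=-1)

model = SummingModel()
out = model(torch.tensor([1.0, 2.0, 3.0]), torch.tensor([3]))
print(out.item())  # 6.0
```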

I used the re model in PaddleOCR for inference. The model loads normally, but inference reports this error. What is the reason? Can you give me...

I have a Triton server running on Linux, but we need to deploy it on Windows. Thanks.

**Is your feature request related to a problem? Please describe.** We are currently unable to (properly) use TensorRT in Triton because there is [a bug](https://github.com/microsoft/onnxruntime/issues/14269) in the onnxruntime < 1.14.0...

ONNX Runtime has added "preview mode" support for [CUDA Graphs](https://onnxruntime.ai/docs/performance/tune-performance.html#using-cuda-graphs-in-the-cuda-ep) in the CUDA Execution Provider. It would be useful to expose this option for the ONNX Runtime Triton backend as...
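If this were exposed, one way it might look is a per-model `config.pbtxt` parameter. Note this is purely a sketch of the feature request: ONNX Runtime's CUDA EP provider option is named `enable_cuda_graph`, but the Triton-side key spelling below is an assumption, not a documented option:

```
parameters {
  key: "enable_cuda_graph"          # hypothetical Triton-side key name
  value: { string_value: "1" }
}
```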

**Description** `memory.enable_memory_arena_shrinkage` in the ONNX Runtime backend does **not** release the entire arena after each run. After some research, I found `session.use_device_allocator_for_initializers` might need to be `1` in order to make arena shrinkage...
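The two session options mentioned can be set together in the model's `config.pbtxt`. This is a sketch using the parameter names from the issue text; scoping shrinkage to `gpu:0` is an assumption about the deployment:

```
parameters {
  key: "memory.enable_memory_arena_shrinkage"
  value: { string_value: "gpu:0" }     # assumed: shrink the arena on GPU 0 after each run
}
parameters {
  key: "session.use_device_allocator_for_initializers"
  value: { string_value: "1" }         # suggested in the issue to make shrinkage effective
}
```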

Context: I created a PyTorch model (nn.Module) and exported it to *.onnx using the torch.onnx.export() function with a dynamic batch dimension enabled on only some of the input tensors. The output tensor shape...

**Description** Incorrect values are reported as request durations when multiple requests are batched together. **Triton Information** What version of Triton are you using? 22.12 Are you using the Triton container...