onnxruntime_backend
Disable `enable_mem_pattern` not working
Description
Hi all,
Recently I encountered an issue when deploying my ONNX model: it consumed too much memory. Please check the linked issue; this appears to be caused by a feature of ONNX Runtime (the memory pattern optimization), which can be disabled as described in this (solution).
For example:

```python
session_option = onnxruntime.SessionOptions()
session_option.enable_mem_pattern = False
decoder = onnxruntime.InferenceSession(decoder_path, sess_options=session_option)
```

Setting the `enable_mem_pattern` option to `False` disables this feature.
So I checked the README of the onnxruntime_backend in the Triton repository and found that there is a way to disable this feature on the Triton Inference Server; please check Model Config Options. According to my understanding, we only need to set it in the model configuration. For example, here is the config.pbtxt of my model:
```
name: "text_model"
platform: "onnxruntime_onnx"
max_batch_size: 0
input [
  {
    name: "sample"
    data_type: TYPE_FP32
    dims: [ 1, 3, -1, -1 ]
  },
  {
    name: "patches"
    data_type: TYPE_FP32
    dims: [ -1, -1, -1 ]
  }
]
output [
  {
    name: "density_map"
    data_type: TYPE_FP32
    dims: [ -1, 1, -1, -1 ]
  }
]
instance_group [
  {
    count: 2
    kind: KIND_GPU
  }
]
optimization {
  graph {
    level: 1
  }
}
parameters { key: "enable_mem_pattern" value: { string_value: "0" } }
```
I expected it to work after adding these lines. However, after restarting the Triton Inference Server and deploying the model again, it does not seem to work: the model still takes a lot of GPU memory during inference. (The GPU memory utilization is compared against a pure ONNX Runtime implementation.)
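One way to confirm whether the parameter actually made it into the loaded model configuration is to query Triton's HTTP endpoint for the model config. This is a sketch assuming the server is running locally on the default HTTP port 8000:

```shell
# Query the loaded configuration of text_model from a running Triton server.
# If the config was parsed as written, the "parameters" field of the JSON
# response should contain the enable_mem_pattern entry.
curl -s localhost:8000/v2/models/text_model/config
```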
Triton Information
Pulled the image from NGC; version: nvcr.io/nvidia/tritonserver:22.07-py3
I am wondering whether I missed some setting. I would be grateful for any suggestions or ideas about it. Many thanks in advance!
Env info
My GPU driver version: 470
I also tried the latest GPU driver version (515) with Triton nvcr.io/nvidia/tritonserver:22.07-py3, but it still does not seem to work.
I am also not sure whether the configuration would show up in this area if it were set correctly and working.