onnxruntime_backend
Disable `enable_mem_pattern` not working
Description
Hi all,
Recently I encountered an issue when deploying my ONNX model: it consumed too much memory. Please check the linked issue; this appears to be caused by a feature of ONNX Runtime (the memory pattern optimization), which can be disabled as described in this (solution).
For example:

```python
session_option = onnxruntime.SessionOptions()
session_option.enable_mem_pattern = False
decoder = onnxruntime.InferenceSession(decoder_path, sess_options=session_option)
```

Setting the `enable_mem_pattern` option to `False` disables this feature.
So I checked the README of the onnxruntime_backend in the Triton repository and found that there is a way to disable this feature on the Triton Inference Server; please check Model Config Options. According to my understanding, we only need to set it in the model configuration. For example, here is the config.pbtxt of my model:
```
name: "text_model"
platform: "onnxruntime_onnx"
max_batch_size: 0
input [
  {
    name: "sample"
    data_type: TYPE_FP32
    dims: [ 1, 3, -1, -1 ]
  },
  {
    name: "patches"
    data_type: TYPE_FP32
    dims: [ -1, -1, -1 ]
  }
]
output [
  {
    name: "density_map"
    data_type: TYPE_FP32
    dims: [ -1, 1, -1, -1 ]
  }
]
instance_group [
  {
    count: 2
    kind: KIND_GPU
  }
]
optimization {
  graph {
    level: 1
  }
}
parameters { key: "enable_mem_pattern" value: { string_value: "0" } }
```
I expected it to work after adding these lines. However, after restarting the Triton Inference Server and deploying the model again, it does not seem to work: the model still takes a lot of GPU memory during inference. (The GPU memory utilization is compared against a pure ONNX Runtime implementation.)
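One way to confirm whether the parameter actually made it into the loaded model configuration is to query Triton's HTTP endpoint for the model config. This is a sketch assuming the server is running locally on the default HTTP port 8000:

```shell
# Query the loaded configuration of text_model from a running Triton server.
# If the config was parsed as written, the "parameters" field of the JSON
# response should contain the enable_mem_pattern entry.
curl -s localhost:8000/v2/models/text_model/config
```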
Triton Information
Pulled the image from NGC; version: nvcr.io/nvidia/tritonserver:22.07-py3
I am wondering whether I missed some setting. I would be grateful for any suggestions or ideas about it. Many thanks in advance!
Env info
My GPU driver version: 470
I also tried the latest GPU driver version (515) with Triton nvcr.io/nvidia/tritonserver:22.07-py3, but it still does not seem to work.
I am also not sure whether the configuration would show up in this area if it were set correctly and working.