Memory Leak When Using ONNXRuntime With OpenVino EP
Description
Using the same model as in #102, the Triton Inference Server has a memory leak, as observed by `docker stats`, after adding the following to the model config (the `execution_accelerators` block belongs inside `optimization`):

```
optimization {
  execution_accelerators {
    cpu_execution_accelerator : [ {
      name : "openvino"
    } ]
  }
}
```
Without the openvino EP, there is no memory leak.
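Since the leak shows up in `docker stats`, one way to quantify the growth is to log the container's memory usage over time. Below is a minimal sketch, assuming a hypothetical container name "triton"; substitute the real container name.

```python
# Log a Triton container's memory usage every 10 seconds via `docker stats`.
# The container name "triton" is a placeholder, not from the original report.
import subprocess
import time

CONTAINER = "triton"

while True:
    # --no-stream takes a single sample; --format keeps only the memory column.
    result = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", "{{.MemUsage}}", CONTAINER],
        capture_output=True, text=True, check=True,
    )
    print(time.strftime("%H:%M:%S"), result.stdout.strip())
    time.sleep(10)
```

Under sustained load the logged usage should plateau; with the leak described here it keeps climbing.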
Triton Information
What version of Triton are you using?
openvino==2022.1.0 with triton-onnxbackend==22.06 and onnxruntime==1.11.1.
Are you using the Triton container or did you build it yourself?
Custom container build.
To Reproduce
See #102 for the model; a reproduction sketch follows.
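To drive the server while watching memory, a load-generation script along these lines can be used. The model name "model_102", the input name "input", the shape, and the FP32 datatype are all placeholders; the real values come from the model attached to #102.

```python
# Send inference requests in a loop to reproduce the memory growth.
# All model-specific values below are hypothetical placeholders.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Placeholder shape/dtype; use the shapes from the model's config.pbtxt.
data = np.random.rand(1, 3, 224, 224).astype(np.float32)

inp = httpclient.InferInput("input", list(data.shape), "FP32")
inp.set_data_from_numpy(data)

for i in range(10_000):
    client.infer("model_102", inputs=[inp])
    if i % 500 == 0:
        print(f"sent {i} requests")  # watch `docker stats` in parallel
```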
Expected behavior
Provision of model configuration flags (like in #102) that customize the memory handling of the OpenVino EP.
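For context, the onnxruntime backend already exposes memory-related parameters in config.pbtxt, such as arena shrinkage, which is presumably the kind of flag #102 refers to; whether anything equivalent takes effect on the OpenVino EP path is the open question here:

```
parameters { key: "memory.enable_memory_arena_shrinkage" value: { string_value: "cpu:0" } }
```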
After further investigation of this issue, I've determined that a memory reuse mechanism was implemented for the OpenVino EP: https://github.com/openvinotoolkit/openvino/pull/11667.
I will try building the OpenVino master branch with the changes from the above PR to see whether it resolves this issue.
Update:
- Building OpenVino with the changes from https://github.com/openvinotoolkit/openvino/pull/11667 did not solve the issue for my model.
- I've also reported the bug to the OpenVino team: https://github.com/openvinotoolkit/openvino/issues/12307
There is another PR that addresses a growing RNN cache issue; it could help to try it: https://github.com/openvinotoolkit/openvino/pull/12053
I have the same problem with the CRAFT model, even after converting CRAFT to the OpenVino IR format. Is this going to be fixed?
You can try OpenVino 2022.2 or the latest master branch.
@narolski Has this problem been solved?