[Performance] How to reduce GPU memory consumption?
Describe the issue
I have an ONNX model that is only 204.57 MB on disk, but when I create the session, GPU memory consumption jumps to 1.16 GB, and during inference it rises to 2.25 GB. This makes inference expensive, so how can I reduce GPU memory consumption?
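For reference, the CUDA execution provider exposes options that cap its arena allocator and limit cuDNN workspace growth; below is a minimal sketch of passing them at session creation. The model path "model.onnx" and the 2 GB cap are placeholder assumptions, not values from this report.

import onnxruntime as ort

providers = [
    ("CUDAExecutionProvider", {
        "device_id": 0,
        # Cap the CUDA memory arena so it cannot grow unbounded
        # (the 2 GB value here is an illustrative assumption)
        "gpu_mem_limit": 2 * 1024 * 1024 * 1024,
        # Extend the arena only by the requested amount instead of
        # rounding allocations up to the next power of two
        "arena_extend_strategy": "kSameAsRequested",
        # HEURISTIC search avoids the large scratch workspaces that
        # EXHAUSTIVE convolution algorithm search can allocate
        "cudnn_conv_algo_search": "HEURISTIC",
    }),
    "CPUExecutionProvider",
]
session = ort.InferenceSession("model.onnx", providers=providers)

Whether these settings close the full gap between model size and observed usage depends on the model's activation sizes and cuDNN workspace needs.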
To reproduce
Simply create an onnxruntime InferenceSession with the default options. GPU memory consumption is measured with the following function:
import subprocess

def get_gpu_memory_usage():
    """Return the GPU memory currently in use, in GB, as reported by nvidia-smi."""
    process = subprocess.Popen(
        ['nvidia-smi', '--query-gpu=memory.used', '--format=csv,noheader,nounits'],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE
    )
    stdout, stderr = process.communicate()
    if stderr:
        raise RuntimeError(f"Error fetching GPU memory usage: {stderr.decode()}")
    # nvidia-smi reports MiB; convert to GB for readability
    memory_used_mb = int(stdout.decode('utf-8').strip())
    memory_used_gb = memory_used_mb / 1024
    return round(memory_used_gb, 2)
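A minimal reproduction sketch using the helper above. The model path "model.onnx", the float32 input dtype, and substituting 1 for dynamic dimensions are all assumptions; the commented memory figures are the values observed in this report.

import numpy as np
import onnxruntime as ort

print(f"Baseline:        {get_gpu_memory_usage()} GB")

# "model.onnx" is a placeholder for the ~205 MB model from the report
session = ort.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])
print(f"After session:   {get_gpu_memory_usage()} GB")   # ~1.16 GB observed

# Feed zero-filled dummy inputs; dynamic dimensions are guessed as 1
# and float32 is assumed for all inputs
feeds = {}
for inp in session.get_inputs():
    shape = [d if isinstance(d, int) else 1 for d in inp.shape]
    feeds[inp.name] = np.zeros(shape, dtype=np.float32)
session.run(None, feeds)
print(f"After inference: {get_gpu_memory_usage()} GB")   # ~2.25 GB observed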
Urgency
No response
Platform
Linux
OS Version
Ubuntu 20.04
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
onnxruntime-gpu 1.11.0
ONNX Runtime API
Python
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
11.4
Model File
No response
Is this a quantized model?
No