onnxruntime_backend
Global GPU Memory Limit
Is there a way to control (limit) the global GPU memory usage of the onnxruntime backend in Triton? The TensorFlow backend has the following CLI option:
--backend-config tensorflow,gpu-memory-fraction=X
I wonder whether there is a similar way to control the total allocated GPU memory for the ONNX backend. Would the ONNX backend dynamically release unused memory? If so, is there a GPU memory allocation pattern one can expect when doing, for example, first an inference with model A and then with model B?
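For comparison, outside of Triton the standalone ONNX Runtime CUDA execution provider accepts a `gpu_mem_limit` provider option that caps its memory arena. Whether and how the Triton onnxruntime backend forwards such provider options is exactly what this question is asking, so the snippet below is only an illustrative sketch of the underlying ONNX Runtime API (the 2 GiB value and the commented-out session creation are placeholders; `onnxruntime-gpu` must be installed to actually create a session):

```python
# Sketch: provider options for ONNX Runtime's CUDA execution provider.
# gpu_mem_limit caps the BFC memory arena; arena_extend_strategy controls
# how the arena grows when more memory is requested.
GiB = 1024 ** 3

cuda_provider_options = {
    "gpu_mem_limit": str(2 * GiB),                # cap arena at 2 GiB (placeholder value)
    "arena_extend_strategy": "kSameAsRequested",  # grow only by what is requested
}

providers = [
    ("CUDAExecutionProvider", cuda_provider_options),
    "CPUExecutionProvider",  # fallback provider
]

# With onnxruntime-gpu installed, the options would be applied like this:
# import onnxruntime as ort
# session = ort.InferenceSession("model.onnx", providers=providers)
```

Note that `gpu_mem_limit` only bounds the arena allocator, not every CUDA allocation (e.g. cuDNN workspace memory), so it is not a hard global cap by itself.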