
Support for allocating GPU memory based on the selected profile

Open · anmolgupt opened this issue 8 months ago · 1 comment

The changes in this PR cover two main items:

  1. GPU memory is allocated based on the selected TensorRT profile, rather than on the profile that consumes the most memory even when that profile is not selected (see the sketch below).
  2. Creation of the profile 0 execution context is avoided when it is not required.
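For context, below is a minimal sketch of how a per-profile context could be created with the TensorRT 10 C++ API. It assumes the `ExecutionContextAllocationStrategy` overload of `createExecutionContext` and the `kON_PROFILE_CHANGE` strategy; the helper name `CreateContextForProfile` is hypothetical and not part of the PR.

```cpp
#include <NvInferRuntime.h>
#include <cuda_runtime_api.h>

// Sketch: create an execution context whose device scratch memory is sized
// for the currently selected optimization profile rather than for the
// profile with the largest memory requirement (the default kSTATIC behavior).
nvinfer1::IExecutionContext*
CreateContextForProfile(
    nvinfer1::ICudaEngine* engine, int profile_index, cudaStream_t stream)
{
  // kON_PROFILE_CHANGE lets TensorRT (re)allocate scratch memory whenever the
  // active profile changes, so only the selected profile's size is reserved.
  nvinfer1::IExecutionContext* context = engine->createExecutionContext(
      nvinfer1::ExecutionContextAllocationStrategy::kON_PROFILE_CHANGE);
  if (context == nullptr) {
    return nullptr;
  }
  // Selecting the profile is what triggers the allocation for that profile.
  if (!context->setOptimizationProfileAsync(profile_index, stream)) {
    delete context;
    return nullptr;
  }
  return context;
}
```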

anmolgupt avatar Apr 02 '25 04:04 anmolgupt

For kUSER_MANAGED, the user (in this case the Triton server) would need to actually allocate a piece of device memory and pass it to the execution context. Do you have support for this behavior? If not, I would suggest only adding kSTATIC and kON_PROFILE_CHANGE.
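For reference, here is a rough sketch of what kUSER_MANAGED would require of the caller, assuming the TensorRT 10 C++ API (`getDeviceMemorySizeForProfile`, `setDeviceMemory`); the helper and its name are hypothetical and not something this PR implements.

```cpp
#include <NvInferRuntime.h>
#include <cuda_runtime_api.h>

// Hypothetical illustration: with kUSER_MANAGED the caller owns the scratch
// buffer. It must query the size needed for the selected profile, allocate
// device memory itself, and hand that buffer to the execution context.
nvinfer1::IExecutionContext*
CreateUserManagedContext(
    nvinfer1::ICudaEngine* engine, int profile_index, void** scratch_out)
{
  nvinfer1::IExecutionContext* context = engine->createExecutionContext(
      nvinfer1::ExecutionContextAllocationStrategy::kUSER_MANAGED);
  if (context == nullptr) {
    return nullptr;
  }

  // Size the buffer for the chosen profile only.
  auto scratch_size = engine->getDeviceMemorySizeForProfile(profile_index);

  void* scratch = nullptr;
  if (cudaMalloc(&scratch, static_cast<size_t>(scratch_size)) != cudaSuccess) {
    delete context;
    return nullptr;
  }
  // The context uses this caller-owned buffer; the caller must keep it alive
  // for the lifetime of the context and free it afterwards.
  context->setDeviceMemory(scratch);

  *scratch_out = scratch;
  return context;
}
```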

dongfengy avatar Apr 02 '25 20:04 dongfengy

@yinggeh: New changes look good to me; I got the expected results on the models with these updates.

anmolgupt avatar Apr 17 '25 22:04 anmolgupt

Updated README.md

yinggeh avatar Apr 18 '25 01:04 yinggeh

LGTM. Thanks for your contribution.

yinggeh avatar Apr 18 '25 19:04 yinggeh