
Support for allocating GPU memory based on the selected profile

Open · anmolgupt opened this issue 8 months ago · 1 comment

The changes in this PR cover two main items:

  1. GPU memory is allocated based on the selected TensorRT profile, rather than on the profile that consumes the most memory even when that profile is not selected (see the sketch below).
  2. Creation of the profile 0 execution context is avoided when it is not required.
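For context, below is a minimal sketch of how a per-profile context could be created with the TensorRT 10 C++ API. It assumes the `ExecutionContextAllocationStrategy` overload of `createExecutionContext` and the `kON_PROFILE_CHANGE` strategy; the helper name `CreateContextForProfile` is hypothetical and not part of the PR.

```cpp
#include <NvInferRuntime.h>
#include <cuda_runtime_api.h>

// Sketch: create an execution context whose device scratch memory is sized
// for the currently selected optimization profile rather than for the
// profile with the largest memory requirement (the default kSTATIC behavior).
nvinfer1::IExecutionContext*
CreateContextForProfile(
    nvinfer1::ICudaEngine* engine, int profile_index, cudaStream_t stream)
{
  // kON_PROFILE_CHANGE lets TensorRT (re)allocate scratch memory whenever the
  // active profile changes, so only the selected profile's size is reserved.
  nvinfer1::IExecutionContext* context = engine->createExecutionContext(
      nvinfer1::ExecutionContextAllocationStrategy::kON_PROFILE_CHANGE);
  if (context == nullptr) {
    return nullptr;
  }
  // Selecting the profile is what triggers the allocation for that profile.
  if (!context->setOptimizationProfileAsync(profile_index, stream)) {
    delete context;
    return nullptr;
  }
  return context;
}
```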

anmolgupt avatar Apr 02 '25 04:04 anmolgupt

For kUSER_MANAGED, the user (in this case the Triton server) would need to actually allocate a piece of device memory and pass it to the execution context. Do you have support for this behavior? If not, I would suggest only adding kSTATIC and kON_PROFILE_CHANGE.
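For reference, here is a rough sketch of what kUSER_MANAGED would require of the caller, assuming the TensorRT 10 C++ API (`getDeviceMemorySizeForProfile`, `setDeviceMemory`); the helper and its name are hypothetical and not something this PR implements.

```cpp
#include <NvInferRuntime.h>
#include <cuda_runtime_api.h>

// Hypothetical illustration: with kUSER_MANAGED the caller owns the scratch
// buffer. It must query the size needed for the selected profile, allocate
// device memory itself, and hand that buffer to the execution context.
nvinfer1::IExecutionContext*
CreateUserManagedContext(
    nvinfer1::ICudaEngine* engine, int profile_index, void** scratch_out)
{
  nvinfer1::IExecutionContext* context = engine->createExecutionContext(
      nvinfer1::ExecutionContextAllocationStrategy::kUSER_MANAGED);
  if (context == nullptr) {
    return nullptr;
  }

  // Size the buffer for the chosen profile only.
  auto scratch_size = engine->getDeviceMemorySizeForProfile(profile_index);

  void* scratch = nullptr;
  if (cudaMalloc(&scratch, static_cast<size_t>(scratch_size)) != cudaSuccess) {
    delete context;
    return nullptr;
  }
  // The context uses this caller-owned buffer; the caller must keep it alive
  // for the lifetime of the context and free it afterwards.
  context->setDeviceMemory(scratch);

  *scratch_out = scratch;
  return context;
}
```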

dongfengy avatar Apr 02 '25 20:04 dongfengy

@yinggeh: New changes look good to me; I got the expected results on the models with these updates.

anmolgupt avatar Apr 17 '25 22:04 anmolgupt

Updated README.md

yinggeh avatar Apr 18 '25 01:04 yinggeh

LGTM. Thanks for your contribution.

yinggeh avatar Apr 18 '25 19:04 yinggeh