tensorrt_backend
Support for allocating GPU memory based on the selected profile
The changes in this PR cover two main items:
- GPU memory is allocated based on the selected TensorRT profile, rather than on the profile that consumes the most memory even when that profile is not selected.
- The profile 0 execution context is no longer created when it is not required.
For kUSER_MANAGED, the user (in this case the Triton server) would need to actually allocate a piece of device memory and pass it to the execution context. Do you have support for this behavior? If not, I would suggest only adding kSTATIC and kON_PROFILE_CHANGE.
@yinggeh: New changes look good to me; I got the expected results on the models with these updates.
Updated README.md
LGTM. Thanks for your contribution.