tensorrtllm_backend
lora_cache_gpu_memory_fraction is not a good parameter
I want to run the tensorrt_llm program on a server, and I want execution to be independent of the GPU environment: the GPU model and the amount of free GPU memory. However, the lora_cache_gpu_memory_fraction parameter inspects the available GPU memory and allocates a percentage of it for the LoRA cache, so the program's behavior depends on the GPU model and on how much memory happens to be free at launch. Please, if possible, add an alternative parameter that lets us specify a fixed amount, such as 1 GB, to be allocated for LoRA. That way the memory allocation would always be constant and independent of the execution environment.
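Until such a parameter exists, a possible launch-time workaround is to compute the fraction yourself from a fixed byte budget and the currently free GPU memory, then pass the result as lora_cache_gpu_memory_fraction. This is only a sketch under my own assumptions (it queries free memory via nvidia-smi; the helper names are hypothetical, not part of tensorrtllm_backend):

```python
import subprocess


def fraction_for_fixed_budget(budget_bytes: int, free_bytes: int) -> float:
    """Return the fraction of free GPU memory equal to a fixed byte budget."""
    if budget_bytes > free_bytes:
        raise ValueError("requested LoRA cache budget exceeds free GPU memory")
    return budget_bytes / free_bytes


def free_gpu_memory_bytes(device_index: int = 0) -> int:
    """Query free memory on one GPU via nvidia-smi (reported in MiB)."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.free",
         "--format=csv,noheader,nounits", "-i", str(device_index)],
        text=True,
    )
    return int(out.strip()) * 1024 * 1024


# Example (no GPU needed): a 1 GiB budget with 4 GiB free gives 0.25.
# At deploy time you would instead call free_gpu_memory_bytes() and pass
# the resulting fraction as lora_cache_gpu_memory_fraction.
example_fraction = fraction_for_fixed_budget(1 << 30, 4 << 30)
```

This keeps the effective LoRA cache size constant across GPUs, but it only approximates a fixed allocation: if other processes allocate memory between the query and server startup, the fraction is computed against stale numbers, which is exactly why a true fixed-size parameter would be preferable.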