Nathan Price
**Problem: GKE image streaming will not work with these images due to repeated layers** I would like to use GKE image streaming with triton-inference-server images. This feature will only work...
@byshiue can you check whether dynamic scaling works? Linear scaling works fine. If dynamic scaling doesn't work at all, then this is indeed a bug. _Originally posted by...
Is warmup supported for the `tensorrtllm_backend`? If so, it would be nice to have an example of how to upload LoRA adapters as a warmup step.
When unexpectedly large bursts of requests reach my application, I would like to be able to limit the number of requests that will be accepted by the trtllm backend. I...
I have a Mistral7B model with fine-tuned LoRA weights with datatype bfloat16. I ran into issues when attempting to use my adapters, which were compiled for bfloat16. Running the following...
I would like to send LoRA weights through to a compiled TensorRT-LLM model but am unsure how to load the .bin weights and pass them to Triton. An...
Addresses the issue described in https://github.com/NVIDIA/TensorRT-LLM/issues/1668. When weights are trained using [rsLoRA scaling](https://huggingface.co/blog/damjan-k/rslora), they should be scaled differently. The code initially always normalized by rank, regardless of the "use_rslora" flag in...
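For context, the difference between the two scaling conventions can be sketched as follows. This is a minimal illustration under stated assumptions, not the TensorRT-LLM conversion code: `lora_scale` is a hypothetical helper, and the parameter names mirror the `lora_alpha`, `r`, and `use_rslora` fields of a PEFT-style adapter config.

```python
import math


def lora_scale(lora_alpha: float, rank: int, use_rslora: bool = False) -> float:
    """Return the multiplier applied to the low-rank update (hypothetical helper)."""
    if rank <= 0:
        raise ValueError("rank must be positive")
    if use_rslora:
        # rsLoRA: divide alpha by the square root of the rank.
        return lora_alpha / math.sqrt(rank)
    # Standard LoRA: divide alpha by the rank.
    return lora_alpha / rank


if __name__ == "__main__":
    # Same alpha and rank, two different multipliers.
    print(lora_scale(16, 16))
    print(lora_scale(16, 16, use_rslora=True))
```

If a converter always divides by the rank, an adapter trained with rsLoRA is applied with a multiplier that is too small by a factor of `sqrt(rank)`.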
I am seeing degraded performance when using LoRA in my trtllm model, and I suspect that the "lora_alpha" value in my "adapter_config.json" is not being used when converting weights for...
### System Info Any ### Who can help? @kaiyux ### Information - [X] The official example scripts - [ ] My own modified scripts ### Tasks - [X] An officially...
### System Info 2X L4 GPUs Docker Image: nvcr.io/nvidia/tritonserver:24.06-trtllm-python-py3 ### Who can help? @juney-nvidia @kaiyux ### Information - [ ] The official example scripts - [X] My own modified scripts...