Nathan Price issues

Results: 17 issues of Nathan Price

Docker images have repeated layers

**Problem: GKE image streaming will not work with these images due to repeated layers** I would like to use GKE image streaming with triton-inference-server images. This feature will only work...

enhancement

Dynamic scaling not working on RoPe / rotary_scaling

@byshiue can you try to see if dynamic scaling works? Linear scaling works fine. If dynamic scaling doesn't work at all, then this is indeed a bug. _Originally posted by...

triaged

Warmup Example of loading LoRa weights

Is warmup supported for the `tensorrtllm_backend`? If so it would be nice to have an example of how to upload LoRa adapters as a warmup step.

triaged

Feature Request: Set maximum number of in flight

When unexpectedly large bursts of requests hit my application, I would like to be able to limit the number of requests that will be accepted by the trtllm backend. I...

feature request

Support bfloat16 LoRa Adaptors

I have a Mistral7B model with fine-tuned LoRa weights with datatype bfloat16. I ran into issues when attempting to use my adaptors, which were compiled for bfloat16. Running the following...

bug

triaged

Example of LoRa weights

I would like to send LoRa weights to a compiled TensorRT-LLM model but am unsure how to load the .bin weights and pass them to Triton. An...

triaged

Fixed rslora scaling in lora_manager

Addressing the issue mentioned in https://github.com/NVIDIA/TensorRT-LLM/issues/1668. When weights were trained using [rslora scaling](https://huggingface.co/blog/damjan-k/rslora), they should be scaled differently. The code initially was always normalizing by rank regardless of the "use_rslora" flag in...

performance issue

Investigating
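
The difference between the two scaling rules in the rslora entry above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code in lora_manager: the helper name `lora_scale` is hypothetical, and it assumes the usual convention that standard LoRA multiplies the low-rank update by `lora_alpha / rank`, while rsLoRA multiplies it by `lora_alpha / sqrt(rank)`.

```python
import math


def lora_scale(lora_alpha: float, rank: int, use_rslora: bool = False) -> float:
    """Return the multiplier applied to the low-rank update (hypothetical helper)."""
    if rank <= 0:
        raise ValueError("rank must be positive")
    # Standard LoRA normalizes by the rank; rsLoRA normalizes by sqrt(rank).
    denominator = math.sqrt(rank) if use_rslora else rank
    return lora_alpha / denominator


print(lora_scale(16, 64))                   # 0.25
print(lora_scale(16, 64, use_rslora=True))  # 2.0
```

With these example values, an adapter trained with rsLoRA but always normalized by rank would be applied at one eighth of its intended strength.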

Conversion of "hf_lora_convert.py" does not account for "lora_alpha"

Alpha scaling incorrect when using rslora

Model Performance Degraded when using BFLOAT16 LoRa Adapters