Zongfei Jing

Results: 15 comments of Zongfei Jing

- Question 1: you can refer to [this link](https://nvidia.github.io/TensorRT-LLM/advanced/lora.html#lora-module-id-mapping) for the definition of the LoRA modules. `attn_qkv` is a combined QKV adapter.
- Question 2: since the GPU cache is set...
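To illustrate what "combined QKV adapter" means, here is a minimal sketch. The shapes and the stacking order are illustrative assumptions, not the exact TensorRT-LLM weight serialization format; see the linked module-id mapping page for the authoritative layout.

```python
import numpy as np

# Illustrative sketch only: a combined `attn_qkv` LoRA adapter replaces
# three separate q/k/v adapters by stacking their low-rank B matrices,
# so a single matmul serves all three projections.
hidden, rank = 64, 8

# Hypothetical separate low-rank B matrices for the q, k and v projections.
b_q = np.random.randn(hidden, rank)
b_k = np.random.randn(hidden, rank)
b_v = np.random.randn(hidden, rank)

# The combined qkv adapter stacks them along the output dimension.
b_qkv = np.concatenate([b_q, b_k, b_v], axis=0)  # shape (3 * hidden, rank)
print(b_qkv.shape)
```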

To perform inference with a specific LoRA for the first time, `lora_task_id`, `lora_weights`, and `lora_config` must all be given. The LoRA will be cached, so that subsequent requests for the...
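The caching rule above can be sketched as follows. This is a toy model of the behavior, with hypothetical class and parameter names; it is not the TensorRT-LLM server implementation, only an illustration of "first request needs everything, later requests need only the task id".

```python
# Minimal sketch (hypothetical names, illustrative only) of the caching
# rule: the first request for a given LoRA must carry lora_task_id,
# lora_weights and lora_config; once the LoRA is cached, subsequent
# requests may send lora_task_id alone.
class LoraCacheSketch:
    def __init__(self):
        self._cache = {}  # lora_task_id -> (weights, config)

    def request(self, lora_task_id, lora_weights=None, lora_config=None):
        if lora_task_id not in self._cache:
            if lora_weights is None or lora_config is None:
                raise ValueError(
                    "first request for a LoRA needs weights and config")
            self._cache[lora_task_id] = (lora_weights, lora_config)
        # Subsequent requests hit the cache using only the task id.
        return self._cache[lora_task_id]

cache = LoraCacheSketch()
# First request: all three pieces are required (placeholder values).
cache.request(0, lora_weights=b"placeholder", lora_config={"module": "attn_qkv"})
# Later request: the task id alone is enough.
weights, config = cache.request(0)
```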

As for the performance degradation you mentioned, is the output still correct? And by how much has performance decreased? Thanks.

In the bug description, I did not see which LoRA was used; could you please tell me? It would be better to provide the Hugging Face link of the base model and...

Hi @TheCodeWrangler, have you solved this issue in the latest version? If not, could you please provide a script to reproduce it?