                        Conversion of "hf_lora_convert.py" does not account for "lora_alpha"
I am seeing degraded performance when using LoRA in my TRT-LLM model, and I suspect that the "lora_alpha" value in my "adapter_config.json" is not being used when converting the weights into the tensorrt_llm inputs.
I have been looking through: https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/hf_lora_convert.py#L76
Should I be applying this alpha scaling myself before loading the LoRA weights into my TRT-LLM model?
I am also concerned that other parameters in the "adapter_config.json" (for example, "lora_dropout") would not be used by TensorRT-LLM.
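To make the concern concrete, here is a minimal sketch of how "lora_alpha" enters the effective update in a PEFT-style adapter. The shapes and file handling are illustrative assumptions, not code taken from TensorRT-LLM:

```python
# Sketch: how lora_alpha scales a PEFT-style LoRA update (illustrative only).
import json
import torch

with open("adapter_config.json") as f:
    cfg = json.load(f)

r = cfg["r"]               # LoRA rank
alpha = cfg["lora_alpha"]  # scaling numerator
scaling = alpha / r        # PEFT applies delta_W = (alpha / r) * (B @ A)

# Illustrative shapes: A is (r, in_features), B is (out_features, r)
A = torch.randn(r, 4096)
B = torch.zeros(4096, r)
delta_W = scaling * (B @ A)  # dropping "scaling" changes the adapter's effect by a factor of alpha / r
```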
The two arguments are only used in training; we don't need them during inference.
@byshiue
I believe that alpha scaling is expected to be applied to the weights that are uploaded. Digging into the underlying code used by examples/run.py, I found that the scaling is performed when loading from HuggingFace adapters:
https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/lora_manager.py#L632
Should I expect to need to do this scaling myself when preparing weights to load manually (using the examples/hf_lora_convert.py script)?
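For now I am working around it by folding the scale into the checkpoint before conversion. This is only a sketch under the assumption that hf_lora_convert.py does not apply the alpha / r scale itself; the file and key names follow the usual PEFT layout and may differ for other adapters:

```python
# Workaround sketch: pre-scale the LoRA "B" factors by alpha / r before running
# examples/hf_lora_convert.py (assumes a standard PEFT adapter_model.bin checkpoint).
import json
import torch

cfg = json.load(open("adapter_config.json"))
scale = cfg["lora_alpha"] / cfg["r"]

state = torch.load("adapter_model.bin", map_location="cpu")
for name, tensor in state.items():
    if "lora_B" in name:            # scaling either the A or the B factor once is equivalent
        state[name] = tensor * scale

torch.save(state, "adapter_model_scaled.bin")
# then point examples/hf_lora_convert.py at the scaled checkpoint
```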
Could you try adding the scale to examples/hf_lora_convert.py?
@TheCodeWrangler Do you still have this question? If not, we will close the issue soon.