
Conversion of "hf_lora_convert.py" does not account for "lora_alpha"

Open TheCodeWrangler opened this issue 1 year ago • 4 comments

I am seeing degraded performance using LoRA in my TRT-LLM model, and I suspect that the "lora_alpha" value in my "adapter_config.json" is not being applied when the weights are converted into the tensorrt_llm inputs.

I have been looking through: https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/hf_lora_convert.py#L76

Should I be applying this alpha scaling myself before loading the LoRA weights into my TRT-LLM model?
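
For reference, here is a minimal sketch of what pre-scaling the weights myself would look like. This is not from the repo; the file names and state-dict layout are assumptions based on the standard PEFT adapter format.

```python
# Minimal sketch, assuming a standard PEFT adapter layout; file names
# are illustrative, not taken from the TensorRT-LLM repo.
import json

import torch

with open("adapter_config.json") as f:
    adapter_config = json.load(f)

# The effective LoRA update is (lora_alpha / r) * B @ A, so folding the
# scale into one side (here B) is sufficient.
scale = adapter_config["lora_alpha"] / adapter_config["r"]

state_dict = torch.load("adapter_model.bin", map_location="cpu")
for name, weight in state_dict.items():
    if "lora_B" in name:
        state_dict[name] = weight * scale

torch.save(state_dict, "adapter_model_scaled.bin")
```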

TheCodeWrangler avatar Apr 19 '24 16:04 TheCodeWrangler

I am also concerned that other parameters in the "adapter_config.json", such as "lora_dropout", are not used by TensorRT-LLM.

TheCodeWrangler avatar Apr 19 '24 19:04 TheCodeWrangler

These two arguments are only used in training; we don't need them during inference.

byshiue avatar Apr 22 '24 07:04 byshiue

@byshiue

I believe that alpha scaling is expected to already be applied to the weights that are uploaded. Digging into the underlying code used by examples/run.py, I found that the scaling is performed when loading from Hugging Face adapters:

https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/lora_manager.py#L632

Should I expect to need to apply this scaling myself when preparing weights to load manually (using the examples/hf_lora_convert.py script)?
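
To illustrate what I mean, here is a small check (shapes and values are made up, not from the repo) showing that pre-scaling one side of the LoRA pair reproduces the scaled delta that the Hugging Face adapter path applies on load:

```python
# Illustrative only: shapes and values are invented for the demo.
import torch

r, alpha = 8, 16
A = torch.randn(r, 4096)   # lora_A: (r, in_features)
B = torch.randn(4096, r)   # lora_B: (out_features, r)

# The Hugging Face adapter path applies the (alpha / r) scale on load:
delta = (alpha / r) * (B @ A)

# Pre-scaling B gives the same delta without any runtime scale factor:
assert torch.allclose(delta, (B * (alpha / r)) @ A)
```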

TheCodeWrangler avatar May 24 '24 20:05 TheCodeWrangler

Could you try adding the scale to examples/hf_lora_convert.py?
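
Something along these lines should work (a sketch only; the actual variable names in the script will differ):

```python
# Sketch only: names are illustrative and will not match the script
# exactly. The idea is to fold the alpha / r scale into one side of
# each LoRA pair before the weights are flattened for the engine.
import torch

def apply_lora_scale(lora_a: torch.Tensor, lora_b: torch.Tensor,
                     alpha: float, rank: int):
    """Return (A, B) with the alpha / rank scale folded into B."""
    return lora_a, lora_b * (alpha / rank)
```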

byshiue avatar May 27 '24 08:05 byshiue

@TheCodeWrangler Do you still have this question? If not, we will close this issue soon.

hello-11 avatar Nov 14 '24 08:11 hello-11