tensorrtllm_backend
Example of LoRA weights
I would like to send LoRA weights to a compiled TensorRT-LLM model, but I am unsure how to load the .bin weights and pass them to Triton. An example of loading and passing in the weights would be very helpful.
Here is an example: https://github.com/triton-inference-server/tensorrtllm_backend/tree/main/inflight_batcher_llm#running-lora-inference-with-inflight-batching
Thank you for pointing me to this! Here are the things it helped clear up (which may help someone in the future).
Starting with a .safetensors adapter from Hugging Face, you first need to convert it to a .bin adapter:
```python
import torch
from safetensors.torch import load_file

# Load the safetensors adapter and re-save it in PyTorch .bin format
torch.save(load_file("adapter_model.safetensors"), "adapter_model.bin")
```
Then you need to convert that .bin adapter into .npy format using examples/hf_lora_convert.py from the TensorRT-LLM repo.
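Once the converter has produced the .npy files, a Triton client can load them with NumPy and send them as input tensors. A minimal sketch of that loading step, with heavy caveats: the file names `model.lora_weights.npy` / `model.lora_config.npy`, the dtypes, and the shapes below are illustrative assumptions, not the converter's guaranteed output, so check what hf_lora_convert.py actually writes for your TensorRT-LLM version:

```python
import numpy as np

# Stand-in for the converter output, so the snippet is self-contained:
# a flat fp16 weights tensor and an int32 config tensor. The shapes
# here are made up for illustration only.
np.save("model.lora_weights.npy", np.zeros((1, 3, 8), dtype=np.float16))
np.save("model.lora_config.npy", np.zeros((1, 3, 3), dtype=np.int32))

# Load them back the way a client would before building its
# lora_weights / lora_config request inputs for Triton.
lora_weights = np.load("model.lora_weights.npy")
lora_config = np.load("model.lora_config.npy")
print(lora_weights.shape, lora_weights.dtype)  # (1, 3, 8) float16
print(lora_config.shape, lora_config.dtype)    # (1, 3, 3) int32
```

The linked inflight_batcher_llm README shows how these tensors are actually attached to a request, so treat this only as a sanity check that the converted files load cleanly.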