Example of LoRa weights

Open TheCodeWrangler opened this issue 1 year ago • 2 comments

I would like to send LoRA weights through to a compiled TensorRT-LLM model, but I am unsure how to load the .bin weights and pass them to Triton. An example of loading the weights and passing them in would be very helpful.

TheCodeWrangler avatar Apr 09 '24 21:04 TheCodeWrangler

Here is an example: https://github.com/triton-inference-server/tensorrtllm_backend/tree/main/inflight_batcher_llm#running-lora-inference-with-inflight-batching

byshiue avatar Apr 10 '24 08:04 byshiue

Thank you for pointing me to this! Here is what it helped clear up, in case it helps someone in the future.

Starting with .safetensors adapter weights from Hugging Face, you first need to convert them to a .bin adapter file:

import torch
from safetensors.torch import load_file

# Load the adapter tensors from .safetensors and re-save them as a PyTorch .bin file
torch.save(load_file("adapter_model.safetensors"), "adapter_model.bin")
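
If you need to do this for more than one adapter, the same conversion can be wrapped in a small helper. This is only a sketch: the function names `bin_path_for` and `safetensors_to_bin` are made up for illustration, and it assumes you want the .bin file written next to the .safetensors file with the same base name.

```python
from pathlib import Path


def bin_path_for(safetensors_path):
    """Return the .bin path that sits next to a .safetensors file (hypothetical helper)."""
    path = Path(safetensors_path)
    if path.suffix != ".safetensors":
        raise ValueError(f"expected a .safetensors file, got: {path}")
    return path.with_suffix(".bin")


def safetensors_to_bin(safetensors_path):
    """Re-save a .safetensors adapter as a PyTorch .bin file and return the new path."""
    # Imported here so the path helper above can be used without torch installed.
    import torch
    from safetensors.torch import load_file

    out_path = bin_path_for(safetensors_path)
    # load_file returns a dict of tensor name -> tensor; torch.save pickles that dict.
    torch.save(load_file(str(safetensors_path)), str(out_path))
    return out_path
```

Calling `safetensors_to_bin("adapter_model.safetensors")` does the same thing as the two-line version above and returns the path of the new `adapter_model.bin`.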

Then you need to convert that .bin file into .npy format using examples/hf_lora_convert.py.
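
Before sending anything to Triton, it can help to sanity-check what the conversion produced. The sketch below is an assumption-heavy example: I am assuming the converter writes two files named `model.lora_weights.npy` and `model.lora_config.npy` into its output directory, so check your own output directory and adjust the names if they differ. The helper names are hypothetical.

```python
from pathlib import Path

# Assumed output file names; verify them against what the converter actually wrote.
WEIGHTS_FILE = "model.lora_weights.npy"
CONFIG_FILE = "model.lora_config.npy"


def lora_array_paths(out_dir):
    """Return the expected (weights, config) .npy paths inside the converter output dir."""
    out_dir = Path(out_dir)
    return out_dir / WEIGHTS_FILE, out_dir / CONFIG_FILE


def load_lora_arrays(out_dir):
    """Load the converted LoRA weights and config arrays, failing early if one is missing."""
    weights_path, config_path = lora_array_paths(out_dir)
    missing = [str(p) for p in (weights_path, config_path) if not p.exists()]
    if missing:
        raise FileNotFoundError("missing converted LoRA files: " + ", ".join(missing))

    # Imported after the existence check so the check itself does not need numpy.
    import numpy as np

    weights = np.load(weights_path)
    config = np.load(config_path)
    return weights, config
```

Printing `weights.shape`, `weights.dtype`, and `config.shape` from the returned arrays is a quick way to confirm the conversion worked before following the README linked above to pass them to Triton.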

TheCodeWrangler avatar Apr 10 '24 13:04 TheCodeWrangler