tensorrtllm_backend
Example of LoRA weights
I would like to send LoRA weights to a compiled TensorRT-LLM model, but I am unsure how to load the .bin weights and pass them to Triton. An example of loading and passing in the weights would be very helpful.
Here is an example: https://github.com/triton-inference-server/tensorrtllm_backend/tree/main/inflight_batcher_llm#running-lora-inference-with-inflight-batching
Thank you for pointing me to this! Here are the things it helped clear up (which may help someone in the future).
Starting with a .safetensors adapter from Hugging Face, you first need to convert it to a .bin adapter:
```python
import torch
from safetensors.torch import load_file

# Load the safetensors adapter and re-save it in PyTorch .bin format
torch.save(load_file("adapter_model.safetensors"), "adapter_model.bin")
```
Then you need to convert that .bin adapter into .npy format using examples/hf_lora_convert.py from the TensorRT-LLM repo.
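Once the converter has produced the .npy files, a Triton client can load them with NumPy and send them as input tensors. A minimal sketch of that loading step, with heavy caveats: the file names `model.lora_weights.npy` / `model.lora_config.npy`, the dtypes, and the shapes below are illustrative assumptions, not the converter's guaranteed output, so check what hf_lora_convert.py actually writes for your TensorRT-LLM version:

```python
import numpy as np

# Stand-in for the converter output, so the snippet is self-contained:
# a flat fp16 weights tensor and an int32 config tensor. The shapes
# here are made up for illustration only.
np.save("model.lora_weights.npy", np.zeros((1, 3, 8), dtype=np.float16))
np.save("model.lora_config.npy", np.zeros((1, 3, 3), dtype=np.int32))

# Load them back the way a client would before building its
# lora_weights / lora_config request inputs for Triton.
lora_weights = np.load("model.lora_weights.npy")
lora_config = np.load("model.lora_config.npy")
print(lora_weights.shape, lora_weights.dtype)  # (1, 3, 8) float16
print(lora_config.shape, lora_config.dtype)    # (1, 3, 3) int32
```

The linked inflight_batcher_llm README shows how these tensors are actually attached to a request, so treat this only as a sanity check that the converted files load cleanly.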