Deployment of local MultiLoRA model using TGI
Hi Team,
I was trying to deploy a multi-LoRA adapter model with Starcoder2-3B as the base.
I was referring to this blog: https://huggingface.co/blog/multi-lora-serving
Please correct me if I'm wrong, but it looks like the Starcoder2 model is not supported for multi-LoRA deployment using TGI. We are getting the below error while deploying:
AttributeError: 'TensorParallelColumnLinear' object has no attribute 'base_layer' rank=0
Also, can you suggest how we can deploy a local model and adapters saved in a local directory using TGI? Every time I try running the below docker command, it downloads the files from HF.
docker run --gpus all --shm-size 1g -p 8080:80 -v $PWD:/data \
    ghcr.io/huggingface/text-generation-inference:3.0.1 \
    --model-id bigcode/starcoder2-3b \
    --lora-adapters=<local_adapter_path>
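My understanding (I may be wrong here) is that any local paths have to be given as the paths seen inside the container, i.e. under the mounted /data volume. This is the variant I would expect to work for a fully local base model; the /data/starcoder2-3b folder name is just an example of where the base model files could be placed:

docker run --gpus all --shm-size 1g -p 8080:80 -v $PWD:/data \
    ghcr.io/huggingface/text-generation-inference:3.0.1 \
    --model-id /data/starcoder2-3b \
    --lora-adapters=/data/<local_adapter_path>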
Please let me know if any additional information is required.
Thanks, Ashwin.
Any update on this?
Still facing the issue.
I think you should open an issue on the https://github.com/huggingface/text-generation-inference repo @ashwincv0112
Sure, I will open the issue. Just to confirm my understanding: currently we don't have the capability to deploy the multi-LoRA setup when the adapters are saved on the local machine?
Yup, I ran into an issue too when using a custom adapter based on the https://huggingface.co/Qwen/Qwen2.5-14B-Instruct-GPTQ-Int8 model.
So right now the only option is to upload the adapters to a Hugging Face repo and use the respective model-id to deploy the model... right?
Yupp
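For anyone else landing here, this is roughly what that workaround looks like, sketched with placeholder repo and user names (the token line should only be needed if the adapter repo is private):

# push the local adapter folder to a Hub repo (repo id below is a placeholder)
huggingface-cli upload <your-username>/starcoder2-3b-adapter <local_adapter_path> .

# then point TGI at the adapter repo id instead of a local path
docker run --gpus all --shm-size 1g -p 8080:80 -v $PWD:/data \
    -e HF_TOKEN=$HF_TOKEN \
    ghcr.io/huggingface/text-generation-inference:3.0.1 \
    --model-id bigcode/starcoder2-3b \
    --lora-adapters=<your-username>/starcoder2-3b-adapter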