Deployment of local MultiLoRA model using TGI
Hi Team,
I was trying to deploy a multi-LoRA adapter model with Starcoder2-3B as the base.
I was referring to this blog: https://huggingface.co/blog/multi-lora-serving
Please correct me if I'm wrong, but it looks like the Starcoder2 model is not supported for multi-LoRA deployment using TGI. We are getting the below error while deploying:
AttributeError: 'TensorParallelColumnLinear' object has no attribute 'base_layer' rank=0
Also, can you suggest how we can deploy a local model and adapters saved in a local directory using TGI? Every time I try running the below docker command, it downloads the files from HF.
docker run --gpus all --shm-size 1g -p 8080:80 -v $PWD:/data \
    ghcr.io/huggingface/text-generation-inference:3.0.1 \
    --model-id bigcode/starcoder2-3b \
    --lora-adapters=<local_adapter_path>
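My understanding (I may be wrong here) is that any local paths have to be given as the paths seen inside the container, i.e. under the mounted /data volume. This is the variant I would expect to work for a fully local base model; the /data/starcoder2-3b folder name is just an example of where the base model files could be placed:

docker run --gpus all --shm-size 1g -p 8080:80 -v $PWD:/data \
    ghcr.io/huggingface/text-generation-inference:3.0.1 \
    --model-id /data/starcoder2-3b \
    --lora-adapters=/data/<local_adapter_path>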
Please let me know if any additional information is required.
Thanks, Ashwin.
Any update on this?
Still facing the issue.
I think you should open an issue on the https://github.com/huggingface/text-generation-inference repo @ashwincv0112
Sure, I will open the issue. Just to confirm my understanding: currently we don't have the capability to deploy the multi-LoRA setup when the adapters are saved on the local machine?
Yup, I ran into an issue too when using a custom adapter based on the https://huggingface.co/Qwen/Qwen2.5-14B-Instruct-GPTQ-Int8 model.
So right now the only option is to upload the adapters to a Hugging Face repo and use the respective model-id to deploy the model... right?
Yupp
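For anyone else landing here, this is roughly what that workaround looks like, sketched with placeholder repo and user names (the token line should only be needed if the adapter repo is private):

# push the local adapter folder to a Hub repo (repo id below is a placeholder)
huggingface-cli upload <your-username>/starcoder2-3b-adapter <local_adapter_path> .

# then point TGI at the adapter repo id instead of a local path
docker run --gpus all --shm-size 1g -p 8080:80 -v $PWD:/data \
    -e HF_TOKEN=$HF_TOKEN \
    ghcr.io/huggingface/text-generation-inference:3.0.1 \
    --model-id bigcode/starcoder2-3b \
    --lora-adapters=<your-username>/starcoder2-3b-adapter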