Starcoder2-15B model - AttributeError: 'TensorParallelColumnLinear' object has no attribute 'base_layer'
System Info
Using the following TGI version: ghcr.io/huggingface/text-generation-inference:3.0.1
Running on an AWS g5.12xlarge instance (which has 4 GPUs)
Model used: bigcode/starcoder2-15b-instruct-v0.1
Deployment: Docker
Information
- [X] Docker
- [ ] The CLI directly
Tasks
- [X] An officially supported command
- [X] My own modifications
Reproduction
Please note that we are trying to deploy the Starcoder2-15B instruct model with custom fine-tuned LoRA adapters using TGI's multi-LoRA capability. We are using an AWS g5.12xlarge instance for this. Our base model and LoRA adapters are saved in the data directory. We then ran the docker command below.
docker run -it \
--gpus all \
--shm-size 1g \
-v /home/ubuntu/data:/data \
-p 8080:8080 \
ghcr.io/huggingface/text-generation-inference:3.0.1 \
--model-id=/data/starcoder2-15b-instruct-v0.1 \
--lora-adapters=adapter=/data/starcoder2-15b-lora-adapter \
--dtype bfloat16 \
--num-shard 4
Requirement:
- Base model: bigcode/starcoder2-15b-instruct-v0.1
- Custom LoRA adapters
- AWS g5.12xlarge instance
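For context, the adapter configuration (it also appears in the traceback locals further down) can be verified on the host before launching the container. This is a minimal sketch, assuming the adapter was saved with PEFT's save_pretrained and therefore contains a standard adapter_config.json:

import json

# Host-side path corresponding to /data/starcoder2-15b-lora-adapter inside the container.
with open("/home/ubuntu/data/starcoder2-15b-lora-adapter/adapter_config.json") as f:
    cfg = json.load(f)

print(cfg["base_model_name_or_path"])  # expected: bigcode/starcoder2-15b
print(sorted(cfg["target_modules"]))   # q/k/v/o_proj plus gate/up/down_proj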
On running the above docker command, we are getting the below error:
[rank3]: /opt/conda/lib/python3.11/site-packages/text_generation_server/adapters/lora.py:209 in prepare_weights
[rank3]:   206       for layer_id in range(nlayers):
[rank3]:   207           key = (layer_id, layer_type)
[rank3]:   208           weight_name, layer = target_to_layer[key]
[rank3]: ❱ 209           base_weight = layer.base_layer.linear.weight
[rank3]:   210           base_device = base_weight.device
[rank3]:   211
[rank3]:   212           if weight_name not in module_map:
[rank3]:
[rank3]: Locals:
[rank3]:   config     = LoraConfig(
[rank3]:                    base_model_name_or_path='bigcode/starcoder2-15b',
[rank3]:                    r=8,
[rank3]:                    target_modules={
[rank3]:                        'o_proj', 'up_proj', 'k_proj', 'v_proj',
[rank3]:                        'gate_proj', 'q_proj', 'down_proj'
[rank3]:                    },
[rank3]:                    fan_in_fan_out=False,
[rank3]:                    lora_alpha=8,
[rank3]:                    use_rslora=False
[rank3]:                )
[rank3]:   dtype      = torch.bfloat16
[rank3]:   key        = (0, 'q_proj')
[rank3]:   layer      = TensorParallelColumnLinear(
[rank3]:                    (linear): FastLinear()
[rank3]:                )
[rank3]:   layer_id   = 0
[rank3]:   layer_type = 'q_proj'
[rank3]: AttributeError: 'TensorParallelColumnLinear' object has no attribute 'base_layer'
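From the locals above, the resolved q_proj layer is a TensorParallelColumnLinear that exposes the underlying FastLinear directly as layer.linear, whereas prepare_weights expects the layer.base_layer.linear.weight chain. As a purely illustrative sketch on our side (an assumption, not a verified fix), guarding that access would look like this:

# Hypothetical guard around the failing access in adapters/lora.py:209; illustration only.
# Fall back to the layer itself when it carries no `base_layer` wrapper.
base_layer = getattr(layer, "base_layer", layer)
base_weight = base_layer.linear.weight
base_device = base_weight.device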
One additional observation: the Starcoder2 model is listed in the file below, so our understanding was that the model is supported for the multi-LoRA functionality in TGI.
https://github.com/huggingface/text-generation-inference/blob/main/server/text_generation_server/models/__init__.py
Please let us know if there are any gaps in our understanding.
If the Starcoder2-15B model is supported, could you help resolve the above issue?
Thanks.
Expected behavior
The model should deploy successfully along with TGI's multi-LoRA functionality.