
Starcoder2-15B model - AttributeError: 'TensorParallelColumnLinear' object has no attribute 'base_layer'

Open ashwincv0112 opened this issue 10 months ago • 10 comments

System Info

TGI version: ghcr.io/huggingface/text-generation-inference:3.0.1

Running on an AWS g5.12xlarge instance (which has 4 GPUs)

Model used: bigcode/starcoder2-15b-instruct-v0.1

Deployment: using Docker

Information

  • [X] Docker
  • [ ] The CLI directly

Tasks

  • [X] An officially supported command
  • [X] My own modifications

Reproduction

We are trying to deploy the Starcoder2-15B instruct model with custom fine-tuned LoRA adapters using TGI's multi-LoRA capability, on an AWS g5.12xlarge instance. We have our base model and LoRA adapters saved in the data directory. We then ran the docker command below.

docker run -it \
  --gpus all \
  --shm-size 1g \
  -v /home/ubuntu/data:/data \
  -p 8080:8080 \
  ghcr.io/huggingface/text-generation-inference:3.0.1 \
  --model-id=/data/starcoder2-15b-instruct-v0.1 \
  --lora-adapters=adapter=/data/starcoder2-15b-lora-adapter \
  --dtype bfloat16 \
  --num-shard 4

Requirements:

  • Base model: bigcode/starcoder2-15b-instruct-v0.1
  • Custom LoRA adapters
  • AWS g5.12xlarge instance
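As a sanity check before launching (a hypothetical helper, not part of TGI), the adapter directory's adapter_config.json can be inspected to confirm that its target_modules only name linear layers that exist in Starcoder2; the module names below are taken from the LoraConfig shown in the traceback further down:

```python
import json
import pathlib
import tempfile

# Linear layers present in Starcoder2's attention and MLP blocks,
# per the LoraConfig captured in the error traceback.
KNOWN_STARCODER2_MODULES = {
    "q_proj", "k_proj", "v_proj", "o_proj",
    "gate_proj", "up_proj", "down_proj",
}

def unexpected_target_modules(adapter_dir: str) -> set:
    """Return any target_modules declared by the adapter config that are
    not recognized Starcoder2 linear layers (empty set means all match)."""
    cfg_path = pathlib.Path(adapter_dir) / "adapter_config.json"
    cfg = json.loads(cfg_path.read_text())
    return set(cfg["target_modules"]) - KNOWN_STARCODER2_MODULES

# Example with a config mirroring the one from the traceback.
with tempfile.TemporaryDirectory() as d:
    (pathlib.Path(d) / "adapter_config.json").write_text(json.dumps({
        "base_model_name_or_path": "bigcode/starcoder2-15b",
        "r": 8,
        "lora_alpha": 8,
        "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj",
                           "gate_proj", "up_proj", "down_proj"],
    }))
    print(unexpected_target_modules(d))  # set() -> nothing unexpected
```

This only rules out a mismatched adapter config; the error below occurred even though the declared target modules match Starcoder2's layer names.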

On running the above docker command, we are getting the below error:

[rank3]: /opt/conda/lib/python3.11/site-packages/text_generation_server/adapters/lora.py:209 in prepare_weights

[rank3]:   206         for layer_id in range(nlayers):
[rank3]:   207             key = (layer_id, layer_type)
[rank3]:   208             weight_name, layer = target_to_layer[key]
[rank3]: ❱ 209             base_weight = layer.base_layer.linear.weight
[rank3]:   210             base_device = base_weight.device
[rank3]:   211
[rank3]:   212             if weight_name not in module_map:

[rank3]: Locals:
[rank3]:   config = LoraConfig(
[rank3]:       base_model_name_or_path='bigcode/starcoder2-15b',
[rank3]:       r=8,
[rank3]:       target_modules={
[rank3]:           'o_proj', 'up_proj', 'k_proj', 'v_proj',
[rank3]:           'gate_proj', 'q_proj', 'down_proj'
[rank3]:       },
[rank3]:       fan_in_fan_out=False,
[rank3]:       lora_alpha=8,
[rank3]:       use_rslora=False
[rank3]:   )
[rank3]:   dtype = torch.bfloat16
[rank3]:   key = (0, 'q_proj')
[rank3]:   layer = TensorParallelColumnLinear(
[rank3]:       (linear): FastLinear()
[rank3]:   )
[rank3]:   layer_id = 0
[rank3]:   layer_type = 'q_proj'
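The failure mode can be reproduced in miniature (a simplified sketch with stub classes, not TGI code): prepare_weights assumes every targeted layer was wrapped in a LoRA adapter layer exposing `.base_layer`, but the locals above show a plain TensorParallelColumnLinear, which only has `.linear`, so the attribute lookup on line 209 raises AttributeError:

```python
class FastLinear:
    """Stub for TGI's FastLinear; holds a stand-in weight."""
    def __init__(self):
        self.weight = [[0.0]]  # stand-in for a torch tensor

class TensorParallelColumnLinear:
    """Stub plain layer: no LoRA wrapper, so no .base_layer attribute."""
    def __init__(self):
        self.linear = FastLinear()

class LoraWrappedLinear:
    """Stub for what prepare_weights expects: a wrapper with .base_layer."""
    def __init__(self, base):
        self.base_layer = base

def base_weight(layer):
    # Same access pattern as lora.py line 209 in the traceback.
    return layer.base_layer.linear.weight

plain = TensorParallelColumnLinear()
wrapped = LoraWrappedLinear(TensorParallelColumnLinear())

try:
    base_weight(plain)
except AttributeError as e:
    print(e)  # 'TensorParallelColumnLinear' object has no attribute 'base_layer'

print(base_weight(wrapped) is wrapped.base_layer.linear.weight)  # True
```

This suggests the q_proj layer was never wrapped for LoRA on this shard, rather than the adapter weights themselves being malformed.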

Also, one observation: we saw the Starcoder2-15B instruct model mentioned in the file below, so our understanding was that the model is supported for the multi-LoRA functionality in TGI.

https://github.com/huggingface/text-generation-inference/blob/main/server/text_generation_server/models/__init__.py

Please let us know if there are any gaps in our understanding.

If the Starcoder2-15B model is supported, could you help us resolve the above issue?

Thanks.

Expected behavior

The model should be deployed along with the multi-lora TGI functionality.

ashwincv0112 avatar Jan 06 '25 13:01 ashwincv0112