text-generation-inference
Inconsistent Behavior with Multi-LoRA Deployment
Multi-LoRA Deployment Inconsistency
System Information
- Model: Fine-tuned adapter on unsloth/mistral-7b-instruct-v0.3 (LoRA rank 128)
- Container: 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.3.0-tgi2.2.0-gpu-py310-cu121-ubuntu22.04-v2.0
- Deployment: AWS SageMaker
Environment
- [x] Docker
- [ ] The CLI directly
Task
- [x] An officially supported command
- [ ] My own modifications
Steps to Reproduce
- Fine-tune an adapter on the base model using LoRA with rank 128.
- Deploy the model on AWS SageMaker using the specified container.
Single Deployment Configuration
config = {
    "HF_TOKEN": hf_token,
    "HF_MODEL_ID": "organization/model-name-redacted",
    "SM_NUM_GPUS": json.dumps(number_of_gpu),
    "MAX_INPUT_LENGTH": json.dumps(16_000),
    "MAX_TOTAL_TOKENS": json.dumps(19_000),
}
Multi-LoRA Deployment Configuration
Following this tutorial:
config = {
    "HF_TOKEN": hf_token,
    "HF_MODEL_ID": "unsloth/mistral-7b-instruct-v0.3",
    "LORA_ADAPTERS": "organization/adapter-1-redacted,organization/adapter-2-redacted,organization/adapter-3-redacted",
    "SM_NUM_GPUS": json.dumps(number_of_gpu),
    "MAX_INPUT_LENGTH": json.dumps(16_000),
    "MAX_TOTAL_TOKENS": json.dumps(19_000),
}
- Invoke the endpoint and verify behavior.
- For multi-LoRA deployment, specify the adapter ID in the request parameters.
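For reference, the two invocation steps above can be sketched roughly as follows. This is a minimal illustration, not the exact code we ran: `build_payload` and `invoke` are hypothetical helper names, the `max_new_tokens` default is arbitrary, and `runtime_client` is assumed to be a `boto3.client("sagemaker-runtime")` instance. The only difference between the two deployments is the `adapter_id` entry under `parameters`.

```python
import json


def build_payload(prompt, adapter_id=None, max_new_tokens=256):
    """Build a TGI-style request body.

    adapter_id is only set for the multi-LoRA endpoint; the single
    deployment receives the same payload without it.
    """
    parameters = {"max_new_tokens": max_new_tokens}  # arbitrary default
    if adapter_id is not None:
        parameters["adapter_id"] = adapter_id
    return {"inputs": prompt, "parameters": parameters}


def invoke(runtime_client, endpoint_name, payload):
    """Send the payload through a SageMaker runtime client and decode the JSON reply."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    return json.loads(response["Body"].read())
```

With this shape, the same prompt can be sent to both endpoints, passing `adapter_id="organization/adapter-1-redacted"` only for the multi-LoRA one, so that any difference in output is attributable to the deployment rather than the request.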
Expected Behavior
The multi-LoRA deployment should produce results consistent with the single deployment when the appropriate adapter is specified in the request.
Actual Behavior
The responses differ significantly between single and multi-LoRA deployments.
Single Deployment Response:
[{
    "generated_text": '[{
        "KEY1": "BOOL",
        "KEY2": "DATE",
        "KEY3": "CURRENCY",
        ...
        "KEY36": FLOAT
    }]'
}]
Multi-LoRA Deployment (Actual) Response:
{
    "result": [{
        "generated_text": '[
            {"name": "KEY1", "value": "STRING"},
            {"name": "KEY2", "value": "STRING"},
            ...
            {"name": "KEY46", "value": "BOOL"}
        ]'
    }]
}
Key Differences
- Structure: Single deployment uses a key-value object, while multi-LoRA uses an array of objects with "name" and "value" properties.
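- Data types: Several keys are assigned different types. For example, KEY1 is "BOOL" in the single deployment but "STRING" in multi-LoRA, and KEY2 is "DATE" versus "STRING".
- Key count: The single deployment output ends at KEY36, while the multi-LoRA output ends at KEY46.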
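To make these differences easier to quantify, a small comparison helper along the following lines may be useful. It is only a sketch: `to_mapping` and `diff_outputs` are hypothetical names, and it assumes each `generated_text` parses as valid JSON, which may not hold for every response (the unquoted `FLOAT` in the single deployment output above would fail to parse, for instance).

```python
import json


def to_mapping(generated_text):
    """Normalize either output shape into a flat {key: type} dict."""
    parsed = json.loads(generated_text)
    if isinstance(parsed, dict):
        return parsed
    # Multi-LoRA shape: list of {"name": ..., "value": ...} objects.
    if all(isinstance(x, dict) and set(x) == {"name", "value"} for x in parsed):
        return {x["name"]: x["value"] for x in parsed}
    # Single deployment shape: list wrapping one or more key-value objects.
    merged = {}
    for x in parsed:
        merged.update(x)
    return merged


def diff_outputs(single_text, multi_text):
    """Report keys missing on either side and keys whose types disagree."""
    a, b = to_mapping(single_text), to_mapping(multi_text)
    return {
        "only_in_single": sorted(a.keys() - b.keys()),
        "only_in_multi": sorted(b.keys() - a.keys()),
        "type_mismatch": {
            k: (a[k], b[k]) for k in sorted(a.keys() & b.keys()) if a[k] != b[k]
        },
    }
```

Running `diff_outputs` on the `generated_text` from each endpoint for the same prompt gives a compact summary that can be attached to follow-up reports.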
Please investigate the cause of these inconsistencies in the multi-LoRA deployment. Our team is more than happy to provide any additional information or logs that might be helpful in diagnosing this issue. We're also available for further discussion or testing as needed.
CC: @drbh @danieldk @OlivierDehaene @Narsil