
Inconsistent Behavior with Multi-LoRA Deployment

Open charlatan-101 opened this issue 1 year ago • 0 comments

Multi-LoRA Deployment Inconsistency

System Information

  • Model: Fine-tuned adapter on unsloth/mistral-7b-instruct-v0.3 (LoRA rank 128)
  • Container: 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.3.0-tgi2.2.0-gpu-py310-cu121-ubuntu22.04-v2.0
  • Deployment: AWS SageMaker

Environment

  • [x] Docker
  • [ ] The CLI directly

Task

  • [x] An officially supported command
  • [ ] My own modifications

Steps to Reproduce

  1. Fine-tune an adapter on the base model using LoRA with rank 128.
  2. Deploy the model on AWS SageMaker using the specified container.

Single Deployment Configuration

config = {
    "HF_TOKEN": hf_token,
    "HF_MODEL_ID": "organization/model-name-redacted",
    "SM_NUM_GPUS": json.dumps(number_of_gpu),
    "MAX_INPUT_LENGTH": json.dumps(16_000),
    "MAX_TOTAL_TOKENS": json.dumps(19_000),
}

Multi-LoRA Deployment Configuration

Following this tutorial:

config = {
    "HF_TOKEN": hf_token,
    "HF_MODEL_ID": "unsloth/mistral-7b-instruct-v0.3",
    "LORA_ADAPTERS": "organization/adapter-1-redacted,organization/adapter-2-redacted,organization/adapter-3-redacted",
    "SM_NUM_GPUS": json.dumps(number_of_gpu),
    "MAX_INPUT_LENGTH": json.dumps(16_000),
    "MAX_TOTAL_TOKENS": json.dumps(19_000),
}
  3. Invoke the endpoint and verify behavior.
  4. For the multi-LoRA deployment, specify the adapter ID in the request parameters.
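For concreteness, here is a minimal sketch of the reproduction steps using the SageMaker Python SDK and boto3. The instance type, GPU count, and prompt are placeholder assumptions, and the AWS calls themselves are only indicated in comments since they require credentials; `adapter_id` is the TGI request parameter used to select a LoRA adapter on a multi-LoRA endpoint.

```python
import json

# Multi-LoRA environment, as in the configuration above (HF_TOKEN omitted here)
number_of_gpu = 1  # assumption: match the instance's GPU count
config = {
    "HF_MODEL_ID": "unsloth/mistral-7b-instruct-v0.3",
    "LORA_ADAPTERS": "organization/adapter-1-redacted,organization/adapter-2-redacted",
    "SM_NUM_GPUS": json.dumps(number_of_gpu),
    "MAX_INPUT_LENGTH": json.dumps(16_000),
    "MAX_TOTAL_TOKENS": json.dumps(19_000),
}

def build_payload(prompt, adapter_id=None):
    """Build a TGI request body; adapter_id is set only for the multi-LoRA endpoint."""
    parameters = {"max_new_tokens": 512}  # assumption: illustrative generation params
    if adapter_id is not None:
        parameters["adapter_id"] = adapter_id
    return {"inputs": prompt, "parameters": parameters}

# Deployment and invocation need AWS credentials, so they are sketched only:
#
# from sagemaker.huggingface import HuggingFaceModel
# model = HuggingFaceModel(image_uri=IMAGE_URI, env=config, role=ROLE)
# predictor = model.deploy(initial_instance_count=1, instance_type="ml.g5.12xlarge")
#
# import boto3
# runtime = boto3.client("sagemaker-runtime")
# response = runtime.invoke_endpoint(
#     EndpointName=predictor.endpoint_name,
#     ContentType="application/json",
#     Body=json.dumps(build_payload("<prompt>", adapter_id="organization/adapter-1-redacted")),
# )
```

With this shape, the single and multi-LoRA requests differ only in whether `adapter_id` is present, which isolates the adapter-selection path as the variable under test.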

Expected Behavior

The multi-LoRA deployment should produce results consistent with the single deployment when the appropriate adapter is specified in the request.

Actual Behavior

The responses differ significantly between single and multi-LoRA deployments.

Single Deployment Response:

[{
    "generated_text": '[{
        "KEY1": "BOOL",
        "KEY2": "DATE",
        "KEY3": "CURRENCY",
        ...
        "KEY36": FLOAT
    }]'
}]

Multi-LoRA Deployment (Actual) Response:

{
  "result": [{
    "generated_text": '[
      {"name": "KEY1", "value": "STRING"},
      {"name": "KEY2", "value": "STRING"},
      ...
      {"name": "KEY46", "value": "BOOL"}
    ]'
  }]
}

Key Differences

  1. Structure: Single deployment uses a key-value object, while multi-LoRA uses an array of objects with "name" and "value" properties.
  2. Data types: There are inconsistencies in the assigned data types for several keys.
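To compare the two deployments' outputs directly, a small helper (hypothetical, not part of TGI) can normalize the multi-LoRA array-of-objects shape into the single deployment's key-value shape before diffing:

```python
def to_key_value(items):
    """Convert [{"name": k, "value": v}, ...] (multi-LoRA shape) into {k: v, ...}."""
    return {item["name"]: item["value"] for item in items}

# Example using the multi-LoRA response shape shown above
multi_lora_shape = [
    {"name": "KEY1", "value": "STRING"},
    {"name": "KEY2", "value": "STRING"},
]
print(to_key_value(multi_lora_shape))  # {'KEY1': 'STRING', 'KEY2': 'STRING'}
```

After normalization, the remaining differences (missing keys, mismatched data types) are the substantive inconsistencies rather than formatting noise.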

Please investigate the cause of these inconsistencies in the multi-LoRA deployment. Our team is more than happy to provide any additional information or logs that might be helpful in diagnosing this issue. We're also available for further discussion or testing as needed.

CC: @drbh @danieldk @OlivierDehaene @Narsil

charlatan-101 · Sep 24 '24 13:09