
[Feature]: Expose Lora lineage information from /v1/models

Open Jeffwan opened this issue 1 year ago • 0 comments

🚀 The feature, motivation and pitch

python -m vllm.entrypoints.openai.api_server \
    --model /workspace/meta-llama/Llama-2-7b-hf \
    --enable-lora \
    --lora-modules sql-lora=~/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/

The /v1/models response from the setup above does not expose the lineage between LoRA adapters and the base model. In the example below, root always points to the base model.

Current Status

  1. The base model card uses either --model or --served-model-name as its id. If the user passes a local path, the id and root fields end up being filesystem paths rather than model IDs as in the OpenAI API.

  2. The LoRA model card is built from LoRARequest, which has no base_model field at the moment. For now we can assume all adapters target the single base model, but that assumption will break once the engine supports serving multiple base models.
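A minimal sketch of the second point (hedged: this is a simplified stand-in for illustration, and the base_model_name field is the hypothetical addition proposed here, not part of vLLM's current LoRARequest):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LoRARequest:
    """Simplified stand-in for vLLM's LoRARequest.

    base_model_name is the hypothetical new field suggested in this
    issue; today the class only carries the adapter's own identity.
    """
    lora_name: str
    lora_int_id: int
    lora_local_path: str
    base_model_name: Optional[str] = None  # hypothetical addition

# With the field present, lineage can be recorded when the adapter
# is registered, instead of being assumed by the model-card code:
req = LoRARequest(
    lora_name="sql-lora",
    lora_int_id=1,
    lora_local_path="~/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/",
    base_model_name="meta-llama/Llama-2-7b-hf",
)
```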

{
  "object": "list",
  "data": [
    {
      "id": "/workspace/meta-llama/Llama-2-7b-hf",
      "object": "model",
      "created": 1715644056,
      "owned_by": "vllm",
      "root": "/workspace/meta-llama/Llama-2-7b-hf",
      "parent": null,
      "permission": [
        {
          .....
        }
      ]
    },
    {
      "id": "sql-lora",
      "object": "model",
      "created": 1715644056,
      "owned_by": "vllm",
      "root": "/workspace/meta-llama/Llama-2-7b-hf",
      "parent": null,
      "permission": [
        {
          ....
        }
      ]
    }
  ]
}

Expected

We can use root to represent the model path and parent to indicate the base model for LoRA adapters. Since these fields are not part of the OpenAI protocol, we should be free to change their semantics.

{
  "object": "list",
  "data": [
    {
      "id": "meta-llama/Llama-2-7b-hf",
      "object": "model",
      "created": 1715644056,
      "owned_by": "vllm",
      "root": "~/.cache/huggingface/hub/models--meta-llama--Llama-2-7b-hf/snapshots/01c7f73d771dfac7d292323805ebc428287df4f9/",
      "parent": null,
      "permission": [
        {
          .....
        }
      ]
    },
    {
      "id": "sql-lora",
      "object": "model",
      "created": 1715644056,
      "owned_by": "vllm",
      "root": "~/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/snapshots/0dfa347e8877a4d4ed19ee56c140fa518470028c/",
      "parent": "meta-llama/Llama-2-7b-hf",
      "permission": [
        {
          ....
        }
      ]
    }
  ]
}
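The expected response above could be produced by a helper along these lines (a hedged sketch; build_model_cards and its signature are invented for illustration, not vLLM's actual serving code):

```python
from typing import Optional

def build_model_cards(base_id: str,
                      base_root: str,
                      adapters: dict) -> list:
    """Build /v1/models entries where `root` holds the resolved model
    path and `parent` links each LoRA adapter back to the base model."""
    cards = [{
        "id": base_id,
        "object": "model",
        "root": base_root,  # resolved path of the base model weights
        "parent": None,     # the base model has no parent
    }]
    for name, path in adapters.items():
        cards.append({
            "id": name,
            "object": "model",
            "root": path,       # resolved path of the adapter weights
            "parent": base_id,  # lineage back to the base model
        })
    return cards

cards = build_model_cards(
    "meta-llama/Llama-2-7b-hf",
    "~/.cache/huggingface/hub/models--meta-llama--Llama-2-7b-hf/",
    {"sql-lora": "~/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/"},
)
```

With this shape, a client can walk parent pointers to recover the adapter-to-base lineage without any out-of-band knowledge.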

I am drafting a PR to address this issue; please help review whether the proposal above looks good.

Alternatives

No response

Additional context

No response

Jeffwan · Jul 10 '24 00:07