
[Feature]: Support loading lora adapters from HuggingFace in runtime


🚀 The feature, motivation and pitch

Background

Based on the LoRA documentation here, the user has to specify a local LoRA path when starting the engine. This introduces operational overhead, and we want to bring the LoRA experience up to the same level as the base model. From a UX perspective, the user should be able to pass in either a remote or a local LoRA model. If it is a remote path, the engine should download it at runtime and then serve the request.

Workflow

Start the engine with the LoRA model yard1/llama-2-7b-sql-lora-test:

python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Llama-2-7b-hf \
    --enable-lora \
    --lora-modules sql-lora=yard1/llama-2-7b-sql-lora-test
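For reference, once the module is registered, a client can address the adapter by its --lora-modules name through the OpenAI-compatible API. A minimal sketch of such a request (the base URL, API key, and prompt are illustrative assumptions, not part of this proposal):

from openai import OpenAI

# Assumes the api_server above is listening on localhost:8000.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="sql-lora",  # the name registered via --lora-modules
    prompt="SELECT count(*) FROM users WHERE ",
    max_tokens=32,
)
print(completion.choices[0].text)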

Current results

[screenshot: the engine fails to load the adapter because the HuggingFace repo ID is not a local path]

Expected results

The LoRA adapter should be downloaded and loaded by the engine.

Proposed changes

  1. Implement get_lora_absolute_path. It should hide the complexity of the LoRA location: if given a relative path, it should resolve it to an absolute path; if given a remote path, it should download the artifacts via from huggingface_hub import snapshot_download and return the snapshot path (see the sketch after this list).
  2. Update the workflow at https://github.com/vllm-project/vllm/blob/4f0e0ea131ef40654faa26fa21196031754df53a/vllm/lora/worker_manager.py#L174-L175 to call get_lora_absolute_path before it tries to load from the local path.
  3. Rename lora_local_path to lora_path to indicate that it now supports both local and remote paths.
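A minimal sketch of what get_lora_absolute_path could look like; the exact fallback order and the lack of error handling are assumptions here, not the final implementation:

import os

from huggingface_hub import snapshot_download


def get_lora_absolute_path(lora_path: str) -> str:
    """Resolve a LoRA path that may be absolute, relative, or a
    HuggingFace repo ID. Sketch only; error handling is omitted.
    """
    # Absolute local path: use as-is.
    if os.path.isabs(lora_path):
        return lora_path

    # Relative path that exists locally: resolve it to an absolute path.
    resolved = os.path.abspath(os.path.expanduser(lora_path))
    if os.path.exists(resolved):
        return resolved

    # Otherwise treat it as a remote HuggingFace repo ID and download
    # the adapter artifacts, returning the local snapshot directory.
    return snapshot_download(repo_id=lora_path)

With this helper in place, the worker manager in step 2 would resolve lora_request.lora_local_path (renamed to lora_path per step 3) through it before handing the result to the existing local checkpoint loading code.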

Future work

Support other remote storage backends such as S3 in the future. This is out of scope for this issue.

Alternatives

No response

Additional context

Related issue: https://github.com/vllm-project/vllm/issues/6231 (relative paths don't work).

Jeffwan · Jul 08 '24 23:07