[Frontend] Support adding a new LoRA module to a live server in OpenAI Entrypoints
The previous version of the OpenAI entrypoints didn't support adding a LoRA adapter to a live server. With this version you can register a new adapter path with a command like the following:
curl -X POST your_host:your_port/add_lora \
  -H "Content-Type: application/json" \
  -d '{
    "lora_name": "your_new_lora_model_name",
    "lora_local_path": "your_new_lora_model_path"
  }'
After adding the adapter, you can use this model just like any other existing LoRA.
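For example, the new adapter can be selected through the model field of the standard OpenAI-compatible completions endpoint; the prompt and sampling parameters below are only placeholders:

curl -X POST your_host:your_port/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "your_new_lora_model_name",
    "prompt": "Hello, my name is",
    "max_tokens": 32
  }'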
We will review #3308 as well, since it also has a delete API for completeness.
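For illustration only, a matching removal call could be shaped the same way as the add call; the endpoint name and payload here are hypothetical, since the actual delete API lives in #3308 rather than this PR:

# hypothetical endpoint, sketched for symmetry with /add_lora; see #3308 for the real delete API
curl -X POST your_host:your_port/remove_lora \
  -H "Content-Type: application/json" \
  -d '{
    "lora_name": "your_new_lora_model_name"
  }'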
Could you please tell me when the online service feature for adding LoRA weights will be officially integrated?
We have a different version with more tests and better coverage (chat/completion/embedding). Let me rebase it onto master and share it with the community. @TangJiakai @simon-mo
This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!
This pull request has merge conflicts that must be resolved before it can be merged. @AlphaINF please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork