Support different LoRA adapter artifact registries
🚀 Feature Description and Motivation
apiVersion: model.aibrix.ai/v1alpha1
kind: ModelAdapter
metadata:
  name: text2sql-lora-1
  namespace: default
spec:
  baseModel: llama2-70b
  podSelector:
    matchLabels:
      model.aibrix.ai: llama2-70b
  additionalConfig:
    # could be model artifact etc.
    modelArtifact: yard1/llama-2-7b-sql-lora-test
  schedulerName: default-model-adapter-scheduler
Currently, we use a single modelArtifact key. This is too simple and supports only two "registries":
- local: an absolute path accessible from the container
- Hugging Face repo id
This is not enough. We need a better structure to support more artifact registries and their associated authN credentials.
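To make the limitation concrete, here is a minimal Python sketch of how a single modelArtifact string can be told apart. The helper name `classify_model_artifact` and its rule (an absolute path means local, anything else is treated as a Hugging Face repo id) are assumptions for illustration, not the project's actual logic.

```python
import posixpath


def classify_model_artifact(model_artifact: str) -> str:
    """Return "local" for an absolute container path, else "huggingface".

    Hypothetical helper: it only mirrors the two cases listed above.
    """
    if not model_artifact:
        raise ValueError("modelArtifact must not be empty")
    # Containers are assumed to run Linux, so POSIX path rules apply.
    if posixpath.isabs(model_artifact):
        return "local"
    return "huggingface"
```

With a rule like this, a value such as `s3://bucket/lora` would be misread as a Hugging Face repo id, which is the gap this issue is about.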
Use Case
- Support S3 or other registries
Proposed Solution
No response
There are two options.
- Support it on the engine side: we pass everything into the inference engine.
- The runtime picks it up and downloads it. We need to change the control flow and invoke the LoRA registration with an absolute path once the LoRA adapter is downloaded.
This is probably not the most critical task for rc1; we will consider postponing it to the next rc release.
// ArtifactURL is the address of the model artifact to be downloaded. Multiple protocols are supported, such as s3, gcs, and huggingface.
// +kubebuilder:validation:Required
ArtifactURL string `json:"artifactURL,omitempty"`
// CredentialsSecretRef points to the secret used to authenticate the artifact download requests.
// +optional
CredentialsSecretRef *corev1.LocalObjectReference `json:"credentialsSecretRef,omitempty"`
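As a rough illustration of how a downloader could dispatch on the proposed artifactURL, the Python sketch below splits a URL into scheme, root, and path. The URL shapes (`s3://bucket/key`, `huggingface://owner/repo`), the `ArtifactLocation` type, and `parse_artifact_url` are all assumptions; the field comment only names the protocols.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Schemes named in the proposed field comment; the exact set is an assumption.
SUPPORTED_SCHEMES = {"s3", "gcs", "huggingface"}


@dataclass(frozen=True)
class ArtifactLocation:
    scheme: str  # e.g. "s3"
    root: str    # bucket name, or the owner for huggingface
    path: str    # object key prefix or repo name, without a leading slash


def parse_artifact_url(artifact_url: str) -> ArtifactLocation:
    """Split a hypothetical artifactURL into the parts a downloader needs."""
    parsed = urlparse(artifact_url)
    if parsed.scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported artifact scheme: {parsed.scheme!r}")
    if not parsed.netloc:
        raise ValueError(f"missing bucket or owner in {artifact_url!r}")
    return ArtifactLocation(parsed.scheme, parsed.netloc, parsed.path.lstrip("/"))
```

A plain absolute path has no scheme and is rejected here, so local paths would need their own branch if the two styles are meant to coexist.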
Another challenge is how a LoRA secretRef can be used by an existing pod. The container in the target pod needs to consume the secretRef, but this cannot be done at runtime. This would be a blocker.
Since it involves a design question, we cannot finish this story by RC1. It can be moved to RC2 instead.
Design consideration
- The engine starts with some credentials, and then we can load LoRA adapters. If an adapter requires a credential to be downloaded, it has to be the credential we gave to the engine.
- Load_lora can accept environment variables, and the engine uses those credentials to download the model.
- The engine only handles absolute local paths and the runtime downloads the LoRA adapter. Once scheduling is done, the controller triggers the runtime model download; once the download is complete (how can the controller learn that with low latency?), the controller triggers the model weights loading.
- The engine just hands the task to the runtime, and the runtime takes care of the downloading and loading operations.
Note:
- We cannot rely on pod-level credentials since they are immutable and there is no way to set everything up upfront.
- Are there any security concerns with passing through the token or credentials?
/cc @brosoul Since this task involves runtime interaction, please help check it.
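The "download first, then register by absolute path" flow from the design considerations can be sketched in Python as below. This is only an outline under stated assumptions: `download_then_register` and its two injected callables are hypothetical names, and real code would also need credentials, retries, and a completion signal back to the controller.

```python
import posixpath
from typing import Callable


def download_then_register(
    lora_name: str,
    artifact_url: str,
    download: Callable[[str], str],
    register: Callable[[str, str], None],
) -> str:
    """Download the adapter, then register it with the engine by local path.

    `download` maps an artifact URL to a local directory and `register`
    tells the engine to load (lora_name, local_path). Both are injected so
    the two steps stay separate; both are hypothetical.
    """
    local_path = download(artifact_url)
    # The engine is assumed to accept only absolute paths, so fail early.
    if not posixpath.isabs(local_path):
        raise ValueError(f"downloader returned a non-absolute path: {local_path!r}")
    register(lora_name, local_path)
    return local_path
```

Keeping the steps as separate callables lets the registration be skipped cleanly when the download fails or returns an unusable path.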
I will move this story to RC3; it is hard to deliver in RC2.
Changed the target to v0.2.0 instead.
vLLM side
curl -X POST http://localhost:8000/v1/load_lora_adapter \
  -H "Content-Type: application/json" \
  -d '{"lora_name": "text2sql-lora-1", "lora_path": "bharati2324/Qwen2.5-1.5B-Instruct-Code-LoRA-r16v2"}'
curl -X POST http://localhost:8000/v1/unload_lora_adapter \
  -H "Content-Type: application/json" \
  -d '{"lora_name": "text2sql-lora-1"}'
model management
curl -X POST http://localhost:8080/v1/lora_adapter/load \
  -H "Content-Type: application/json" \
  -d '{"lora_name": "text2sql-lora-1", "lora_path": "bharati2324/Qwen2.5-1.5B-Instruct-Code-LoRA-r16v2"}'
curl -X POST http://localhost:8080/v1/lora_adapter/unload \
  -H "Content-Type: application/json" \
  -d '{"lora_name": "text2sql-lora-1"}'
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text2sql-lora-2",
    "prompt": "San Francisco is a",
    "max_tokens": 7,
    "temperature": 0
  }'
Note: the LoRA model list change was released in v0.6.2; later releases have a regression.
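For scripted testing, the vLLM load and unload calls shown earlier can be built from Python with only the standard library. This is a minimal sketch: `build_lora_request` is a hypothetical helper, and the paths and JSON keys are copied from the curl commands rather than from any API reference.

```python
import json
import urllib.request
from typing import Optional

# Paths copied from the vLLM curl commands above.
VLLM_PATHS = {"load": "/v1/load_lora_adapter", "unload": "/v1/unload_lora_adapter"}


def build_lora_request(
    base_url: str,
    action: str,
    lora_name: str,
    lora_path: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request that loads or unloads an adapter."""
    if action not in VLLM_PATHS:
        raise ValueError(f"unknown action: {action!r}")
    payload = {"lora_name": lora_name}
    if action == "load":
        if lora_path is None:
            raise ValueError("lora_path is required to load an adapter")
        payload["lora_path"] = lora_path
    return urllib.request.Request(
        url=base_url.rstrip("/") + VLLM_PATHS[action],
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request is then a matter of passing it to `urllib.request.urlopen`, which requires the engine to be listening on the given address.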
Testing
- update the controller manager settings
  - --enable-runtime-sidecar
- rebuild controller-manager and runtime
This cannot be closed even with #580. We did not handle orchestration such as model download plus model registration. Currently, it is still a single step.
Absolute paths are now supported, so we can mount a PVC. I will postpone the artifact download part to a future release.