Jiaxin Shan
@kerthcet v0.3.0 will be rolled out no later than mid May. We can leave some tasks to the v0.4.0 release. If there are some tasks you feel are necessary to finish before v0.3.0...
In that case, let's assume a user deploys 2 controllers. Do they need 2 controllers (separate deployments) + 1 webhook server, or 2 * (1 controller + 1 webhook server)?
@kerthcet Sounds good. Do we plan to add further improvements in v0.3.0? The proposed cut-off is next Friday.
In v0.3.0 we already have the webhook framework supported (for workload type validation etc.); let's move this to v0.4.0.
@kerthcet The standalone installation uses the `--disable-webhook` option by default at this moment, https://github.com/vllm-project/aibrix/blob/402c62c2bb32da951ecaa13a25176f9fbe72c5d7/config/standalone/kv-cache-controller/patch.yaml#L17 We can switch it to enabled, but we need to change the manifests and handle some potential naming conflicts. This...
There are two options.

1. Add the support on the engine side: we pass everything into the inference engine.
2. The runtime should pick it up and download it: we need to change...
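The second option, the runtime downloading the artifact itself, could be sketched roughly as below. This is only an illustration, not the actual AIBrix runtime code: the function name `pickDownloader` and the backend names are assumptions; the real runtime would presumably dispatch on the URL scheme in a similar way.

```go
package main

import (
	"fmt"
	"net/url"
)

// pickDownloader is a hypothetical sketch of option 2: the runtime inspects
// the artifact URL's scheme and selects a download backend for it.
// The backend names here are illustrative only.
func pickDownloader(artifactURL string) (string, error) {
	u, err := url.Parse(artifactURL)
	if err != nil {
		return "", err
	}
	switch u.Scheme {
	case "s3":
		return "s3-downloader", nil
	case "gcs":
		return "gcs-downloader", nil
	case "huggingface":
		return "huggingface-downloader", nil
	default:
		return "", fmt.Errorf("unsupported artifact scheme %q", u.Scheme)
	}
}

func main() {
	d, err := pickDownloader("s3://models/llama-2-7b")
	if err != nil {
		panic(err)
	}
	fmt.Println(d) // prints "s3-downloader"
}
```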
```
// ArtifactURL is the address of the model artifact to be downloaded. Different protocols are supported, like s3, gcs, huggingface
// +kubebuilder:validation:Required
ArtifactURL string `json:"artifactURL,omitempty"`
// CredentialsSecretRef points to the secret...
```
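For context, a custom resource using these fields might look like the sketch below. The `apiVersion`, `kind`, and secret layout are assumptions made for illustration; only the field names `artifactURL` and `credentialsSecretRef` come from the snippet above.

```yaml
# Hypothetical usage sketch; group/version and kind are assumed.
apiVersion: model.aibrix.ai/v1alpha1
kind: ModelAdapter
metadata:
  name: example-adapter
spec:
  artifactURL: s3://my-bucket/models/example-lora
  credentialsSecretRef:
    name: s3-credentials   # assumed secret holding the download credentials
```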
Since it involves a design question, we cannot finish this story by RC1. It can be moved to RC2 instead.
## Design consideration

1. The engine starts with some credentials, so we can load a LoRA; if the LoRA requires a credential to be downloaded, it has to be the credential we gave to...
Change to v0.2.0 instead.