[Umbrella] Add webhook for validation

Open kerthcet opened this issue 10 months ago • 10 comments

🚀 Feature Description and Motivation

Webhooks validate CRDs at admission time, so a misconfigured resource fails fast instead of surfacing an error later at runtime.
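As a rough illustration of the fail-fast idea, here is a language-neutral sketch in Python. The field names (`minReplicas`, `maxReplicas`) and the functions `validate_pod_autoscaler` and `admit` are hypothetical and are not taken from the AIBrix API; a real webhook would run the same kind of checks inside the admission request handler.

```python
from typing import Any


def _is_int(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_pod_autoscaler(spec: dict[str, Any]) -> list[str]:
    """Return a list of errors; an empty list means the spec is valid.

    Field names are assumptions made for this sketch only.
    """
    errors: list[str] = []
    min_replicas = spec.get("minReplicas", 1)
    max_replicas = spec.get("maxReplicas")

    if not _is_int(min_replicas) or min_replicas < 0:
        errors.append("spec.minReplicas must be a non-negative integer")
    if not _is_int(max_replicas) or max_replicas < 1:
        errors.append("spec.maxReplicas must be a positive integer")
    # Only compare the two once both are known to be well-formed.
    if not errors and min_replicas > max_replicas:
        errors.append("spec.minReplicas must not exceed spec.maxReplicas")
    return errors


def admit(obj: dict[str, Any]) -> tuple[bool, str]:
    """Reject the object at admission time if any check fails."""
    errors = validate_pod_autoscaler(obj.get("spec", {}))
    return (not errors, "; ".join(errors))
```

With this shape, the user gets the rejection message when the object is created, rather than discovering the problem later in controller logs or status conditions.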

Use Case

If a CRD is not configured correctly, fail fast.

  • [x] webhook framework
  • [ ] add integration tests to CI and separate them from E2E tests
  • [ ] ModelAdapter
  • [ ] PodAutoscaler
  • [ ] KVCache
  • [ ] RayClusterFleet
  • [ ] RayClusterReplicaSet

Proposed Solution

No response

kerthcet avatar Feb 19 '25 08:02 kerthcet

/assign

kerthcet avatar Feb 19 '25 08:02 kerthcet

We can discuss the webhook usage in more detail. The examples use Hugging Face models for simplicity. In the real world, however, most users have to fetch weights from S3-like object storage.

The challenge at the moment is that AIBrix doesn't have any orchestration support to hide those details, as llmaz or KubeAI do. As a mid-term solution, I am wondering whether we can leverage the webhook to convert more model configuration from annotations into spec fields, for example by injecting a sidecar container for model downloading. That would fill the gap left by the missing model orchestration.
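A minimal sketch of what such a mutation could look like, written in Python for brevity. A mutating admission webhook typically answers with a JSON Patch (RFC 6902), and the function below only builds that patch. The annotation key, downloader image, container arguments, and volume name are all hypothetical and are not defined by AIBrix.

```python
from typing import Any

# Hypothetical names for illustration only; none of these exist in AIBrix.
MODEL_URI_ANNOTATION = "example.aibrix.ai/model-uri"
DOWNLOADER_IMAGE = "example.com/model-downloader:latest"
VOLUME_NAME = "model-cache"


def build_patch(pod: dict[str, Any]) -> list[dict[str, Any]]:
    """Return JSON Patch operations that inject a model-downloading sidecar.

    An empty list means the pod carries no model annotation and is left as-is.
    """
    annotations = pod.get("metadata", {}).get("annotations") or {}
    uri = annotations.get(MODEL_URI_ANNOTATION)
    if not uri:
        return []

    spec = pod.get("spec", {})
    volume = {"name": VOLUME_NAME, "emptyDir": {}}
    sidecar = {
        "name": "model-downloader",
        "image": DOWNLOADER_IMAGE,
        "args": ["--source", uri, "--dest", "/models"],
        "volumeMounts": [{"name": VOLUME_NAME, "mountPath": "/models"}],
    }

    patch: list[dict[str, Any]] = []
    # JSON Patch "add" with a trailing "-" appends to an existing array;
    # if the array is missing, it has to be created in one operation.
    if spec.get("volumes"):
        patch.append({"op": "add", "path": "/spec/volumes/-", "value": volume})
    else:
        patch.append({"op": "add", "path": "/spec/volumes", "value": [volume]})
    patch.append({"op": "add", "path": "/spec/containers/-", "value": sidecar})
    return patch
```

The user would then only set the annotation on the workload, and the storage details stay out of the main container spec.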

Jeffwan avatar Feb 25 '25 00:02 Jeffwan

@kerthcet v0.3.0 will be rolled out no later than mid-May. We can leave some tasks to the v0.4.0 release. If there are tasks you feel need to be finished before the v0.3.0 release, please comment here.

There is one requirement I'd like to discuss with you here. Due to integration complexity, some users prefer the standalone deployment (see https://aibrix.readthedocs.io/latest/getting_started/installation/installation.html#install-individual-aibrix-components). In this case, they just want an individual controller. Once we introduce the webhook validation, we probably won't deploy a webhook along with each controller. We would still like to do some basic validation for those cases. What are your thoughts on this case?

Jeffwan avatar Apr 28 '25 18:04 Jeffwan

Webhook + controller is still a standalone solution and requires no additional effort. What's their concern here? Alternatively, we can use CEL, which is built into the apiserver, but I have no idea whether it meets all of our requirements.

kerthcet avatar Apr 29 '25 02:04 kerthcet

In that case, let's assume a user deploys 2 controllers. Do they need 2 controllers (separate deployments) + 1 webhook server, or 2 * (1 controller + 1 webhook server)?

Jeffwan avatar Apr 29 '25 04:04 Jeffwan

The webhook is deployed together with the controller; that is to say, we just need 2 * (controller + webhook).

kerthcet avatar Apr 29 '25 05:04 kerthcet

@kerthcet Sounds good. Do we plan to add further improvements to v0.3.0? The proposed cutoff is next Friday.

Jeffwan avatar Apr 30 '25 18:04 Jeffwan

In v0.3.0, we already have the webhook framework supported. For workload type validation etc., let's move to v0.4.0.

Jeffwan avatar May 08 '25 00:05 Jeffwan

For standalone installation, what's our plan now?

kerthcet avatar May 08 '25 02:05 kerthcet

@kerthcet Standalone installation uses the --disable-webhook option by default at the moment: https://github.com/vllm-project/aibrix/blob/402c62c2bb32da951ecaa13a25176f9fbe72c5d7/config/standalone/kv-cache-controller/patch.yaml#L17

We can switch it to enabled, but we need to change the manifests and handle some potential naming conflicts. This is a TODO item.
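To make the toggle concrete, here is a small Python sketch of how a per-controller binary might gate its webhook server on that flag. Only the flag name `--disable-webhook` comes from this thread; the program name, the function `components_to_start`, and the component labels are assumptions and do not describe the actual AIBrix manager code.

```python
import argparse


def components_to_start(argv: list[str]) -> list[str]:
    """Decide which components a single controller deployment would run."""
    parser = argparse.ArgumentParser(prog="controller-manager")
    # Flag name taken from the thread; how it is handled here is an assumption.
    parser.add_argument("--disable-webhook", action="store_true")
    args = parser.parse_args(argv)

    components = ["controller"]
    if not args.disable_webhook:
        # Default: each controller deployment serves its own webhook.
        components.append("webhook")
    return components
```

Under this sketch, today's standalone manifests correspond to `components_to_start(["--disable-webhook"])`, and switching them to enabled means dropping the flag so that each standalone controller also serves its own webhook.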

Jeffwan avatar May 08 '25 03:05 Jeffwan