[Enhancement] Subproject for Supporting AI Agent Workloads

Open kevin-wangzefeng opened this issue 2 months ago • 13 comments

What is the problem you're trying to solve

There has been growing discussion of and interest in Volcano's plan to support AI Agent workloads recently. This issue is meant to kick off that discussion and track all community efforts in this direction.

Describe the solution you'd like

Abstract

TL;DR

  • Problem: Modern AI is shifting from stateless, single-request inference to long-lived, stateful, session-based Agent workloads that current Kubernetes scheduling patterns do not handle well (startup latency, state persistence, fine-grained scaling, eviction sensitivity).
  • Proposal: Create a Volcano sub-project to add native support for AI Agent workloads (workload CRD, lifecycle manager, agent-aware scheduling policies, and fast startup/hibernate mechanisms).
  • Why it matters: Proper support improves resource utilization, reduces cold-start/state-loss risk, and enables multi-tenant, production-grade Agent deployments on Kubernetes.
  • Scope: Design-focused initial effort—API/workload model, scheduling primitives, lifecycle and warm-pool mechanics; integration and reuse of existing OSS where possible.
  • Call to action: Request feedback, real-world requirements, and collaborators to refine design and drive an agent-box prototype under volcano-sh.
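To make the warm-pool and hibernate idea in the proposal bullet more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: `WarmPool`, `Sandbox`, `Phase`, and every method name are hypothetical and are not taken from the proposal doc or from any Volcano API. The sketch keeps everything in memory, whereas a real implementation would manage pods and persist state externally.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class Phase(Enum):
    """Hypothetical lifecycle phases for an agent session."""
    WARM = "warm"              # sandbox is pre-started, not bound to a session
    ACTIVE = "active"          # sandbox is bound to a session and holds resources
    HIBERNATED = "hibernated"  # session state is saved, sandbox was released


@dataclass
class Sandbox:
    name: str
    phase: Phase = Phase.WARM
    session_id: Optional[str] = None
    state: Dict[str, str] = field(default_factory=dict)


class WarmPool:
    """Hypothetical in-memory pool of pre-started sandboxes (illustration only)."""

    def __init__(self, size: int) -> None:
        self._counter = 0
        self._idle: List[Sandbox] = [self._new_sandbox() for _ in range(size)]
        self._by_session: Dict[str, Sandbox] = {}
        self._snapshots: Dict[str, Dict[str, str]] = {}

    def _new_sandbox(self) -> Sandbox:
        self._counter += 1
        return Sandbox(name=f"sandbox-{self._counter}")

    @property
    def idle_count(self) -> int:
        return len(self._idle)

    def acquire(self, session_id: str) -> Sandbox:
        """Bind a sandbox to a session, restoring hibernated state if present."""
        if session_id in self._by_session:
            return self._by_session[session_id]
        # Prefer a warm sandbox; otherwise fall back to a (slower) cold start.
        sandbox = self._idle.pop() if self._idle else self._new_sandbox()
        sandbox.phase = Phase.ACTIVE
        sandbox.session_id = session_id
        sandbox.state = self._snapshots.pop(session_id, {})
        self._by_session[session_id] = sandbox
        return sandbox

    def hibernate(self, session_id: str) -> None:
        """Save the session state and return its sandbox to the warm pool."""
        sandbox = self._by_session.pop(session_id)
        self._snapshots[session_id] = dict(sandbox.state)
        sandbox.phase = Phase.WARM
        sandbox.session_id = None
        sandbox.state = {}
        self._idle.append(sandbox)

    def phase_of(self, session_id: str) -> Optional[Phase]:
        """Report the phase of a session, or None if the session is unknown."""
        if session_id in self._by_session:
            return Phase.ACTIVE
        if session_id in self._snapshots:
            return Phase.HIBERNATED
        return None
```

In this sketch, a session that calls `acquire`, writes to `sandbox.state`, and is later passed to `hibernate` gets the same state back on its next `acquire`, possibly on a different sandbox. Keeping the snapshot separate from the sandbox is the design choice that lets idle sessions stop holding compute.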

Proposal Details

The full proposal working doc is at: Google Doc: New Subproject to Provide Native Support for AI Agent Workloads in Volcano

Will send PR to the repo when we have a more concrete version.

Additional context

Please feel free to share your thoughts and ideas.


Update Oct. 24th, 2025

The repo name agent-box may be too generic and could easily overlap with other projects. I'd like to propose that we use agentcube instead.

kevin-wangzefeng avatar Oct 21 '25 08:10 kevin-wangzefeng

Good feature, look forward to it!

JesseStutler avatar Oct 21 '25 13:10 JesseStutler

Amazing! We need features to improve model inference beyond model training.

Wonki4 avatar Oct 22 '25 02:10 Wonki4

Whoa, this is a neat feature idea. I would happily collaborate on this 👍

hajnalmt avatar Oct 23 '25 12:10 hajnalmt

The repo name agent-box may be too generic and could easily overlap with other projects. I'd like to propose that we use agentcube instead.

Calling @volcano-sh/volcano-maintainer to vote on approving the creation of the agentcube repo.

kevin-wangzefeng avatar Oct 24 '25 09:10 kevin-wangzefeng

Feel free to propose a better repo name option :-)

kevin-wangzefeng avatar Oct 24 '25 09:10 kevin-wangzefeng

Nice feature! /approve

Monokaix avatar Oct 24 '25 10:10 Monokaix

/lgtm /approve

k82cn avatar Oct 24 '25 10:10 k82cn

/approve

hzxuzhonghu avatar Oct 24 '25 10:10 hzxuzhonghu

lgtm, really looking forward to it! /approve

shinytang6 avatar Oct 24 '25 10:10 shinytang6

/approve

william-wang avatar Oct 24 '25 13:10 william-wang

We've received support from 6 of the 7 current maintainers:

  • @kevin-wangzefeng as initiator
  • @Monokaix @k82cn @hzxuzhonghu @shinytang6 @william-wang voted to approve.

We are good to move forward to create the repo.

kevin-wangzefeng avatar Oct 25 '25 06:10 kevin-wangzefeng

The agentcube repo is live at https://github.com/volcano-sh/agentcube. Follow-up steps for the repo setup and development details can be tracked there.

I'd suggest we keep this issue open for overall proposal updates and discussions.

kevin-wangzefeng avatar Oct 25 '25 07:10 kevin-wangzefeng

https://github.com/kubernetes-sigs/lws

This project may be relevant here.

csh0101 avatar Nov 11 '25 12:11 csh0101