feat(k8s-bench): add scripts for HPA, rolling update, StatefulSet tasks

Open tuannvm opened this issue 7 months ago • 2 comments

https://github.com/GoogleCloudPlatform/kubectl-ai/issues/145

Overview

This pull request adds three comprehensive Kubernetes benchmark scenarios under k8s-bench/tasks to extend coverage of everyday cluster operations:

  1. rolling-update-deployment
  2. horizontal-pod-autoscaler
  3. statefulset-lifecycle

Each scenario follows the existing pattern with four files:

  • task.yaml – script prompt & metadata
  • setup.sh – resource provisioning & readiness loops
  • verify.sh – assertions on desired state
  • cleanup.sh – namespace/resource teardown

1. rolling-update-deployment

Goal: Zero-downtime image rollout for a Deployment.

  • Setup: Create namespace rollout-test, deploy web-app running nginx:1.21 (3 replicas), wait for readiness
  • Script: kubectl set image to nginx:1.22
  • Verify: kubectl rollout status, ensure all pods use nginx:1.22
  • Cleanup: Delete rollout-test namespace
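
The verify step above could be sketched roughly as follows. This is not the script from the pull request: the Deployment name web-app comes from the setup bullet, but the app=web-app label selector, the 120s timeout, and the helper names all_images_match and verify_rollout are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical verify.sh sketch for rolling-update-deployment.
# Assumptions: pods carry the label app=web-app; the timeout is arbitrary.

NAMESPACE="rollout-test"
DEPLOYMENT="web-app"
EXPECTED_IMAGE="nginx:1.22"

# Succeeds only if every image in the whitespace-separated list
# equals the expected image. An empty list counts as a failure.
all_images_match() {
  local expected="$1" images="$2" image
  [ -n "$images" ] || return 1
  for image in $images; do
    [ "$image" = "$expected" ] || return 1
  done
  return 0
}

verify_rollout() {
  # Block until the rollout completes or the timeout expires.
  kubectl rollout status "deployment/${DEPLOYMENT}" \
    -n "$NAMESPACE" --timeout=120s || return 1

  # Collect the first container image of every matching pod.
  local images
  images=$(kubectl get pods -n "$NAMESPACE" -l "app=${DEPLOYMENT}" \
    -o jsonpath='{.items[*].spec.containers[0].image}') || return 1

  all_images_match "$EXPECTED_IMAGE" "$images"
}
```

A real verify.sh would presumably call verify_rollout as its last command so that the script's exit status reflects the check. Pods from the old ReplicaSet that are still terminating may briefly report nginx:1.21, so a short retry around the image check may be needed in practice.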

2. horizontal-pod-autoscaler

Goal: Exercise HPA targeting 50% CPU utilization.

  • Setup:
    • Create namespace hpa-test
    • Deploy a BusyBox CPU-burner with 100m CPU request
    • Create HPA (min=1, max=3, target=50% CPU)
  • Script: Generate sustained CPU load via BusyBox loop
  • Verify: Wait for HPA to scale above 1 replica
  • Cleanup: Delete hpa-test namespace
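
Waiting for the HPA to scale above one replica amounts to a polling loop with a timeout. The sketch below is one way that might look, not the script from the pull request: the HPA name cpu-burner, the 300s timeout, the 10s poll interval, and the function names are all assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical verify.sh sketch for horizontal-pod-autoscaler.
# Assumptions: the HPA is named cpu-burner; timeout and interval are arbitrary.

NAMESPACE="hpa-test"
HPA_NAME="cpu-burner"
TIMEOUT_SECONDS="${TIMEOUT_SECONDS:-300}"
POLL_INTERVAL="${POLL_INTERVAL:-10}"

# Print the replica count currently reported in the HPA status.
current_replicas() {
  kubectl get hpa "$HPA_NAME" -n "$NAMESPACE" \
    -o jsonpath='{.status.currentReplicas}'
}

# Poll until the HPA reports more than one replica or the timeout expires.
wait_for_scale_up() {
  local elapsed=0 replicas
  while [ "$elapsed" -lt "$TIMEOUT_SECONDS" ]; do
    replicas=$(current_replicas 2>/dev/null) || replicas=0
    # An empty status (HPA not yet reconciled) is treated as zero replicas.
    if [ "${replicas:-0}" -gt 1 ]; then
      echo "HPA scaled to ${replicas} replicas"
      return 0
    fi
    sleep "$POLL_INTERVAL"
    elapsed=$((elapsed + POLL_INTERVAL))
  done
  echo "HPA did not scale above 1 replica within ${TIMEOUT_SECONDS}s" >&2
  return 1
}
```

With a 100m CPU request and a 50% utilization target, the autoscaler aims for an average of about 50m CPU per pod, so a sustained busy loop should push it past one replica once metrics are available. The generous timeout is meant to allow for metrics collection delay on a fresh cluster.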


3. statefulset-lifecycle

Goal: Validate StatefulSet scaling and data persistence.

  • Setup:
    • Create namespace statefulset-test
    • Apply headless Service db
    • Deploy StatefulSet db (5 replicas, 1Gi PVC) writing test data
  • Script:
    • Scale down to 2 replicas, confirm only db-0 & db-1 remain
  • Verify: Pod counts and persistent storage checks
  • Cleanup: Delete statefulset-test namespace
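
The pod-count part of the verify step could be sketched as below. It only checks that the surviving pods are exactly db-0 and db-1; the persistent storage check is left out because the source does not say how the test data is written. The app=db label selector and the function names are assumptions, not taken from the pull request.

```shell
#!/usr/bin/env bash
# Hypothetical verify.sh sketch for statefulset-lifecycle (pod names only).
# Assumption: the StatefulSet pods carry the label app=db.

NAMESPACE="statefulset-test"
STATEFULSET="db"
EXPECTED_PODS="db-0 db-1"

# Print the names of the matching pods, sorted and space-separated.
remaining_pods() {
  kubectl get pods -n "$NAMESPACE" -l "app=${STATEFULSET}" \
    -o jsonpath='{.items[*].metadata.name}' \
    | tr ' ' '\n' | sort | paste -sd ' ' -
}

# Succeeds only if the surviving pods are exactly the expected ordinals.
verify_scale_down() {
  local pods
  pods=$(remaining_pods) || return 1
  if [ "$pods" != "$EXPECTED_PODS" ]; then
    echo "unexpected pods: '${pods}' (wanted '${EXPECTED_PODS}')" >&2
    return 1
  fi
  echo "only ${EXPECTED_PODS} remain"
}
```

Comparing names rather than counts matters here: a StatefulSet removes pods from the highest ordinal down, so a correct scale-down to 2 replicas leaves db-0 and db-1 specifically, not just any two pods.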


Motivation

  • Covers critical Kubernetes workflows: rolling updates, autoscaling, stateful workloads
  • Enhances benchmark suite for real-world LLM-driven kubectl agents
  • Maintains consistency with existing task structure and conventions

Testing

  • Executed ./k8s-bench run --task-pattern rollout,hpa,statefulset against a Kind cluster
  • Confirmed a zero exit code on success and a non-zero exit code on failure conditions
  • Verified no regressions in existing tasks

No breaking changes introduced. All scripts are idempotent and contained within their own namespaces.
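
One common way to keep setup and cleanup scripts idempotent is shown below. This is a sketch of the general kubectl idiom, not the scripts from the pull request; the function names ensure_namespace and cleanup_namespace are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating idempotent namespace handling.

# Create the namespace if it is missing; succeed unchanged if it exists.
# The manifest is rendered client-side and then applied, so repeated
# runs do not fail with "already exists".
ensure_namespace() {
  local ns="$1"
  kubectl create namespace "$ns" --dry-run=client -o yaml \
    | kubectl apply -f -
}

# Delete the namespace and everything in it; succeed if it is already gone.
cleanup_namespace() {
  local ns="$1"
  kubectl delete namespace "$ns" --ignore-not-found --wait=true
}
```

For example, ensure_namespace rollout-test at the top of a setup.sh and cleanup_namespace rollout-test in the matching cleanup.sh would let either script be rerun safely after a partial failure.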

tuannvm • May 08 '25 06:05

K8s-bench Evaluation Results

Model Performance Summary

Model     Success  Fail
gpt-4.1   2        1
Total     2        1

Overall Summary

  • Total Runs: 3
  • Overall Success: 2 (66%)
  • Overall Fail: 1 (33%)

Model: gpt-4.1

Task                       Provider  Result
horizontal-pod-autoscaler  openai    ✅ success
rolling-update-deployment  openai    ✅ success
statefulset-lifecycle      openai

gpt-4.1 Summary

  • Total: 3
  • Success: 2 (66%)
  • Fail: 1 (33%)

tuannvm • May 08 '25 06:05

K8s-bench Evaluation Results

Model Performance Summary

Model     Success  Fail
gpt-4.1   3        0
o4-mini   0        1
Total     3        1

Overall Summary

  • Total Runs: 4
  • Overall Success: 3 (75%)
  • Overall Fail: 1 (25%)

Model: gpt-4.1

Task                       Provider  Result
horizontal-pod-autoscaler  openai    ✅ success
rolling-update-deployment  openai    ✅ success
statefulset-lifecycle      openai    ✅ success

gpt-4.1 Summary

  • Total: 3
  • Success: 3 (100%)
  • Fail: 0 (0%)

Model: o4-mini

Task                   Provider  Result
statefulset-lifecycle  openai

o4-mini Summary

  • Total: 1
  • Success: 0 (0%)
  • Fail: 1 (100%)

Report generated on May 8, 2025 at 9:18 PM

tuannvm • May 09 '25 04:05