ci: add workflow for k8s-bench evaluations on push/PR
PR Summary
This update introduces a new GitHub workflow (k8s-bench-evals.yaml) designed to facilitate self-service, on-demand execution of Kubernetes benchmark evaluations. It provides manual trigger capabilities, allowing PR owners or developers to test specific evaluation tasks before merging, thereby ensuring robustness and correctness of new or modified evaluations.
Affected Modules
- Workflow configuration (.github/workflows/k8s-bench-evals.yaml)
- Custom action for setting up Kind clusters (.github/actions/kind-cluster-setup)
- Evaluation scripts (run-evals.sh and analyze-evals.sh)
Key Details
- Workflow is triggered manually via workflow_dispatch.
- Accepts a task_pattern input, enabling targeted testing of specific evaluation tasks.
- Uses a custom setup action to create a Kind Kubernetes cluster.
- Runs run-evals.sh with preconfigured environment variables for consistent testing.
- Post-processing analyzes the results and appends the evaluation report to GitHub's step summary.
- Implements concurrency control keyed on workflow and branch to prevent overlapping runs (see the sketch after this list).
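Putting those details together, a minimal sketch of what k8s-bench-evals.yaml could look like follows. The job name, script paths, the TASK_PATTERN variable, the input default, and the eval-report.md filename are illustrative assumptions, not the merged file:

```yaml
name: k8s-bench-evals

on:
  workflow_dispatch:
    inputs:
      task_pattern:
        description: "Pattern selecting which evaluation tasks to run"
        required: false
        default: ".*"   # default is an assumption

# One active run per workflow/branch; a newer dispatch cancels an in-flight one.
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  evals:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Reusable local action that provisions the Kind cluster.
      - uses: ./.github/actions/kind-cluster-setup

      - name: Run evaluations
        env:
          TASK_PATTERN: ${{ inputs.task_pattern }}  # hypothetical variable name
        run: ./run-evals.sh

      - name: Analyze results and publish report
        run: |
          ./analyze-evals.sh
          # eval-report.md is a placeholder for whatever analyze-evals.sh emits
          cat eval-report.md >> "$GITHUB_STEP_SUMMARY"
```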
Potential Impacts
- Empowers developers to validate new evaluation tasks in isolation before merging.
- Enhances testing consistency via controlled environment and fixed parameters.
- Requires users to follow prescribed input patterns to avoid misconfigurations.
- Adds a manual step, so it is unsuitable for fully automated testing pipelines.
xref: https://github.com/GoogleCloudPlatform/kubectl-ai/pull/233
@tuannvm Added a periodic job for running the evals on a schedule or on demand.
PR: https://github.com/GoogleCloudPlatform/kubectl-ai/pull/234
At some point, we can enable runs on PR submits as well, but that's not critical at the moment.
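For the periodic job in PR #234, the trigger block would combine a schedule with manual dispatch; the cron expression below is an assumption, not the actual value:

```yaml
on:
  schedule:
    - cron: "0 6 * * *"   # hypothetical: daily at 06:00 UTC
  workflow_dispatch: {}    # keeps the on-demand path available
```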
@droot I pivoted my solution a bit. The idea is that I would need to rely on your existing Kind setup, as well as the Vertex AI-based LLM provider and API key. Then, I abstracted that setup into a separate action so that we can reuse it in different pipelines. Finally, I allow users who contribute a new eval to run the action at their convenience.
I like the refactoring. Makes it reusable.
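A sketch of how the extracted .github/actions/kind-cluster-setup composite action might be structured; the install steps and the pinned kind version are assumptions about how the setup was factored out:

```yaml
# action.yaml -- composite action wrapping the Kind cluster setup
name: kind-cluster-setup
description: Provision a Kind Kubernetes cluster for evaluation runs
runs:
  using: composite
  steps:
    - name: Install kind (version pinned here is an assumption)
      shell: bash
      run: |
        curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.23.0/kind-linux-amd64
        chmod +x ./kind
        sudo mv ./kind /usr/local/bin/kind
    - name: Create cluster
      shell: bash
      run: kind create cluster --wait 5m
```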
> Finally, I allow users who contribute a new eval to run the action at their convenience.
Curious about this. Are you able to execute this? (I suspect you don't have permission to execute it on the googlecloudplatform repo, and when running in your fork, the google-auth-action won't be able to run successfully because it is conditioned to run based on the repo.)
@droot You’re right—people outside of Google Cloud Platform wouldn’t be able to run it. What do you think: would it be useful if, next time someone has an evals-related PR, you and Google folks could run it through CI instead of having to run it locally for PR evaluation?
I was also considering having it auto-trigger on every commit, but that seems excessive and would waste CI resources. So, I think it’s better to keep it as a manual dispatch for now.
> @droot You’re right—people outside of Google Cloud Platform wouldn’t be able to run it. What do you think: would it be useful if, next time someone has an evals-related PR, you and Google folks could run it through CI instead of having to run it locally for PR evaluation?
Yes, that is a reasonable workflow. Thank you.
> I was also considering having it auto-trigger on every commit, but that seems excessive and would waste CI resources. So, I think it’s better to keep it as a manual dispatch for now.
Totally agree.