ci: add workflow for k8s-bench evaluations on push/PR
PR Summary
This update introduces a new GitHub workflow (k8s-bench-evals.yaml) designed to facilitate self-service, on-demand execution of Kubernetes benchmark evaluations. It provides manual trigger capabilities, allowing PR owners or developers to test specific evaluation tasks before merging, thereby ensuring robustness and correctness of new or modified evaluations.
Affected Modules
- Workflow configuration (.github/workflows/k8s-bench-evals.yaml)
- Custom action for setting up Kind clusters (.github/actions/kind-cluster-setup)
- Evaluation scripts (run-evals.sh and analyze-evals.sh)
Key Details
- Workflow is triggered manually via workflow_dispatch.
- Accepts a task_pattern input, enabling targeted testing of specific evaluation tasks.
- Uses a custom setup action to create a Kind Kubernetes cluster.
- Runs run-evals.sh with preconfigured environment variables for consistent testing.
- Post-processing analyzes the results and appends the evaluation report to GitHub's step summary.
- Implements concurrency control keyed on workflow and branch to prevent overlapping runs (see the sketch after this list).
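Putting those details together, a minimal sketch of what k8s-bench-evals.yaml could look like follows. The job name, script paths, the TASK_PATTERN variable, the input default, and the eval-report.md filename are illustrative assumptions, not the merged file:

```yaml
name: k8s-bench-evals

on:
  workflow_dispatch:
    inputs:
      task_pattern:
        description: "Pattern selecting which evaluation tasks to run"
        required: false
        default: ".*"   # default is an assumption

# One active run per workflow/branch; a newer dispatch cancels an in-flight one.
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  evals:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Reusable local action that provisions the Kind cluster.
      - uses: ./.github/actions/kind-cluster-setup

      - name: Run evaluations
        env:
          TASK_PATTERN: ${{ inputs.task_pattern }}  # hypothetical variable name
        run: ./run-evals.sh

      - name: Analyze results and publish report
        run: |
          ./analyze-evals.sh
          # eval-report.md is a placeholder for whatever analyze-evals.sh emits
          cat eval-report.md >> "$GITHUB_STEP_SUMMARY"
```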
Potential Impacts
- Empowers developers to validate new evaluation tasks in isolation before merging.
- Enhances testing consistency via controlled environment and fixed parameters.
- Requires users to follow prescribed input patterns to avoid misconfigurations.
- Adds a manual step, so it is unsuitable for fully automated testing pipelines.
xref: https://github.com/GoogleCloudPlatform/kubectl-ai/pull/233
@tuannvm Added a periodic job for running the evals on a schedule or on demand.
PR: https://github.com/GoogleCloudPlatform/kubectl-ai/pull/234
At some point, we can enable runs on PR submits as well, but that's not critical at the moment.
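For the periodic job in PR #234, the trigger block would combine a schedule with manual dispatch; the cron expression below is an assumption, not the actual value:

```yaml
on:
  schedule:
    - cron: "0 6 * * *"   # hypothetical: daily at 06:00 UTC
  workflow_dispatch: {}    # keeps the on-demand path available
```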
@droot I pivoted my solution a bit. The idea is that I would need to rely on your existing Kind setup, as well as the Vertex AI-based LLM provider and API key. Then, I abstracted that setup into a separate action so that we can reuse it in different pipelines. Finally, I allow users who contribute a new eval to run the action at their convenience.
I like the refactoring. Makes it reusable.
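A sketch of how the extracted .github/actions/kind-cluster-setup composite action might be structured; the install steps and the pinned kind version are assumptions about how the setup was factored out:

```yaml
# action.yaml -- composite action wrapping the Kind cluster setup
name: kind-cluster-setup
description: Provision a Kind Kubernetes cluster for evaluation runs
runs:
  using: composite
  steps:
    - name: Install kind (version pinned here is an assumption)
      shell: bash
      run: |
        curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.23.0/kind-linux-amd64
        chmod +x ./kind
        sudo mv ./kind /usr/local/bin/kind
    - name: Create cluster
      shell: bash
      run: kind create cluster --wait 5m
```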
> Finally, I allow users who contribute a new eval to run the action at their convenience.
Curious about this. Are you able to execute this? (I suspect you don't have permission to execute it on the googlecloudplatform repo, and when running in your fork, the google-auth-action won't be able to run successfully because it is conditioned to run based on the repo.)
@droot You’re right—people outside of Google Cloud Platform wouldn’t be able to run it. What do you think: would it be useful if, next time someone has an evals-related PR, you and Google folks could run it through CI instead of having to run it locally for PR evaluation?
I was also considering having it auto-trigger on every commit, but that seems excessive and would waste CI resources. So, I think it’s better to keep it as a manual dispatch for now.
> @droot You’re right—people outside of Google Cloud Platform wouldn’t be able to run it. What do you think: would it be useful if, next time someone has an evals-related PR, you and Google folks could run it through CI instead of having to run it locally for PR evaluation?
Yes, that is a reasonable workflow. Thank you.
> I was also considering having it auto-trigger on every commit, but that seems excessive and would waste CI resources. So, I think it’s better to keep it as a manual dispatch for now.
Totally agree.