
ci: add workflow for k8s-bench evaluations on push/PR

tuannvm opened this issue 7 months ago

PR Summary

This update introduces a new GitHub workflow (k8s-bench-evals.yaml) that enables self-service, on-demand execution of Kubernetes benchmark (k8s-bench) evaluations. It can be triggered manually, allowing PR owners or developers to exercise specific evaluation tasks before merging and to verify that new or modified evaluations are robust and correct.

Affected Modules

  • Workflow configuration (.github/workflows/k8s-bench-evals.yaml)
  • Custom actions for setting up Kind clusters (.github/actions/kind-cluster-setup)
  • Evaluation scripts (run-evals.sh and analyze-evals.sh)

Key Details

  • Workflow is triggered manually via workflow_dispatch.
  • Allows user input for task_pattern, enabling targeted testing of specific evaluation tasks.
  • Uses a custom setup action for creating a Kind Kubernetes cluster.
  • Runs run-evals.sh with preconfigured environment variables for consistent testing.
  • Post-processing includes result analysis and appending the evaluation report to GitHub’s step summary.
  • Implements concurrency control based on the workflow or branch to prevent overlapping runs; a sketch of such a workflow follows this list.
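
For illustration only, a manually dispatched workflow along these lines would cover the points above. The actual workflow file in the PR is authoritative; the job name, input default, script paths, and environment variable below are assumptions, not the real contents.

```yaml
# Hypothetical sketch of .github/workflows/k8s-bench-evals.yaml (see the linked PR for the real file).
name: k8s-bench-evals

on:
  workflow_dispatch:
    inputs:
      task_pattern:
        description: "Pattern selecting which evaluation tasks to run"
        required: false
        default: ".*"   # assumed default: run all tasks

# Prevent overlapping runs for the same workflow/branch.
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  evals:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Custom action that brings up a Kind cluster for the evaluation run.
      - uses: ./.github/actions/kind-cluster-setup

      # Run the selected evaluation tasks with fixed environment settings.
      - name: Run evaluations
        run: ./run-evals.sh                              # assumed script location
        env:
          TASK_PATTERN: ${{ inputs.task_pattern }}       # assumed variable name

      # Summarize results and append the report to the GitHub step summary.
      - name: Analyze results
        run: ./analyze-evals.sh >> "$GITHUB_STEP_SUMMARY"
```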

Potential Impacts

  • Empowers developers to validate new evaluation tasks in isolation before merging.
  • Enhances testing consistency via controlled environment and fixed parameters.
  • Requires users to follow prescribed input patterns to avoid misconfigurations.
  • Adds a manual step, so it is unsuitable on its own for fully automated testing pipelines.

tuannvm avatar May 14 '25 07:05 tuannvm

xref: https://github.com/GoogleCloudPlatform/kubectl-ai/pull/233

droot avatar May 14 '25 23:05 droot

@tuannvm Added a job for running the evals periodically or on-demand.

PR: https://github.com/GoogleCloudPlatform/kubectl-ai/pull/234

At some point, we can enable it on PR submits as well, but that's not critical at the moment.

droot avatar May 15 '25 18:05 droot
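
For reference, "periodically or on-demand" typically maps to combining a cron schedule with a manual trigger in the workflow's on: block. The cron expression here is a placeholder, not what the PR actually configures.

```yaml
# Illustrative triggers only; the real schedule is defined in the linked PR.
on:
  schedule:
    - cron: "0 6 * * *"    # placeholder: once a day at 06:00 UTC
  workflow_dispatch: {}    # allows on-demand runs from the Actions tab
```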

@droot I pivoted my solution a bit. The idea is that I would need to rely on your existing Kind setup, as well as the Vertex AI-based LLM provider and API key. Then, I abstracted that setup into a separate action so that we can reuse it in different pipelines. Finally, I allow users who contribute a new eval to run the action at their convenience.

tuannvm avatar May 16 '25 01:05 tuannvm
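
A minimal sketch of what the abstracted setup could look like as a composite action; the input name and steps are assumptions for illustration, not the actual contents of .github/actions/kind-cluster-setup.

```yaml
# Hypothetical .github/actions/kind-cluster-setup/action.yml
name: kind-cluster-setup
description: Create a Kind cluster for k8s-bench evaluation runs
inputs:
  cluster_name:
    description: Name of the Kind cluster to create
    required: false
    default: k8s-bench
runs:
  using: composite
  steps:
    - name: Create Kind cluster
      shell: bash
      run: kind create cluster --name "${{ inputs.cluster_name }}"

    - name: Wait for the cluster to be ready
      shell: bash
      run: kubectl wait --for=condition=Ready nodes --all --timeout=120s
```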

I like the refactoring. Makes it reusable.

> Finally, I allow users who contribute a new eval to run the action at their convenience.

Curious about this. Are you able to execute this? (I suspect you don't have permission to execute it on the googlecloudplatform repo, and when running it in your fork, the google-auth-action won't be able to run successfully because it is conditioned to run based on the repo.)

droot avatar May 16 '25 14:05 droot
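
For context, the repo-based conditioning mentioned above is usually an if: expression on the github.repository context, so the auth step (presumably google-github-actions/auth) is skipped in forks. The secret names below are assumptions.

```yaml
# The step is skipped when the workflow runs outside the upstream repository (e.g. in a fork).
- name: Authenticate to Google Cloud
  if: github.repository == 'GoogleCloudPlatform/kubectl-ai'
  uses: google-github-actions/auth@v2
  with:
    workload_identity_provider: ${{ secrets.WIF_PROVIDER }}    # assumed secret name
    service_account: ${{ secrets.WIF_SERVICE_ACCOUNT }}        # assumed secret name
```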

@droot You’re right—people outside of Google Cloud Platform wouldn’t be able to run it. What do you think: would it be useful if, next time someone has an evals-related PR, you and Google folks could run it through CI instead of having to run it locally for PR evaluation?

tuannvm avatar May 17 '25 05:05 tuannvm

I was also considering having it auto-trigger on every commit, but that seems excessive and would waste CI resources. So, I think it’s better to keep it as a manual dispatch for now.

tuannvm avatar May 17 '25 05:05 tuannvm

> @droot You’re right—people outside of Google Cloud Platform wouldn’t be able to run it. What do you think: would it be useful if, next time someone has an evals-related PR, you and Google folks could run it through CI instead of having to run it locally for PR evaluation?

Yes, that is a reasonable workflow. Thank you.

droot avatar May 18 '25 15:05 droot

> I was also considering having it auto-trigger on every commit, but that seems excessive and would waste CI resources. So, I think it’s better to keep it as a manual dispatch for now.

Totally agree.

droot avatar May 18 '25 15:05 droot