training-operator icon indicating copy to clipboard operation
training-operator copied to clipboard

feat: Add CI pipeline to validate manifests and helm chart

Open juyterman1000 opened this issue 2 months ago • 3 comments

Added a new GitHub Actions workflow to automatically validate Kubernetes manifests and Helm charts using kube-linter. This ensures that all changes to the deployment configurations adhere to best practices for production readiness and security.

  • Scans manifests/base directory.
  • Scans charts directory.
  • Excludes manifests/overlays to avoid false positives on unpatched.

juyterman1000 avatar Jan 18 '26 05:01 juyterman1000

🎉 Welcome to the Kubeflow Trainer! 🎉

Thanks for opening your first PR! We're happy to have you as part of our community 🚀

Here's what happens next:

  • If you haven't already, please check out our Contributing Guide for repo-specific guidelines and the Kubeflow Contributor Guide for general community standards.
  • Our team will review your PR soon! cc @kubeflow/kubeflow-trainer-team

Join the community:

Feel free to ask questions in the comments if you need any help or clarification! Thanks again for contributing to Kubeflow! 🙏

github-actions[bot] avatar Jan 18 '26 05:01 github-actions[bot]

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Once this PR has been reviewed and has the lgtm label, please assign electronic-waste for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

google-oss-prow[bot] avatar Jan 18 '26 05:01 google-oss-prow[bot]

Hi @jinchihe/ @kuizhiqing, Please review and approve. Thanks!

juyterman1000 avatar Jan 18 '26 05:01 juyterman1000

Pull Request Test Coverage Report for Build 21117199641

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 51.264%

Totals Coverage Status
Change from base Build 21036734925: 0.0%
Covered Lines: 1237
Relevant Lines: 2413

💛 - Coveralls

coveralls avatar Jan 19 '26 17:01 coveralls

Hi @juyterman1000, thanks for creating this PR.

I think there's a few extra details that need working out to make sure this extra check is going to be robust - I don't suppose you have bandwidth to look into them?

  1. We need a way of running the linter locally so we can debug any failing checks. We could maybe update the Makefile with an extra target (or update an existing target) to run the linter, and a target to download the linter to the local bin directory. We can follow the existing pattern we use for golangci-lint. Alternatively, we migth be able to add the check to the pre-commit hooks. Afaik kube-linter doesn't publish a hook, but we could make a local hook that runs a script to download the tool if needed and run it. Could you look into one of those app

  2. Can you check which files/kustomizations we should pass to the linter? At the minute you're linting the base/manager manifests but does it make sense to lint any of the overlays too/instead? Does this help remove any of the linting errors (e.g. missing service account)? For the helm chart, does it run using the default values? Does that check enough of the manifests or do we need to specify some custom values to enable bits that aren't enabled by default? Do the kube-linter docs have any guidance for us?

  3. Are the checks you're excluding the ones that are currently failing? As I mentioned in #3096 some of these probably ought to be fixed rather than ignored (e.g. no-read-only-root-fs), and the liveness and readiness port checks should be fixed by #3061. Could you check if any of the failing checks can be fixed and if it makes sense to fix them?

robert-bell avatar Jan 21 '26 09:01 robert-bell

Hi @robert-bell. I’ve opened PR #3119 to fix the no-read-only-root-fs kube-linter finding by enabling a read-only root filesystem for the trainer manager container. Together with PR #3100 (named container ports for liveness/readiness probes), this reduces the kube-linter findings from 8 to 4 when checked locally. The remaining findings relate to image tagging policy, cross-file service account resolution, and resource requests/limits, which I’ve left for discussion rather than guessing values.

Happy to adjust based on your preference.

Goku2099 avatar Jan 22 '26 09:01 Goku2099

Hi @robert-bell, I've addressed your first point by adding Makefile targets for local kube-linter:

  • make kube-linter - downloads the binary to bin/
    • make lint-manifests - runs linter on manifests/base and charts/kubeflow-trainer For points 2 and 3, @Goku2099 has opened PRs #3100 (named ports) and #3119 (readOnlyRootFilesystem) which reduce the kube-linter findings from 8 to 5. The remaining exclusions are for cpu/memory requirements, anti-affinity, service account resolution (kustomize false positive), and image tagging - which can be discussed separately.

juyterman1000 avatar Jan 23 '26 04:01 juyterman1000

Hi @andreyvelich This has been open for quite some time. When you get a chance, could you please take a look?

juyterman1000 avatar Jan 25 '26 00:01 juyterman1000

/ok-to-test

andreyvelich avatar Feb 05 '26 20:02 andreyvelich

@juyterman1000 Please also rebase your PR, it should fix the pre-commits.

andreyvelich avatar Feb 08 '26 22:02 andreyvelich

@Goku2099 @juyterman1000 Please can some of you finalize this PR, so we can move forward?

andreyvelich avatar Feb 11 '26 14:02 andreyvelich

Sorry @andreyvelich @Goku2099 got occupied due to personal stuff,will push this today.

juyterman1000 avatar Feb 11 '26 15:02 juyterman1000

No worries I started working on the related changes assuming you were offline. Now that you're back, please rebase your PR and we can move forward from there.

Goku2099 avatar Feb 11 '26 15:02 Goku2099

Thank you, folks!

andreyvelich avatar Feb 11 '26 15:02 andreyvelich

@andreyvelich Once this PR is finalized, we can gradually align with the proposed “Code Quality Checks” structure in follow-up changes. Also, if there’s anything I can help with here or in other areas feel free to let me know.

Goku2099 avatar Feb 11 '26 15:02 Goku2099

Hi @juyterman1000 just checking in so we can keep this moving. Let me know if you'd like any help with the rebase.

Goku2099 avatar Feb 13 '26 15:02 Goku2099

Hi @Goku2099 ,seeing lot of issues on code quality check after rebase,fixing those.

juyterman1000 avatar Feb 13 '26 15:02 juyterman1000