feat: Add CI pipeline to validate manifests and helm chart
Added a new GitHub Actions workflow to automatically validate Kubernetes manifests and Helm charts using kube-linter. This ensures that all changes to the deployment configurations adhere to best practices for production readiness and security.
- Scans manifests/base directory.
- Scans charts directory.
- Excludes manifests/overlays to avoid false positives from unpatched overlay resources.
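For reference, a minimal sketch of what such a workflow could look like (the file name, trigger paths, and install step are assumptions for illustration, not the exact workflow added in this PR):

```yaml
# .github/workflows/lint-manifests.yaml (hypothetical name)
name: Lint Kubernetes manifests
on:
  pull_request:
    paths:
      - "manifests/**"
      - "charts/**"
jobs:
  kube-linter:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install kube-linter
        # A real workflow should pin a version; @latest is used here for brevity.
        run: go install golang.stackrox.io/kube-linter/cmd/kube-linter@latest
      - name: Lint manifests and Helm chart
        # kube-linter accepts multiple paths; overlays are deliberately not listed.
        run: kube-linter lint manifests/base charts
```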
🎉 Welcome to the Kubeflow Trainer! 🎉
Thanks for opening your first PR! We're happy to have you as part of our community 🚀
Here's what happens next:
- If you haven't already, please check out our Contributing Guide for repo-specific guidelines and the Kubeflow Contributor Guide for general community standards.
- Our team will review your PR soon! cc @kubeflow/kubeflow-trainer-team
Join the community:
- Slack: Join our #kubeflow-trainer Slack channel.
- Meetings: Attend the Kubeflow AutoML and Training Working Group bi-weekly meetings.
Feel free to ask questions in the comments if you need any help or clarification! Thanks again for contributing to Kubeflow! 🙏
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has not yet been approved by any approvers. Once this PR has been reviewed and has the lgtm label, please assign electronic-waste for approval. For more information see the Kubernetes Code Review Process.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
Hi @jinchihe @kuizhiqing, please review and approve. Thanks!
Pull Request Test Coverage Report for Build 21117199641
Warning: This coverage report may be inaccurate.
This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
- For more information on this, see Tracking coverage changes with pull request builds.
- To avoid this issue with future PRs, see these Recommended CI Configurations.
- For a quick fix, rebase this PR at GitHub. Your next report should be accurate.
Details
- 0 of 0 changed or added relevant lines in 0 files are covered.
- No unchanged relevant lines lost coverage.
- Overall coverage remained the same at 51.264%
| Totals | |
|---|---|
| Change from base Build 21036734925: | 0.0% |
| Covered Lines: | 1237 |
| Relevant Lines: | 2413 |
💛 - Coveralls
Hi @juyterman1000, thanks for creating this PR.
I think there are a few extra details that need working out to make sure this extra check is going to be robust - I don't suppose you have bandwidth to look into them?
- We need a way of running the linter locally so we can debug any failing checks. We could update the Makefile with an extra target (or update an existing target) to run the linter, and a target to download the linter to the local bin directory, following the existing pattern we use for golangci-lint. Alternatively, we might be able to add the check to the pre-commit hooks. AFAIK kube-linter doesn't publish a hook, but we could make a local hook that runs a script to download the tool if needed and run it. Could you look into one of those approaches?
- Can you check which files/kustomizations we should pass to the linter? At the minute you're linting the base/manager manifests, but does it make sense to lint any of the overlays too/instead? Does this help remove any of the linting errors (e.g. missing service account)? For the helm chart, does it run using the default values? Does that check enough of the manifests, or do we need to specify some custom values to enable bits that aren't enabled by default? Do the kube-linter docs have any guidance for us?
- Are the checks you're excluding the ones that are currently failing? As I mentioned in #3096, some of these probably ought to be fixed rather than ignored (e.g. no-read-only-root-fs), and the liveness and readiness port checks should be fixed by #3061. Could you check if any of the failing checks can be fixed and whether it makes sense to fix them?
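On the exclusion question: kube-linter reads check configuration from a YAML file passed via `--config`. A sketch of what a pared-down exclusion list could look like once the fixable checks are addressed (the file name and the exact check names kept are assumptions based on the findings discussed in this thread):

```yaml
# .kube-linter.yaml (hypothetical) -- passed to kube-linter via --config
checks:
  exclude:
    # Findings left for separate discussion rather than fixed here:
    - unset-cpu-requirements
    - unset-memory-requirements
    - no-anti-affinity
    - latest-tag
    # Kustomize resolves the service account across files, so this
    # reads as a false positive when linting a single directory:
    - non-existent-service-account
```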
Hi @robert-bell. I’ve opened PR #3119 to fix the no-read-only-root-fs kube-linter finding by enabling a read-only root filesystem for the trainer manager container. Together with PR #3100 (named container ports for liveness/readiness probes), this reduces the kube-linter findings from 8 to 4 when checked locally. The remaining findings relate to image tagging policy, cross-file service account resolution, and resource requests/limits, which I’ve left for discussion rather than guessing values.
Happy to adjust based on your preference.
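For concreteness, enabling a read-only root filesystem is a small change on the container's security context. A minimal sketch (the container name and the extra hardening field are assumptions for illustration; the actual change lives in PR #3119):

```yaml
# Hypothetical excerpt from the trainer manager Deployment pod spec
spec:
  containers:
    - name: manager
      securityContext:
        readOnlyRootFilesystem: true
        allowPrivilegeEscalation: false
      # If the process writes temp files, mount an emptyDir
      # volume at the path it writes to.
```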
Hi @robert-bell, I've addressed your first point by adding Makefile targets for local kube-linter:

- `make kube-linter` - downloads the binary to `bin/`
- `make lint-manifests` - runs the linter on `manifests/base` and `charts/kubeflow-trainer`

For points 2 and 3, @Goku2099 has opened PRs #3100 (named ports) and #3119 (readOnlyRootFilesystem), which reduce the kube-linter findings from 8 to 5. The remaining exclusions are for cpu/memory requirements, anti-affinity, service account resolution (a kustomize false positive), and image tagging, which can be discussed separately.
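For anyone wanting to reproduce this locally, a sketch of what such Makefile targets could look like, following the golangci-lint download pattern mentioned earlier (the variable names and version pin are assumptions, not the exact targets in this PR):

```makefile
LOCALBIN ?= $(shell pwd)/bin
$(LOCALBIN):
	mkdir -p $(LOCALBIN)

KUBE_LINTER ?= $(LOCALBIN)/kube-linter
KUBE_LINTER_VERSION ?= v0.6.8  # assumed version pin

.PHONY: kube-linter
kube-linter: $(LOCALBIN) ## Download kube-linter into ./bin if not already present.
	test -s $(KUBE_LINTER) || GOBIN=$(LOCALBIN) go install golang.stackrox.io/kube-linter/cmd/kube-linter@$(KUBE_LINTER_VERSION)

.PHONY: lint-manifests
lint-manifests: kube-linter ## Lint the base manifests and the Helm chart.
	$(KUBE_LINTER) lint manifests/base charts/kubeflow-trainer
```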
Hi @andreyvelich, this has been open for quite some time. When you get a chance, could you please take a look?
/ok-to-test
@juyterman1000 Please also rebase your PR, it should fix the pre-commits.
@Goku2099 @juyterman1000 Please can some of you finalize this PR, so we can move forward?
Sorry @andreyvelich @Goku2099, I got occupied due to personal stuff; will push this today.
No worries, I started working on the related changes assuming you were offline. Now that you're back, please rebase your PR and we can move forward from there.
Thank you, folks!
@andreyvelich Once this PR is finalized, we can gradually align with the proposed “Code Quality Checks” structure in follow-up changes. Also, if there's anything I can help with here or in other areas, feel free to let me know.
Hi @juyterman1000 just checking in so we can keep this moving. Let me know if you'd like any help with the rebase.
Hi @Goku2099, I'm seeing a lot of issues on the code quality check after the rebase; fixing those now.