Add CI pipeline to validate manifests and helm chart with kube-linter

Open robert-bell opened this issue 1 month ago • 2 comments

What you would like to be added?

kube-linter is a static analysis tool for linting Kustomize manifests and Helm charts, focusing on production readiness and security. It's essentially the equivalent of Python's Ruff linter, but for Kubernetes manifests.

It'd be a good addition to the CI pipeline PR checks.
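As a rough illustration of what such a PR check could look like, here is a minimal GitHub Actions sketch. The workflow file name, the trigger paths, and the `stackrox/kube-linter-action@v1` reference and its `directory` input are assumptions to verify against the action's documentation before use; only the `manifests/overlays/manager` path comes from this issue.

```yaml
# Hypothetical file: .github/workflows/kube-linter.yaml
# Sketch only: the action reference and version pin are assumptions, not the project's actual CI.
name: kube-linter

on:
  pull_request:
    paths:
      - "manifests/**"   # assumed trigger: only run when manifests change

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - name: Check out the repository
        uses: actions/checkout@v4

      - name: Lint manifests with kube-linter
        uses: stackrox/kube-linter-action@v1
        with:
          # Directory to lint; same path used in the manual run reported in this issue.
          directory: manifests/overlays/manager
```

Because kube-linter exits with an error when it finds lint errors, a job like this would fail the PR check until the findings are fixed or explicitly excluded.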

Why is this needed?

kube-linter helps ensure and enforce best practices and security in our Kubernetes manifests.

It's lightweight, configurable, can be run both locally and in CI, and is used by a number of other OSS projects.

It's a good addition for maintaining the health and security of the project without adding a big burden for maintainers or contributors.

Love this feature?

Give it a 👍. We prioritize the features with the most 👍.

robert-bell avatar Jan 16 '26 15:01 robert-bell

🎉 Welcome to the Kubeflow Trainer! 🎉

Thanks for opening your first issue! We're happy to have you as part of our community 🚀

Here's what happens next:

  • Our team will review your issue soon! cc @kubeflow/kubeflow-trainer-team
  • If you'd like to contribute to this issue, check out our Contributing Guide for repo-specific guidelines and the Kubeflow Contributor Guide for general community standards.

Feel free to ask questions in the comments if you need any help or clarification! Thanks again for contributing to Kubeflow! 🙏

github-actions[bot] avatar Jan 16 '26 15:01 github-actions[bot]

As a quick sanity check, I ran kube-linter 0.8.1 on the current main branch (f921483180490c0159f7c063a0fdeb6d89309441) and it raised 8 lint errors:

$ kube-linter lint manifests/overlays/manager
KubeLinter 0.8.1

./kubeflow-trainer/manifests/base/manager/manager.yaml: (object: kubeflow-system/kubeflow-trainer-controller-manager apps/v1, Kind=Deployment) The container "manager" is using an invalid container image, "ghcr.io/kubeflow/trainer/trainer-controller-manager:latest". Please use images that are not blocked by the `BlockList` criteria : [".*:(latest)$" "^[^:]*$" "(.*/[^:]+)$"] (check: latest-tag, remediation: Use a container image with a specific tag other than latest.)                               

./kubeflow-trainer/manifests/base/manager/manager.yaml: (object: kubeflow-system/kubeflow-trainer-controller-manager apps/v1, Kind=Deployment) container "manager" does not expose port 8081 for the HTTPGet (check: liveness-port, remediation: Check which ports you've exposed and ensure they match what you have specified in the liveness probe.)                                                                                                                                                              

./kubeflow-trainer/manifests/base/manager/manager.yaml: (object: kubeflow-system/kubeflow-trainer-controller-manager apps/v1, Kind=Deployment) container "manager" does not have a read-only root file system (check: no-read-only-root-fs, remediation: Set readOnlyRootFilesystem to true in the container securityContext.)

./kubeflow-trainer/manifests/base/manager/manager.yaml: (object: kubeflow-system/kubeflow-trainer-controller-manager apps/v1, Kind=Deployment) container "manager" does not expose port 8081 for the HTTPGet (check: readiness-port, remediation: Check which ports you've exposed and ensure they match what you have specified in the readiness probe.)                                                                                                                                                            

./kubeflow-trainer/manifests/base/manager/manager.yaml: (object: kubeflow-system/kubeflow-trainer-controller-manager apps/v1, Kind=Deployment) container "manager" has cpu request 0 (check: unset-cpu-requirements, remediation: Set CPU requests for your container based on its requirements. Refer to https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#requests-and-limits for details.)                                                                                           

./kubeflow-trainer/manifests/base/manager/manager.yaml: (object: kubeflow-system/kubeflow-trainer-controller-manager apps/v1, Kind=Deployment) container "manager" has memory limit 0 (check: unset-memory-requirements, remediation: Set memory limits for your container based on its requirements. Refer to https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#requests-and-limits for details.)                                                                                      

./kubeflow-trainer/manifests/overlays/manager/releases/download/v0.10.1/manifests.yaml: (object: kubeflow-system/jobset-controller-manager apps/v1, Kind=Deployment) container "manager" does not expose port 8081 for the HTTPGet (check: liveness-port, remediation: Check which ports you've exposed and ensure they match what you have specified in the liveness probe.)                                                                                                                                        

./kubeflow-trainer/manifests/overlays/manager/releases/download/v0.10.1/manifests.yaml: (object: kubeflow-system/jobset-controller-manager apps/v1, Kind=Deployment) container "manager" does not expose port 8081 for the HTTPGet (check: readiness-port, remediation: Check which ports you've exposed and ensure they match what you have specified in the readiness probe.)                                                                                                                                      

Error: found 8 lint errors

The no-read-only-root-fs, unset-cpu-requirements and unset-memory-requirements checks are possibly useful, unless we're intentionally not setting these?

I think the latest-tag check is a false positive, and the port checks could either be ignored or be fixed as part of #3061.
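If we do decide to skip some of these checks, a kube-linter config file is one way to do it. The following is a minimal sketch, assuming a hypothetical `.kube-linter.yaml` at the repository root; which checks to exclude is still an open question here, so the list simply mirrors the ones discussed above.

```yaml
# Hypothetical .kube-linter.yaml (file name and location are assumptions).
checks:
  exclude:
    - "latest-tag"      # looks like a false positive for the manager image
    - "liveness-port"   # could be dropped from this list once the probe port is exposed
    - "readiness-port"  # same as above
```

It could then be passed on the command line, for example `kube-linter lint --config .kube-linter.yaml manifests/overlays/manager`. The remaining checks (no-read-only-root-fs, unset-cpu-requirements, unset-memory-requirements) would stay enabled under this sketch.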

robert-bell avatar Jan 16 '26 15:01 robert-bell

That makes sense.

/remove-label lifecycle/needs-triage
/area engprod

andreyvelich avatar Jan 22 '26 00:01 andreyvelich

/reopen

andreyvelich avatar Jan 22 '26 13:01 andreyvelich

@andreyvelich: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

google-oss-prow[bot] avatar Jan 22 '26 13:01 google-oss-prow[bot]