feat(k8s): add a flag to scan actual state of resources
Description
By default, Trivy retrieves Kubernetes resource information from annotations if they exist.
https://github.com/aquasecurity/trivy-kubernetes/blob/ccf11d83e72ae31b91cb6d250e7128a601357650/pkg/trivyk8s/trivyk8s.go#L281-L287
This approach is used because some Kubernetes engines automatically convert API versions from applied YAML configuration files that contain deprecated APIs. It was introduced here #4786 (https://github.com/aquasecurity/trivy-kubernetes/pull/189) based on #4784 More details about reason: https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api_changes.md#operational-overview
This PR introduces --use-actual-config flag, which allows scanning the actual state of Kubernetes resources in the cluster instead of relying on values from annotation "kubectl.kubernetes.io/last-applied-configuration".
In cases where it is necessary to scan the real-time state of resources in the cluster, --use-actual-config flag provides an option to bypass the annotations and fetch the current resource configuration directly.
Reproduction steps
pod.yaml
apiVersion: v1
kind: Pod
metadata:
name: nginx-pod
namespace: default
spec:
containers:
- name: nginx
image: nginx:1.14.1
kind delete cluster && kind create cluster
kubectl wait --for=condition=Ready --timeout 300s nodes --all
kubectl apply -f pod.yaml
kubectl set image pod/nginx-pod nginx=nginx:1.27.4
kubectl get pod/nginx-pod -o yaml
Before:
$ trivy k8s --include-namespaces default --scanners vuln --report summary
Summary Report for kind-kind
Workload Assessment
┌───────────┬───────────────┬────────────────────────┐
│ Namespace │ Resource │ Vulnerabilities │
│ │ ├────┬─────┬────┬────┬───┤
│ │ │ C │ H │ M │ L │ U │
├───────────┼───────────────┼────┼─────┼────┼────┼───┤
│ default │ Pod/nginx-pod │ 40 │ 104 │ 64 │ 62 │ 9 │
└───────────┴───────────────┴────┴─────┴────┴────┴───┘
After:
$ trivy k8s --include-namespaces default --scanners vuln --report summary --use-actual-config
Summary Report for kind-kind
Workload Assessment
┌───────────┬───────────────┬──────────────────────┐
│ Namespace │ Resource │ Vulnerabilities │
│ │ ├───┬───┬────┬─────┬───┤
│ │ │ C │ H │ M │ L │ U │
├───────────┼───────────────┼───┼───┼────┼─────┼───┤
│ default │ Pod/nginx-pod │ 2 │ 8 │ 43 │ 100 │ │
└───────────┴───────────────┴───┴───┴────┴─────┴───┘
Related issues
- Close #7573
Related PRs
- https://github.com/aquasecurity/trivy-kubernetes/pull/446
Checklist
- [ ] I've read the guidelines for contributing to this repository.
- [ ] I've followed the conventions in the PR title.
- [ ] I've added tests that prove my fix is effective or that my feature works.
- [ ] I've updated the documentation with the relevant information (if needed).
- [ ] I've added usage information (if the PR introduces new options)
- [ ] I've included a "before" and "after" example to the description (if the PR is a user interface change).
as far as I understand using last-applied instead of the live resource was added solely to support deprecated API check. In this case, I think that would be the exception and not the rule. In other words I am suggesting that by default, misconfig scanning will always use the live resource, and only for outdated API we can look at last-applied. this can be activated by the reverse flag, but another idea is to make deprecated API a seperate scannner which might make sense regardless. WDYT?
as far as I understand using last-applied instead of the live resource was added solely to support deprecated API check. In this case, I think that would be the exception and not the rule. In other words I am suggesting that by default, misconfig scanning will always use the live resource, and only for outdated API we can look at last-applied. this can be activated by the reverse flag, but another idea is to make deprecated API a seperate scannner which might make sense regardless. WDYT?
You're right - it was made to support deprecated API. I'll try to create a list of deprecated k8s APIs.
Honestly, it was my first thought. However, I had a concern about the inconsistency between config and k8s scans.
if it doesn't matter, let's go another way )
I'll try to create a list of deprecated k8s APIs
what did you mean by that?
I'll try to create a list of deprecated k8s APIs
what did you mean by that?
If I remember correctly, there's no straightforward way to automatically determine whether an API is deprecated. That's why I was thinking of creating a list of deprecated APIs (similar to how Ubuntu handles EOL dates). I looked into a few projects that try to detect deprecated APIs (ex. Pluto), and they also seem to rely on predefined lists. Am I missing something here?
we have it here: https://github.com/aquasecurity/trivy-db-data/blob/main/k8s/api/k8s-outdated-api.json which is used here: https://github.com/aquasecurity/trivy-checks/blob/83ac3dddea29d16ccd1e84c8c018cffe696f41db/Makefile#L56
we have it here: https://github.com/aquasecurity/trivy-db-data/blob/main/k8s/api/k8s-outdated-api.json which is used here: https://github.com/aquasecurity/trivy-checks/blob/83ac3dddea29d16ccd1e84c8c018cffe696f41db/Makefile#L56
thanks! Nikita pointed to it today,too
I have. another idea, what if we completely remove using last-applied configuration?
All Kubernetes resources supported by Trivy-kubernetes that relied on deprecated APIs were removed no later than version 1.25 wdyt?
do you mean to say that this feature isn't useful anymore? or that our data isn't updated (I know it isn't)?
do you mean to say that this feature isn't useful anymore? or that our data isn't updated (I know it isn't)?
yes, this feature isn't useful anymore the last supported k8s resource (CronJob) with deprecated API ("batch/v1beta1") was removed in k8s v1.25
I agree that this feature isn't very important or useful. I wouldn't mind removing it if it's a pain to maintain, but FTR it still is valid: The concept of deprecating APIs in Kuberentes remains, even if rarely used, and there have been more deprecation since but we stopped updating the feed for some reason.
Again, I think this feature is just nice to have and not core to Trivy's security offering, so if there's a reason to remove it go ahead. Also in the cotext of this issue, since it's causing bugs elsewhere, it's definitely better to prioritize misconfig over it.
@itaysk Sorry, I rushed a bit earlier and didn’t express my thoughts clearly.
Currently, Trivy scans only a limited number of resources (pods, deployments etc). The last one (CronJob) uses a deprecated API that was removed in Kubernetes v1.25.
So, I believe any check for deprecation at this point doesn't add much value and just introduces additional overhead—which could impact performance slightly when scanning large numbers of resources.
That said, I understand that such cases might come up again in the future. and I created this PR to make the check for last-applied-configuration only for deprecated APIs listed in the json: https://github.com/aquasecurity/trivy-kubernetes/pull/458
Using --use-actual-config flag seemed like a trade-off between ensuring deprecated APIs are still detected and minimizing the performance impact during scans.
thanks for explaining, it seems like this feature is causing more problems than value. I have a feeling that's because the implementation wasn't "clean" and that it's possible to refactor it in a ways that is less messy. If it takes too much effort (it's already starting to) we should remove the feature like you suggest. I don't think the --use-actual-config flag is a good direction as it settles the previous design instaed of fixing it. I've added a comment on the PR suggesting maybe another idea
closed in favor #8791