trivy icon indicating copy to clipboard operation
trivy copied to clipboard

feat(k8s): add a flag to scan actual state of resources

Open afdesk opened this issue 1 year ago • 12 comments

Description

By default, Trivy retrieves Kubernetes resource information from annotations if they exist. https://github.com/aquasecurity/trivy-kubernetes/blob/ccf11d83e72ae31b91cb6d250e7128a601357650/pkg/trivyk8s/trivyk8s.go#L281-L287

This approach is used because some Kubernetes engines automatically convert API versions from applied YAML configuration files that contain deprecated APIs. It was introduced here #4786 (https://github.com/aquasecurity/trivy-kubernetes/pull/189) based on #4784 More details about reason: https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api_changes.md#operational-overview

This PR introduces --use-actual-config flag, which allows scanning the actual state of Kubernetes resources in the cluster instead of relying on values from annotation "kubectl.kubernetes.io/last-applied-configuration".

In cases where it is necessary to scan the real-time state of resources in the cluster, --use-actual-config flag provides an option to bypass the annotations and fetch the current resource configuration directly.

Reproduction steps

pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  namespace: default
spec:
  containers:
  - name: nginx
    image: nginx:1.14.1
kind delete cluster && kind create cluster
kubectl wait --for=condition=Ready --timeout 300s nodes --all

kubectl apply -f pod.yaml
kubectl set image pod/nginx-pod nginx=nginx:1.27.4
kubectl get pod/nginx-pod -o yaml

Before:

$ trivy k8s --include-namespaces default --scanners vuln --report summary

Summary Report for kind-kind

Workload Assessment
┌───────────┬───────────────┬────────────────────────┐
│ Namespace │   Resource    │    Vulnerabilities     │
│           │               ├────┬─────┬────┬────┬───┤
│           │               │ C  │  H  │ M  │ L  │ U │
├───────────┼───────────────┼────┼─────┼────┼────┼───┤
│ default   │ Pod/nginx-pod │ 40 │ 104 │ 64 │ 62 │ 9 │
└───────────┴───────────────┴────┴─────┴────┴────┴───┘

After:

$ trivy  k8s --include-namespaces default --scanners vuln --report summary --use-actual-config

Summary Report for kind-kind

Workload Assessment
┌───────────┬───────────────┬──────────────────────┐
│ Namespace │   Resource    │   Vulnerabilities    │
│           │               ├───┬───┬────┬─────┬───┤
│           │               │ C │ H │ M  │  L  │ U │
├───────────┼───────────────┼───┼───┼────┼─────┼───┤
│ default   │ Pod/nginx-pod │ 2 │ 8 │ 43 │ 100 │   │
└───────────┴───────────────┴───┴───┴────┴─────┴───┘

Related issues

  • Close #7573

Related PRs

  • https://github.com/aquasecurity/trivy-kubernetes/pull/446

Checklist

  • [ ] I've read the guidelines for contributing to this repository.
  • [ ] I've followed the conventions in the PR title.
  • [ ] I've added tests that prove my fix is effective or that my feature works.
  • [ ] I've updated the documentation with the relevant information (if needed).
  • [ ] I've added usage information (if the PR introduces new options)
  • [ ] I've included a "before" and "after" example to the description (if the PR is a user interface change).

afdesk avatar Feb 27 '25 09:02 afdesk

as far as I understand using last-applied instead of the live resource was added solely to support deprecated API check. In this case, I think that would be the exception and not the rule. In other words I am suggesting that by default, misconfig scanning will always use the live resource, and only for outdated API we can look at last-applied. this can be activated by the reverse flag, but another idea is to make deprecated API a seperate scannner which might make sense regardless. WDYT?

itaysk avatar Apr 07 '25 09:04 itaysk

as far as I understand using last-applied instead of the live resource was added solely to support deprecated API check. In this case, I think that would be the exception and not the rule. In other words I am suggesting that by default, misconfig scanning will always use the live resource, and only for outdated API we can look at last-applied. this can be activated by the reverse flag, but another idea is to make deprecated API a seperate scannner which might make sense regardless. WDYT?

You're right - it was made to support deprecated API. I'll try to create a list of deprecated k8s APIs.

Honestly, it was my first thought. However, I had a concern about the inconsistency between config and k8s scans. if it doesn't matter, let's go another way )

afdesk avatar Apr 07 '25 11:04 afdesk

I'll try to create a list of deprecated k8s APIs

what did you mean by that?

itaysk avatar Apr 07 '25 12:04 itaysk

I'll try to create a list of deprecated k8s APIs

what did you mean by that?

If I remember correctly, there's no straightforward way to automatically determine whether an API is deprecated. That's why I was thinking of creating a list of deprecated APIs (similar to how Ubuntu handles EOL dates). I looked into a few projects that try to detect deprecated APIs (ex. Pluto), and they also seem to rely on predefined lists. Am I missing something here?

afdesk avatar Apr 07 '25 13:04 afdesk

we have it here: https://github.com/aquasecurity/trivy-db-data/blob/main/k8s/api/k8s-outdated-api.json which is used here: https://github.com/aquasecurity/trivy-checks/blob/83ac3dddea29d16ccd1e84c8c018cffe696f41db/Makefile#L56

itaysk avatar Apr 08 '25 18:04 itaysk

we have it here: https://github.com/aquasecurity/trivy-db-data/blob/main/k8s/api/k8s-outdated-api.json which is used here: https://github.com/aquasecurity/trivy-checks/blob/83ac3dddea29d16ccd1e84c8c018cffe696f41db/Makefile#L56

thanks! Nikita pointed to it today,too

I have. another idea, what if we completely remove using last-applied configuration?

All Kubernetes resources supported by Trivy-kubernetes that relied on deprecated APIs were removed no later than version 1.25 wdyt?

afdesk avatar Apr 08 '25 18:04 afdesk

do you mean to say that this feature isn't useful anymore? or that our data isn't updated (I know it isn't)?

itaysk avatar Apr 08 '25 18:04 itaysk

do you mean to say that this feature isn't useful anymore? or that our data isn't updated (I know it isn't)?

yes, this feature isn't useful anymore the last supported k8s resource (CronJob) with deprecated API ("batch/v1beta1") was removed in k8s v1.25

afdesk avatar Apr 08 '25 18:04 afdesk

I agree that this feature isn't very important or useful. I wouldn't mind removing it if it's a pain to maintain, but FTR it still is valid: The concept of deprecating APIs in Kuberentes remains, even if rarely used, and there have been more deprecation since but we stopped updating the feed for some reason.

Again, I think this feature is just nice to have and not core to Trivy's security offering, so if there's a reason to remove it go ahead. Also in the cotext of this issue, since it's causing bugs elsewhere, it's definitely better to prioritize misconfig over it.

itaysk avatar Apr 09 '25 07:04 itaysk

@itaysk Sorry, I rushed a bit earlier and didn’t express my thoughts clearly.

Currently, Trivy scans only a limited number of resources (pods, deployments etc). The last one (CronJob) uses a deprecated API that was removed in Kubernetes v1.25.

So, I believe any check for deprecation at this point doesn't add much value and just introduces additional overhead—which could impact performance slightly when scanning large numbers of resources.

That said, I understand that such cases might come up again in the future. and I created this PR to make the check for last-applied-configuration only for deprecated APIs listed in the json: https://github.com/aquasecurity/trivy-kubernetes/pull/458

afdesk avatar Apr 11 '25 12:04 afdesk

Using --use-actual-config flag seemed like a trade-off between ensuring deprecated APIs are still detected and minimizing the performance impact during scans.

afdesk avatar Apr 11 '25 12:04 afdesk

thanks for explaining, it seems like this feature is causing more problems than value. I have a feeling that's because the implementation wasn't "clean" and that it's possible to refactor it in a ways that is less messy. If it takes too much effort (it's already starting to) we should remove the feature like you suggest. I don't think the --use-actual-config flag is a good direction as it settles the previous design instaed of fixing it. I've added a comment on the PR suggesting maybe another idea

itaysk avatar Apr 12 '25 13:04 itaysk

closed in favor #8791

afdesk avatar Apr 29 '25 12:04 afdesk