Trivy operator behaviour when containers are injected into deployment pods by admission controllers
Hello. When we run a deployment in k8s, trivy-operator looks into the replica set to get the images for vulnerability scanning. The problem I see here is that this does not cover cases where a mutating admission controller injects containers into the pod. A good example is when we inject service mesh sidecars into our application. Usually one or more containers are then added to the pod, while the replica set normally remains unchanged and contains only the application image in its definition. As a result, in reality we have pods with application containers plus containers injected by admission controllers, and because trivy-operator looks only at the replica set, only the application container will be scanned.
Here is the image grep for the replica set:
As you can see, we have only one image.
And here we grep the images from the pods controlled by this replica set:
As you can see, we have two additional images there: consul-dataplane:1.2.3 is the Consul service mesh sidecar container image, and consul-k8s-control-plane:1.2.3 is an init container image. Both of these images are injected into the pod by the admission controller, and they are not scanned by trivy-operator.
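To illustrate, here is a minimal client-go sketch (the namespace and ReplicaSet name are hypothetical placeholders) that prints the images declared in the ReplicaSet template next to the images actually present in its pods; init containers are included because injected containers often land there:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	ctx := context.Background()

	// Hypothetical names; replace with your own.
	ns, rsName := "consul-test", "static-client-abc123"

	rs, err := client.AppsV1().ReplicaSets(ns).Get(ctx, rsName, metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("images in ReplicaSet template:")
	for _, c := range rs.Spec.Template.Spec.Containers {
		fmt.Println("  ", c.Image)
	}

	// List the pods owned by the ReplicaSet via its label selector.
	pods, err := client.CoreV1().Pods(ns).List(ctx, metav1.ListOptions{
		LabelSelector: metav1.FormatLabelSelector(rs.Spec.Selector),
	})
	if err != nil {
		panic(err)
	}
	fmt.Println("images in running pods (init containers included):")
	for _, p := range pods.Items {
		for _, c := range append(p.Spec.InitContainers, p.Spec.Containers...) {
			fmt.Println("  ", c.Image)
		}
	}
}
```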
Any suggestions or opinions?
Thank you.
@andriktr that is a very good point. Is there a way to know if the service mesh has added sidecars, I mean like labels or annotations?
I believe this depends on the mesh solution, but typically you enable the service mesh by adding an annotation to the deployment/pod, for example:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: static-client
  namespace: consul-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: static-client
  template:
    metadata:
      name: static-client
      labels:
        app: static-client
      annotations:
        'consul.hashicorp.com/connect-inject': 'true'
....
However, it is also possible to enable injection by default for all workloads in the cluster, and I'm not sure if annotations will be added in this case as well, but most probably yes.
I'm looking for an indicator which will help to decide when to scan the pod and when the replicaset.
It will probably be hard to have one indicator for all possible injection cases. Maybe it is possible to do some comparison between the replicaset and the final pod: if the container count in the replicaset is equal to the pod's container count, then scan the replicaset, else scan the pod.
Comparing image names between the replicaset and the final pods seems to be the only way to avoid trouble with sidecars injected by service meshes or mutation hooks.
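As a rough sketch of that idea (a hypothetical helper, not trivy-operator code), the injected images would simply be the set difference between the pod's images and the ReplicaSet template's images:

```go
import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
)

// Hypothetical helper: returns the images present in a running pod but
// absent from its ReplicaSet template, i.e. the images injected by
// admission controllers. Init containers are included on both sides.
func injectedImages(rs *appsv1.ReplicaSet, pod *corev1.Pod) []string {
	declared := map[string]bool{}
	tpl := rs.Spec.Template.Spec
	for _, c := range append(tpl.InitContainers, tpl.Containers...) {
		declared[c.Image] = true
	}
	var injected []string
	for _, c := range append(pod.Spec.InitContainers, pod.Spec.Containers...) {
		if !declared[c.Image] {
			injected = append(injected, c.Image)
		}
	}
	return injected
}
```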
@andriktr @alfsch I have created a fix #1917 which will still scan the app container when a sidecar is failing. It's not perfect, but at least you'll get a vulnerability report for the main container.
Let me know if it is sufficient.
Hmm... this issue is not about failing containers, but more about the sidecar container not being scanned when it is injected by admission controller(s). Does your fix cover this as well?
No, unfortunately. I thought it might help, since if a pod has more than one container and one is failing while the other is passing, then until now you would not get any report for either of the containers.
@andriktr the above-mentioned annotations and labels aren't added in all cases. If you use kyverno image mutations, the configuration is inside a kyverno rule resource and there is no annotation which can give a hint. In the case of the istio service mesh, it's possible to do it globally without any annotation or label, with a label on the namespace, or with a label on the workload's pod template.
This is actually obvious and depends on the solution. The most effective way would be to just query for the unique running images in a namespace and scan them, and not rely on the replica set definition at all.
trivy-operator does not perform queries; it follows the operator pattern (event based), reacting to every resource event (create, update, deletion).
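For context, the event-based pattern mentioned here looks roughly like this with controller-runtime (a generic sketch, not actual trivy-operator code); covering injected sidecars would mean having the reconcile read the owned pods rather than only the ReplicaSet template:

```go
import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// Generic sketch of the event-based operator pattern: Reconcile runs on
// every create/update/delete event for a watched ReplicaSet.
type ReplicaSetReconciler struct {
	client.Client
}

func (r *ReplicaSetReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	var rs appsv1.ReplicaSet
	if err := r.Get(ctx, req.NamespacedName, &rs); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}
	// To cover injected sidecars, the images to scan could be collected
	// here from the pods owned by this ReplicaSet instead of (or in
	// addition to) the images listed in rs.Spec.Template.
	return ctrl.Result{}, nil
}

func (r *ReplicaSetReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&appsv1.ReplicaSet{}).
		Complete(r)
}
```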
Then either the replicaset and the final pod should be compared, or the images which should be scanned should be taken from the pod only; and if there is more than one pod with the same image, the report should sort it out to avoid duplicated info.
Alternatively, the operator behaviour could be changed to track the running images in a namespace instead of the running replicasets.
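A sketch of that deduplication (a hypothetical helper): collect the unique image set straight from the pods, so injected sidecars are included and replicas of the same image collapse into one scan:

```go
import corev1 "k8s.io/api/core/v1"

// Hypothetical helper: the unique set of images running in a list of pods
// (e.g. all pods of a namespace), taken from the pods themselves so that
// injected sidecars and init containers are covered, with duplicates
// across replicas collapsing into a single entry.
func uniqueImages(pods []corev1.Pod) map[string]struct{} {
	images := map[string]struct{}{}
	for _, p := range pods {
		for _, c := range append(p.Spec.InitContainers, p.Spec.Containers...) {
			images[c.Image] = struct{}{}
		}
	}
	return images
}
```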
@andriktr https://github.com/aquasecurity/trivy-operator/issues/1872#issuecomment-2004142155 describes a case which also has to be handled. The truth lies only in the pods running in a namespace, or in the relation between the higher-level workload descriptions like deployments/replicasets/.... and their pods.
I have a couple of thoughts on how this could potentially be handled. Like @andriktr, I am also missing scans on some images which only get added to pods via mutations.
The brute force approach would be to watch all pods (I think this is already happening), get the set of images in them, and then compare that set to the controller's (the replicaset's, for example). If there are additional images in the pod's set, then add these into the scan.
A slightly more elegant (but still incomplete) approach would be to leverage Kubernetes v1.28's support for sidecar containers. The mechanism here is to run sidecar containers as init containers which set restartPolicy to Always. This capability is protected by a feature flag and is enabled by default in v1.29. However, many clusters won't have this capability, and even when they do, many mutating webhooks will not use it. This may be a good approach a year from now, but not for today.
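If that route were ever taken, picking out native sidecars from a pod spec would be straightforward; here is a sketch against the upstream API types (Kubernetes v1.28+), assuming nothing beyond the restartPolicy mechanism described above:

```go
import corev1 "k8s.io/api/core/v1"

// Sketch: native sidecars (Kubernetes v1.28+) are init containers whose
// restartPolicy is Always, so they can be identified directly in the spec.
func nativeSidecars(pod *corev1.Pod) []corev1.Container {
	var sidecars []corev1.Container
	for _, c := range pod.Spec.InitContainers {
		if c.RestartPolicy != nil && *c.RestartPolicy == corev1.ContainerRestartPolicyAlways {
			sidecars = append(sidecars, c)
		}
	}
	return sidecars
}
```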
@jutley thanks for this input. I'll have a look to see how it fits our operator.
@chen-keinan any news?
@chen-keinan is there any news on this issue?
@ka14be @seekermarcel unfortunately, Chen left the team and the project. We'll try to prioritize this task, but there's no exact timeline at the moment.