osm
Endless envoy-bootstrap-config secrets
Bug description:
Unreferenced envoy-bootstrap-config secrets are never cleaned up, so we have accumulated about 4k of them. None of the secrets has an OwnerReference.
Affected area (please mark with X where applicable):
- Install [ ]
- SMI Traffic Access Policy [ ]
- SMI Traffic Specs Policy [ ]
- SMI Traffic Split Policy [ ]
- Permissive Traffic Policy [ ]
- Ingress [ ]
- Egress [ ]
- Envoy Control Plane [ ]
- CLI Tool [ ]
- Metrics [ ]
- Certificate Management [ ]
- Sidecar Injection [ x ]
- Logging [ ]
- Debugging [ ]
- Tests [ ]
- Demo [ ]
- CI System [ ]
Expected behavior: No unreferenced secrets.
Steps to reproduce the bug (as precisely as possible):
How was OSM installed?: AKS AddOn
Anything else we need to know?:
Bug report archive:
Environment:
- OSM version (use osm version): v1.0
- Kubernetes version (use kubectl version): 1.23
- Size of cluster (number of worker nodes in the cluster):
- Others:
This seems like a regression as the code should handle the cleanup of secrets via k8s garbage collection through OwnerReferences.
/cc @jaellio
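For anyone who wants to gauge how many secrets are affected, the check described above (bootstrap secrets with no OwnerReference are never garbage collected) could be sketched roughly as follows. This is only an illustration: `ownerRef`, `secretMeta`, and `orphanedBootstrapSecrets` are hypothetical stand-ins written against the standard library, not OSM or client-go types, and the name prefix is taken from the issue title.

```go
package main

import "strings"

// ownerRef and secretMeta are hypothetical stand-ins for the Kubernetes
// owner reference and Secret metadata types; they are not OSM code.
type ownerRef struct {
	Kind string
	Name string
}

type secretMeta struct {
	Namespace string
	Name      string
	Owners    []ownerRef
}

// bootstrapPrefix is the secret name prefix reported in this issue.
const bootstrapPrefix = "envoy-bootstrap-config-"

// orphanedBootstrapSecrets returns the bootstrap secrets that carry no owner
// reference, which means garbage collection will never remove them.
func orphanedBootstrapSecrets(secrets []secretMeta) []secretMeta {
	var orphans []secretMeta
	for _, s := range secrets {
		if strings.HasPrefix(s.Name, bootstrapPrefix) && len(s.Owners) == 0 {
			orphans = append(orphans, s)
		}
	}
	return orphans
}
```

In a real cluster the input would come from listing Secrets through the Kubernetes API; that part is left out here on purpose.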
@steeling has also been seeing this issue. He is trying to reproduce it now and will share the logs here.
Would be interesting to have a trace log for the cases at https://github.com/openservicemesh/osm/blob/8fd236e8e104279b4d951a32720e06f4257fd80a/pkg/k8s/announcement_handlers.go#L71 when a conflict error is raised.
I don't know whether something else is updating Secret objects in parallel, but if so, that could interfere with the handler.
Added default label size/needed. Please consider re-labeling this issue appropriately.
We're looking to address this in our upcoming release. There should be an error log in the OSM controller when this occurs: https://github.com/openservicemesh/osm/blob/e6304c1/pkg/k8s/announcement_handlers.go#L70
Could anyone running into this error please confirm that you see this log line in the osm-controller?
@schristoff, can you investigate whether we're still seeing this behavior and whether the log line from @keithmattix's message is emitted? Thanks!
Sizing it as size/S because it is just verification/validation at this point. We can/should resize if we uncover this problem still exists and needs a code change.
Heya all, I attempted to reproduce the error above and I am no longer seeing it. I am going to close this issue, and if any of y'all see the issue again please open a new issue with reproduction steps so we can continue to assist.
Thank you!
@keithmattix Similar situation. All bootstrap secrets are missing OwnerRef and are left behind. I can see these logs from https://github.com/openservicemesh/osm/blob/v1.2.1/pkg/k8s/announcement_handlers.go#L57 :
Failed to get secret fabric-benchmark/envoy-bootstrap-config- mounted to Pod fabric-benchmark/apps-sync-job-ccss-poller-benchmark-27800000-9hz2s
A lot of these for different workloads. Looks like it's missing the UUID in the secret name. Not sure if this is a symptom of something else, which might be the root cause for not even being able to reach the OwnerRef code.
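The log line above shows the lookup using a bare envoy-bootstrap-config- name, i.e. an empty UUID suffix. As a rough sketch of how such a lookup could fail early instead, the helper below builds the name from a pod label and refuses to return it when the UUID is empty. The label key `osm-proxy-uuid` and the function `bootstrapSecretName` are assumptions made for illustration and have not been checked against the OSM source.

```go
package main

import (
	"errors"
	"fmt"
)

// proxyUUIDLabel is assumed to be the pod label that carries the proxy UUID.
const proxyUUIDLabel = "osm-proxy-uuid"

// bootstrapSecretName builds the secret name for a pod and refuses to return
// a bare "envoy-bootstrap-config-" when the UUID label is missing or empty.
func bootstrapSecretName(podLabels map[string]string) (string, error) {
	uuid := podLabels[proxyUUIDLabel]
	if uuid == "" {
		return "", errors.New("pod has no proxy UUID label")
	}
	return fmt.Sprintf("envoy-bootstrap-config-%s", uuid), nil
}
```

A guard like this would turn the repeated "Failed to get secret" messages into a single, more specific error about the missing UUID, which might help narrow down the root cause.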
@schristoff lorenzo appears to be experiencing this bug