
Endless envoy-bootstrap-config secrets

Open · thorker opened this issue 2 years ago · 5 comments

Bug description:

Unreferenced envoy-bootstrap-config secrets are never cleaned up, so we have accumulated about 4k of them. None of these secrets has an OwnerReference.
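
To size the problem, one can list bootstrap secrets that have no OwnerReference and are not mounted by any live pod. The Go sketch below shows that check. `Secret`, `Pod` and `orphanedBootstrapSecrets` are hypothetical, simplified stand-ins (not client-go types), and it assumes each pod's mounted bootstrap secret name is already known, for example from its volumes.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Secret is a simplified, hypothetical stand-in for Secret metadata.
type Secret struct {
	Namespace       string
	Name            string
	OwnerReferences []string // UIDs of owning objects (simplified)
}

// Pod is a hypothetical stand-in carrying only what the check needs.
type Pod struct {
	Namespace       string
	BootstrapSecret string // name of the mounted envoy-bootstrap-config secret
}

const bootstrapPrefix = "envoy-bootstrap-config-"

// orphanedBootstrapSecrets returns "namespace/name" for every bootstrap
// secret that has no OwnerReference and is not mounted by any given pod.
func orphanedBootstrapSecrets(secrets []Secret, pods []Pod) []string {
	mounted := make(map[string]bool)
	for _, p := range pods {
		mounted[p.Namespace+"/"+p.BootstrapSecret] = true
	}
	var orphans []string
	for _, s := range secrets {
		if !strings.HasPrefix(s.Name, bootstrapPrefix) {
			continue // not a bootstrap secret
		}
		key := s.Namespace + "/" + s.Name
		if len(s.OwnerReferences) == 0 && !mounted[key] {
			orphans = append(orphans, key)
		}
	}
	sort.Strings(orphans)
	return orphans
}

func main() {
	secrets := []Secret{
		{Namespace: "demo", Name: "envoy-bootstrap-config-aaa", OwnerReferences: []string{"pod-uid-1"}},
		{Namespace: "demo", Name: "envoy-bootstrap-config-bbb"},
		{Namespace: "demo", Name: "envoy-bootstrap-config-ccc"},
		{Namespace: "demo", Name: "unrelated-secret"},
	}
	pods := []Pod{{Namespace: "demo", BootstrapSecret: "envoy-bootstrap-config-ccc"}}
	fmt.Println(orphanedBootstrapSecrets(secrets, pods))
}
```

A secret that lacks an owner but is still mounted is not reported, since deleting it could break a running pod.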

Affected area (please mark with X where applicable):

  • Install [ ]
  • SMI Traffic Access Policy [ ]
  • SMI Traffic Specs Policy [ ]
  • SMI Traffic Split Policy [ ]
  • Permissive Traffic Policy [ ]
  • Ingress [ ]
  • Egress [ ]
  • Envoy Control Plane [ ]
  • CLI Tool [ ]
  • Metrics [ ]
  • Certificate Management [ ]
  • Sidecar Injection [ x ]
  • Logging [ ]
  • Debugging [ ]
  • Tests [ ]
  • Demo [ ]
  • CI System [ ]

Expected behavior: No unreferenced secrets.

Steps to reproduce the bug (as precisely as possible):

How was OSM installed?: AKS AddOn

Anything else we need to know?:

Bug report archive:

Environment:

  • OSM version (use osm version): v1.0
  • Kubernetes version (use kubectl version): 1.23
  • Size of cluster (number of worker nodes in the cluster):
  • Others:

thorker · Jun 23 '22 13:06

This seems like a regression, as the code should handle the cleanup of these secrets through Kubernetes garbage collection using OwnerReferences.

/cc @jaellio
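
For context, garbage collection only removes a secret whose OwnerReferences point at an existing owner; once that owner (here, the pod) is deleted, the secret is collected. The sketch below shows what attaching the pod as owner involves. `OwnerReference` mirrors a subset of `metav1.OwnerReference` from k8s.io/apimachinery, while `ObjectMeta` and `setPodOwner` are hypothetical stand-ins, not OSM's code. Two Kubernetes rules shape it: the pod's UID exists only after the API server has created the pod, and a namespaced owner must live in the same namespace as its dependent.

```go
package main

import (
	"errors"
	"fmt"
)

// OwnerReference mirrors a subset of metav1.OwnerReference; it is
// redeclared here so the sketch needs only the standard library.
type OwnerReference struct {
	APIVersion string
	Kind       string
	Name       string
	UID        string
}

// ObjectMeta is a hypothetical, trimmed-down stand-in for metav1.ObjectMeta.
type ObjectMeta struct {
	Namespace       string
	Name            string
	UID             string
	OwnerReferences []OwnerReference
}

// setPodOwner records pod as an owner of secret so the garbage collector
// can delete the secret after the pod is gone. It returns false without
// changes if the pod is already an owner, so repeated calls are safe.
func setPodOwner(secret *ObjectMeta, pod ObjectMeta) (bool, error) {
	// The UID is assigned by the API server when the pod is created.
	if pod.UID == "" {
		return false, errors.New("pod has no UID yet")
	}
	// A namespaced owner must be in the same namespace as its dependent.
	if pod.Namespace != secret.Namespace {
		return false, fmt.Errorf("pod namespace %q differs from secret namespace %q",
			pod.Namespace, secret.Namespace)
	}
	for _, ref := range secret.OwnerReferences {
		if ref.UID == pod.UID {
			return false, nil
		}
	}
	secret.OwnerReferences = append(secret.OwnerReferences, OwnerReference{
		APIVersion: "v1",
		Kind:       "Pod",
		Name:       pod.Name,
		UID:        pod.UID,
	})
	return true, nil
}

func main() {
	secret := &ObjectMeta{Namespace: "demo", Name: "envoy-bootstrap-config-example"}
	pod := ObjectMeta{Namespace: "demo", Name: "web-0", UID: "pod-uid-1"}
	added, err := setPodOwner(secret, pod)
	fmt.Println(added, err, len(secret.OwnerReferences))
}
```

If this step never runs or fails for a pod, nothing ties the secret to that pod, which matches the symptom reported above.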

shashankram · Jun 23 '22 16:06

@steeling has also been seeing this issue. He is trying to reproduce it now and will share the logs here.

jaellio · Jun 23 '22 16:06

It would be interesting to have a trace log for the cases where a conflict error is raised at https://github.com/openservicemesh/osm/blob/8fd236e8e104279b4d951a32720e06f4257fd80a/pkg/k8s/announcement_handlers.go#L71.

I don't know whether something else is updating Secret objects in parallel, but if so, that may interfere with the handler.

patst · Jun 23 '22 18:06

Added default label size/needed. Please consider re-labeling this issue appropriately.

github-actions[bot] · Jul 08 '22 00:07

We're looking to address this in our upcoming release. There should be an error log in the OSM controller when this occurs: https://github.com/openservicemesh/osm/blob/e6304c1/pkg/k8s/announcement_handlers.go#L70

Could anyone running into this error please confirm that you see this log line in the osm-controller?

keithmattix · Aug 09 '22 17:08

@schristoff, can you investigate whether we're still seeing this behavior and whether the log line from @keithmattix's message is being logged? Thanks!

trstringer · Oct 24 '22 18:10

Sizing this as size/S because it is only verification/validation at this point. We should resize it if we find that the problem still exists and needs a code change.

trstringer · Oct 24 '22 18:10

Heya all, I attempted to reproduce the error above and am no longer seeing it. I am going to close this issue; if any of y'all see it again, please open a new issue with reproduction steps so we can continue to assist.

Thank you!

schristoff · Nov 03 '22 01:11

@keithmattix We're in a similar situation: all bootstrap secrets are missing an OwnerRef and are left behind. I can see these logs from https://github.com/openservicemesh/osm/blob/v1.2.1/pkg/k8s/announcement_handlers.go#L57:

Failed to get secret fabric-benchmark/envoy-bootstrap-config- mounted to Pod fabric-benchmark/apps-sync-job-ccss-poller-benchmark-27800000-9hz2s

There are many of these, for different workloads. The secret name appears to be missing the UUID. I'm not sure whether this is a symptom of something else, which might also be the root cause of the OwnerRef code never being reached.
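
The logged name `envoy-bootstrap-config-` ends right where the UUID should be. A guard like the Go sketch below would reject such names up front instead of failing later in a lookup. `bootstrapSecretUUID` is a hypothetical helper, not OSM's code. It assumes the suffix is a UUID in the canonical 8-4-4-4-12 hexadecimal form, and it does not explain why the UUID went missing.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

const bootstrapSecretPrefix = "envoy-bootstrap-config-"

// uuidRE matches the canonical 8-4-4-4-12 hexadecimal UUID form.
var uuidRE = regexp.MustCompile(`^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$`)

// bootstrapSecretUUID extracts the UUID suffix from a bootstrap secret name
// and returns an error if the prefix is wrong or the suffix is empty or
// malformed (as in the "envoy-bootstrap-config-" name from the log).
func bootstrapSecretUUID(name string) (string, error) {
	if !strings.HasPrefix(name, bootstrapSecretPrefix) {
		return "", fmt.Errorf("%q is not a bootstrap secret name", name)
	}
	id := strings.TrimPrefix(name, bootstrapSecretPrefix)
	if id == "" {
		return "", fmt.Errorf("%q has an empty UUID suffix", name)
	}
	if !uuidRE.MatchString(id) {
		return "", fmt.Errorf("%q has a malformed UUID suffix %q", name, id)
	}
	return id, nil
}

func main() {
	for _, name := range []string{
		"envoy-bootstrap-config-",
		"envoy-bootstrap-config-0d3f1a6e-5b2c-4c8e-9f7a-1e2d3c4b5a69",
	} {
		id, err := bootstrapSecretUUID(name)
		fmt.Printf("%q -> id=%q err=%v\n", name, id, err)
	}
}
```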

lorenzo-biava · Nov 09 '22 13:11

@schristoff Lorenzo appears to be experiencing this bug.

keithmattix · Nov 09 '22 14:11