
label bot workers stop receiving pubsub messages; issue with workload identity?

Open jlewi opened this issue 5 years ago • 3 comments

As part of #70 we deployed the workers on an updated cluster that uses workload identity.

I'm observing that after the workers have been up for a long time they appear to stop receiving pubsub notifications.

This is visible in cloud console as a growing backlog of messages.

I suspect an issue related to credentials and workload identity. Bouncing the pods appears to fix it.

Related to #70

jlewi avatar Jan 17 '20 23:01 jlewi

Issue-Label Bot is automatically applying the label kind/bug to this issue, with a confidence of 0.96. Please mark this comment with :thumbsup: or :thumbsdown: to give our bot feedback!

Links: app homepage, dashboard and code for this bot.

issue-label-bot[bot] avatar Jan 17 '20 23:01 issue-label-bot[bot]

Issue-Label Bot is automatically applying the labels:

| Label | Probability |
| --- | --- |
| kind/bug | 0.96 |

Please mark this comment with :thumbsup: or :thumbsdown: to give our bot feedback! Links: app homepage, dashboard and code for this bot.

kf-label-bot-dev[bot] avatar Jan 17 '20 23:01 kf-label-bot-dev[bot]

To try to recover:

  • Delete the GKE metadata server pods:
    kubectl -n kube-system delete pods -l k8s-app=gke-metadata-server
  • Restart the label bot pods:
    kubectl delete pods -l app=label-bot
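
The two recovery steps above could be wrapped in a small script so they always run in the same order. This is only a sketch: the names `run`, `recover_label_bot`, and the `APPLY` variable are hypothetical and not part of the label bot repo. It assumes `kubectl` already points at the right cluster and that the label bot pods live in the current namespace. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch of the recovery steps from this issue.
# Hypothetical helper names; prints the commands unless APPLY=1 is set.
set -euo pipefail

# Run a command for real only when APPLY=1; otherwise just print it.
run() {
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  else
    echo "would run: $*"
  fi
}

recover_label_bot() {
  # Step 1: delete the GKE metadata server pods.
  run kubectl -n kube-system delete pods -l k8s-app=gke-metadata-server
  # Step 2: restart the label bot pods.
  run kubectl delete pods -l app=label-bot
}
```

After sourcing the file, `recover_label_bot` shows what would be executed, and `APPLY=1 recover_label_bot` actually deletes the pods.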

jlewi avatar Jan 17 '20 23:01 jlewi