pixie
pixie copied to clipboard
Pixie namespace hard down after GKE node pool upgrade
Describe the bug After a GKE node pool upgrade, the vizier pem pods, query broker and metadata pods, and kelvin pod are stuck in the init container phase. This can be fixed by uninstalling and reinstalling px on the cluster in question, but it some extra management overhead.
To Reproduce Steps to reproduce the behavior:
- Create kube cluster in GKE
- Wait for automatic node pool upgrade for your channel
- Watch for pod failure as they restart
Expected behavior I would expect/hope the pods would come back up gracefully on their own after an upgrade
Screenshots
Logs
I already reinstalled pixie on the cluster (px delete
and px deploy
) so this won't have helpful information this time around. I can run logs next time this happens.
App information (please complete the following information):
-
Pixie version: 0.5.12+Distribution.a945b68.20210525150215.1
-
K8s cluster version: 1.18.17-gke.1901
Additional context Add any other context about the problem here.
Yaxiong from the Pixie team:
Did you by any chance collect the description (kubectl describe pod ...
) of the pods, and the logs (kubectl logs ...
) when you noticed this issue?
We've been using gke cluster with node pool scaling, and had not noticed this issue before. Granted that we usually have very small clusters (2 nodes), and we often redeploy the cluster (px delete && skaffold ...) in order to verify the changes we made during development. These situations might cause differences in behaviors.