coreos-kubernetes
calico policy agent can get scheduled multiple times, and not on the controller
k --namespace calico-system get pod -o wide
NAME READY STATUS RESTARTS AGE NODE
calico-policy-agent 1/2 CrashLoopBackOff 19 1h ip-10-0-7-216.ec2.internal
calico-policy-agent-ip-10-0-2-97.ec2.internal 2/2 Running 0 1h ip-10-0-2-97.ec2.internal
It then keeps failing since it can't reach localhost:8080.
/cc @tomdee
@daveey - I'm assuming ip-10-0-7-216.ec2.internal is one of your worker nodes, not a controller?
The policy-controller is meant to run as a static pod from /etc/kubernetes/manifests on the controller nodes only. The failing one doesn't appear to be a static pod (it doesn't have the node suffix on the name).
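For reference, a minimal sketch of how that static pod placement looks: a plain Pod manifest dropped into /etc/kubernetes/manifests on the controller nodes, which the kubelet runs directly and names with the node suffix. The image tag and env var name below are assumptions for illustration, not the exact manifest kube-aws renders:

```yaml
# /etc/kubernetes/manifests/calico-policy-agent.yaml  (controller nodes only)
# Sketch only: image tag and env var name are assumed, not the exact
# manifest generated by kube-aws.
apiVersion: v1
kind: Pod
metadata:
  name: calico-policy-agent
  namespace: calico-system
spec:
  hostNetwork: true          # shares the controller's network namespace
  containers:
    - name: policy-agent
      image: calico/kube-policy-controller:v0.2.0   # assumed tag
      env:
        - name: K8S_API
          value: "http://127.0.0.1:8080"   # apiserver's insecure port, which only exists on controllers
```

Because it talks to the apiserver on localhost:8080, it can only work on a node that is actually running the apiserver, which is why a copy landing on a worker crash-loops.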
Is there a reliable repro? It could help me figure out why this is happening.
@caseydavenport Correct, it's a worker node. There is no reliable repro, but this happened some percentage of the time. We've stopped running Calico in our configuration, so it hasn't been a problem for us, but it would be nice to resolve.
So I rendered, created, and destroyed ~10 clusters today using kube-aws and was unable to reproduce this scenario. @daveey, how frequently were you hitting this? More than 1 in 10 times?
Did you make any manual changes to your userdata files? Do you know which non-default options you may have set in cluster.yaml? I'm trying to think of a way this might have happened.
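For anyone else trying to reproduce: the only non-default setting needed to get Calico in play at all should be the flag below. The option name is from memory and may differ between kube-aws versions, so treat it as an assumption and check your rendered userdata:

```yaml
# cluster.yaml (kube-aws) -- option name assumed; verify against your kube-aws version
useCalico: true
```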