coreos-kubernetes
calico policy agent can get scheduled multiple times, and not on the controller
k --namespace calico-system get pod -o wide
NAME READY STATUS RESTARTS AGE NODE
calico-policy-agent 1/2 CrashLoopBackOff 19 1h ip-10-0-7-216.ec2.internal
calico-policy-agent-ip-10-0-2-97.ec2.internal 2/2 Running 0 1h ip-10-0-2-97.ec2.internal
It then keeps failing since it can't reach localhost:8080.
/cc @tomdee
@daveey - I'm assuming ip-10-0-7-216.ec2.internal is one of your worker nodes, not a controller?
The policy-controller is meant to run as a static pod from /etc/kubernetes/manifests on the controller nodes only. The failing one doesn't appear to be a static pod (it doesn't have the node suffix on the name).
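For reference, a minimal sketch of how that static pod placement looks: a plain Pod manifest dropped into /etc/kubernetes/manifests on the controller nodes, which the kubelet runs directly and names with the node suffix. The image tag and env var name below are assumptions for illustration, not the exact manifest kube-aws renders:

```yaml
# /etc/kubernetes/manifests/calico-policy-agent.yaml  (controller nodes only)
# Sketch only: image tag and env var name are assumed, not the exact
# manifest generated by kube-aws.
apiVersion: v1
kind: Pod
metadata:
  name: calico-policy-agent
  namespace: calico-system
spec:
  hostNetwork: true          # shares the controller's network namespace
  containers:
    - name: policy-agent
      image: calico/kube-policy-controller:v0.2.0   # assumed tag
      env:
        - name: K8S_API
          value: "http://127.0.0.1:8080"   # apiserver's insecure port, which only exists on controllers
```

Because it talks to the apiserver on localhost:8080, it can only work on a node that is actually running the apiserver, which is why a copy landing on a worker crash-loops.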
Is there a reliable repro? It could help me figure out why this is happening.
@caseydavenport Correct, it's a worker node. There is no reliable repro, but this happened some percentage of the time. We've stopped running Calico in our configuration, so it hasn't been a problem for us, but it would be nice to resolve.
So I rendered, created, and destroyed ~10 clusters today using kube-aws and was unable to reproduce this scenario. @daveey, how frequently were you hitting this? More than 1 in 10 times?
Did you make any manual changes to your userdata files? Do you know which non-default options you may have set in cluster.yaml? I'm trying to think of a way this might have happened.
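For anyone else trying to reproduce: the only non-default setting needed to get Calico in play at all should be the flag below. The option name is from memory and may differ between kube-aws versions, so treat it as an assumption and check your rendered userdata:

```yaml
# cluster.yaml (kube-aws) -- option name assumed; verify against your kube-aws version
useCalico: true
```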