coreos-kubernetes

calico policy agent can get scheduled multiple times, and not on the controller

Open daveey opened this issue 8 years ago • 4 comments

k --namespace calico-system get pod -o wide

NAME                                            READY     STATUS             RESTARTS   AGE       NODE
calico-policy-agent                             1/2       CrashLoopBackOff   19         1h        ip-10-0-7-216.ec2.internal
calico-policy-agent-ip-10-0-2-97.ec2.internal   2/2       Running            0          1h        ip-10-0-2-97.ec2.internal

The un-suffixed calico-policy-agent pod then keeps failing since it can't reach localhost:8080.

Jun 21 '16 01:06 daveey

/cc @tomdee

Jun 27 '16 23:06 aaronlevy

@daveey - I'm assuming ip-10-0-7-216.ec2.internal is one of your worker nodes, not a controller?

The policy-controller is meant to run as a static pod from /etc/kubernetes/manifests on the controller nodes only. The failing one doesn't appear to be a static pod (it doesn't have the node suffix on the name).
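A minimal sketch of that name check, assuming static (mirror) pods show up named `<pod name>-<node name>`. `is_static_pod_name` is a hypothetical helper, not part of any Kubernetes or Calico tooling, and the pod/node pairs are copied from the output above. It is only a name-based heuristic, not a definitive test.

```python
def is_static_pod_name(pod_name: str, node_name: str) -> bool:
    """Heuristic: True if pod_name ends with the "-<node name>" suffix.

    Assumption: pods created from a manifest directory on a node are
    reported with the node name appended to the pod name.
    """
    return pod_name.endswith("-" + node_name)


# (pod name, node) pairs taken from the `get pod -o wide` output above.
pods = [
    ("calico-policy-agent", "ip-10-0-7-216.ec2.internal"),
    ("calico-policy-agent-ip-10-0-2-97.ec2.internal", "ip-10-0-2-97.ec2.internal"),
]

for name, node in pods:
    kind = "static pod" if is_static_pod_name(name, node) else "not a static pod"
    print(f"{name} on {node}: {kind}")
```

By this check, the crashing pod on ip-10-0-7-216.ec2.internal lacks the suffix, which matches the observation that it was scheduled rather than started from a manifest.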

Is there a reliable repro? It could help me figure out why this is happening.

Sep 01 '16 22:09 caseydavenport

@caseydavenport Correct, it's a worker node. There is no reliable repro, but this happened some percentage of the time. We've stopped running Calico in our configuration, so this hasn't been a problem for us, but it would be nice to resolve.

Sep 01 '16 22:09 daveey

So I rendered, created, and destroyed ~10 clusters today using kube-aws and was unable to reproduce this scenario. @daveey how frequently were you hitting this? More than 1 in 10 times?

Did you make any manual changes to your userdata files? Do you know which non-default options you may have set in cluster.yaml? I'm trying to think of a way this might have happened.

Sep 15 '16 00:09 caseydavenport