k8s-ec2-srcdst

Panic observed when a node gets deleted

Open: ottoyiu opened this issue 6 years ago • 9 comments

A panic occurs when a node gets deleted: the informer passes the event handler a cache.DeletedFinalStateUnknown instead of a *v1.Node, and the handler's type assertion to *v1.Node fails.

I0305 12:49:48.849075       1 main.go:42] k8s-ec2-srcdst: v0.2.1
E0305 12:56:57.201434       1 reflector.go:205] github.com/ottoyiu/k8s-ec2-srcdst/cmd/k8s-ec2-srcdst/main.go:48: Failed to list *v1.Node: Get https://100.64.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 100.64.0.1:443: getsockopt: connection refused
E0305 12:56:58.202361       1 reflector.go:205] github.com/ottoyiu/k8s-ec2-srcdst/cmd/k8s-ec2-srcdst/main.go:48: Failed to list *v1.Node: Get https://100.64.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 100.64.0.1:443: getsockopt: connection refused
E0305 12:57:29.203208       1 reflector.go:205] github.com/ottoyiu/k8s-ec2-srcdst/cmd/k8s-ec2-srcdst/main.go:48: Failed to list *v1.Node: Get https://100.64.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 100.64.0.1:443: i/o timeout
E0305 12:58:00.204087       1 reflector.go:205] github.com/ottoyiu/k8s-ec2-srcdst/cmd/k8s-ec2-srcdst/main.go:48: Failed to list *v1.Node: Get https://100.64.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 100.64.0.1:443: i/o timeout
E0305 12:58:31.205268       1 reflector.go:205] github.com/ottoyiu/k8s-ec2-srcdst/cmd/k8s-ec2-srcdst/main.go:48: Failed to list *v1.Node: Get https://100.64.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 100.64.0.1:443: i/o timeout
I0305 12:58:32.427858       1 srcdst_controller.go:96] Marking node ip-10-63-163-245.us-west-2.compute.internal with SrcDstCheckDisabledAnnotation
E0305 12:58:32.448368       1 runtime.go:66] Observed a panic: &runtime.TypeAssertionError{interfaceString:"interface {}", concreteString:"cache.DeletedFinalStateUnknown", assertedString:"*v1.Node", missingMethod:""} (interface conversion: interface {} is cache.DeletedFinalStateUnknown, not *v1.Node)
/home/travis/gopath/src/github.com/ottoyiu/k8s-ec2-srcdst/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:72
/home/travis/gopath/src/github.com/ottoyiu/k8s-ec2-srcdst/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/home/travis/gopath/src/github.com/ottoyiu/k8s-ec2-srcdst/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/home/travis/.gimme/versions/go1.9.linux.amd64/src/runtime/asm_amd64.s:509
/home/travis/.gimme/versions/go1.9.linux.amd64/src/runtime/panic.go:491
/home/travis/.gimme/versions/go1.9.linux.amd64/src/runtime/iface.go:172
/home/travis/gopath/src/github.com/ottoyiu/k8s-ec2-srcdst/pkg/controller/srcdst_controller.go:64
/home/travis/gopath/src/github.com/ottoyiu/k8s-ec2-srcdst/pkg/controller/srcdst_controller.go:51
/home/travis/gopath/src/github.com/ottoyiu/k8s-ec2-srcdst/vendor/k8s.io/client-go/tools/cache/controller.go:209
<autogenerated>:1
/home/travis/gopath/src/github.com/ottoyiu/k8s-ec2-srcdst/vendor/k8s.io/client-go/tools/cache/controller.go:320
/home/travis/gopath/src/github.com/ottoyiu/k8s-ec2-srcdst/vendor/k8s.io/client-go/tools/cache/delta_fifo.go:451
/home/travis/gopath/src/github.com/ottoyiu/k8s-ec2-srcdst/vendor/k8s.io/client-go/tools/cache/controller.go:150
/home/travis/gopath/src/github.com/ottoyiu/k8s-ec2-srcdst/vendor/k8s.io/client-go/tools/cache/controller.go:124
/home/travis/gopath/src/github.com/ottoyiu/k8s-ec2-srcdst/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
/home/travis/gopath/src/github.com/ottoyiu/k8s-ec2-srcdst/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134
/home/travis/gopath/src/github.com/ottoyiu/k8s-ec2-srcdst/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
/home/travis/gopath/src/github.com/ottoyiu/k8s-ec2-srcdst/vendor/k8s.io/client-go/tools/cache/controller.go:124
/home/travis/gopath/src/github.com/ottoyiu/k8s-ec2-srcdst/cmd/k8s-ec2-srcdst/main.go:48
/home/travis/.gimme/versions/go1.9.linux.amd64/src/runtime/proc.go:185
/home/travis/.gimme/versions/go1.9.linux.amd64/src/runtime/asm_amd64.s:2337
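The panic comes from an unchecked type assertion to *v1.Node in the handler (srcdst_controller.go:64 in the trace). client-go documents that a delete handler receives a cache.DeletedFinalStateUnknown tombstone, carrying the last known state in its Obj field, when the watch misses the delete and the deletion is only noticed on a later re-list; that fits the API server outage in the log above. A minimal, stdlib-only sketch of a safe unwrap follows, assuming stand-in Node and DeletedFinalStateUnknown types that only mirror the client-go shapes (they are not the real types, and nodeFromDeleteEvent is a hypothetical helper, not this repository's code):

```go
package main

import "fmt"

// Node is a hypothetical stand-in for *v1.Node from k8s.io/api/core/v1.
type Node struct {
	Name string
}

// DeletedFinalStateUnknown mirrors the shape of the tombstone type in
// k8s.io/client-go/tools/cache: the object's key plus its last known state.
type DeletedFinalStateUnknown struct {
	Key string
	Obj interface{}
}

// nodeFromDeleteEvent extracts a *Node from the value handed to a delete
// handler, unwrapping a tombstone if needed. It reports false instead of
// panicking when the value (or the tombstone's payload) is not a node.
func nodeFromDeleteEvent(obj interface{}) (*Node, bool) {
	switch t := obj.(type) {
	case *Node:
		return t, true
	case DeletedFinalStateUnknown:
		node, ok := t.Obj.(*Node) // comma-ok form: never panics
		return node, ok
	default:
		return nil, false
	}
}

func main() {
	direct := &Node{Name: "node-a"}
	tombstone := DeletedFinalStateUnknown{Key: direct.Name, Obj: direct}

	for _, obj := range []interface{}{direct, tombstone, "unexpected"} {
		if node, ok := nodeFromDeleteEvent(obj); ok {
			fmt.Println("deleted node:", node.Name)
		} else {
			fmt.Printf("skipping object of type %T\n", obj)
		}
	}
}
```

With the real packages, the same switch would match *v1.Node and cache.DeletedFinalStateUnknown; the comma-ok assertion is what keeps an unexpected type from crashing the informer goroutine.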

A rewrite is in order, following the newer style of writing these controllers with client-go.
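One common pattern in newer client-go controllers (k8s.io/sample-controller, for example) is for event handlers to enqueue string keys into a workqueue rather than act on objects directly. On the delete path, client-go's cache.DeletionHandlingMetaNamespaceKeyFunc returns a tombstone's Key before falling back to cache.MetaNamespaceKeyFunc, so tombstones never reach a type assertion. The sketch below imitates that with stdlib-only stand-ins; keyFunc, deletionHandlingKeyFunc, and the buffered channel used as a queue are hypothetical substitutes for the client-go pieces:

```go
package main

import "fmt"

// Node is a hypothetical stand-in for a cluster-scoped *v1.Node.
type Node struct {
	Name string
}

// DeletedFinalStateUnknown mirrors the shape of client-go's tombstone type.
type DeletedFinalStateUnknown struct {
	Key string
	Obj interface{}
}

// keyFunc stands in for cache.MetaNamespaceKeyFunc. Nodes are
// cluster-scoped, so their key is just the name.
func keyFunc(obj interface{}) (string, error) {
	node, ok := obj.(*Node)
	if !ok {
		return "", fmt.Errorf("unexpected object of type %T", obj)
	}
	return node.Name, nil
}

// deletionHandlingKeyFunc follows the idea behind
// cache.DeletionHandlingMetaNamespaceKeyFunc: a tombstone already carries
// the key, so use it directly instead of inspecting Obj.
func deletionHandlingKeyFunc(obj interface{}) (string, error) {
	if d, ok := obj.(DeletedFinalStateUnknown); ok {
		return d.Key, nil
	}
	return keyFunc(obj)
}

func main() {
	queue := make(chan string, 10) // stand-in for a rate-limited workqueue

	onDelete := func(obj interface{}) {
		key, err := deletionHandlingKeyFunc(obj)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		queue <- key
	}

	onDelete(&Node{Name: "node-a"})
	onDelete(DeletedFinalStateUnknown{Key: "node-b", Obj: &Node{Name: "node-b"}})
	close(queue)

	// A worker would look each key up in the informer's lister; a miss
	// means the node is gone and any cleanup can run.
	for key := range queue {
		fmt.Println("reconcile", key)
	}
}
```

Because the worker only ever sees keys, the delete path and the "missed delete" path converge on the same reconcile logic.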

Related to: https://github.com/kubernetes/kops/issues/4466

ottoyiu, Mar 07 '18 18:03

I'm also seeing this quite frequently. Is there a good readinessProbe we can define for now?

blakebarnett, Mar 28 '18 22:03

@blakebarnett I'm still trying to find some time in my schedule to rewrite this controller; the rewrite will include health checks and metrics (e.g. sync duration).

For now, I created a quick patch that should solve the immediate problem with these panics: https://github.com/ottoyiu/k8s-ec2-srcdst/pull/11/

I won't be able to test this until early next week; can you give this patch a try and see whether it alleviates your issues?

ottoyiu, Mar 28 '18 23:03

@blakebarnett Forgot to link the built image:

The Docker image for that branch is: ottoyiu/k8s-ec2-srcdst:cast-panic-patch

https://hub.docker.com/r/ottoyiu/k8s-ec2-srcdst/tags/

ottoyiu, Mar 28 '18 23:03

I'll do some testing with it, thanks!

blakebarnett, Mar 28 '18 23:03

Have you approached the Calico team about adding this to calico-kube-controllers? It would also solve the problem of not being able to do a conditional deploy via kops on upgrades.

blakebarnett, Mar 28 '18 23:03

My testing looks good btw...

blakebarnett, Mar 29 '18 00:03

@blakebarnett I merged the patch; going to roll a release.

I have not approached the Calico team about this, but it sounds like it could be a good idea to put this logic in kube-controllers, if they're OK with having cloud-specific implementation details.

ottoyiu, Apr 03 '18 17:04

Great timing! I'm upgrading our prod cluster tonight.

blakebarnett, Apr 03 '18 18:04

I guess this needs to be upstreamed to kops: https://sourcegraph.com/github.com/kubernetes/[email protected]/-/blob/upup/models/cloudup/resources/addons/networking.projectcalico.org/k8s-1.7.yaml.template#L515:47

I'll open a PR if nobody else does (but it's going to take me a few days to get around to it).

so0k, May 09 '18 08:05