kubefwd
Forwarding breaks on deployment update if the new pod is not immediately ready
I'm using skaffold and kubefwd 1.11.1 to run my dev setup and have found two scenarios where the service/pod monitoring feature of kubefwd is not working.
Deleting the service and deployment and recreating them (`skaffold delete`, then `skaffold run`): kubefwd correctly detects the service deletion and re-creation, but does not account for the fact that it takes some time for the newly created pod to become ready, so the forwarding never resumes.
Logs:

```
INFO[15:44:06] Forwarding: test:80 to pod test-7f9f697c8-v7mlw:8080
WARN[15:44:13] Stopped forwarding test in default.
WARN[15:44:21] WARNING: No Running Pods returned for service test in default on cluster .
```
Updating the service and deployment using the `Recreate` strategy (`skaffold run`): basically the same thing happens. kubefwd detects the service update and pod termination, but does not wait for the new pod to become ready, because the deployment's update strategy is configured as `Recreate`.
Logs:

```
INFO[16:21:50] Forwarding: test:80 to pod test-67fcc88dbb-6qjc2:8080
INFO[16:21:55] update service default/test.
WARN[16:21:58] test-67fcc88dbb-6qjc2 Pod deleted, restart the test service portforward.
WARN[16:21:58] Stopped forwarding test in default.
WARN[16:21:58] WARNING: No Running Pods returned for service test in default on cluster .
```
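For reference, the relevant part of the deployment manifest is the update strategy. A minimal sketch of a deployment that should reproduce the second scenario (the names, image, and labels here are placeholders, not taken from the original report):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
spec:
  replicas: 1
  strategy:
    type: Recreate   # old pod is killed before the new one starts,
                     # so there is a window with no Running pods
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
    spec:
      containers:
        - name: test
          image: nginx   # placeholder image
          ports:
            - containerPort: 8080
```

With `type: RollingUpdate` (the default) the new pod is started and becomes ready before the old one is removed, which is why that case works below.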
Everything works as expected when using `RollingUpdate`:

```
INFO[16:26:59] Forwarding: test:80 to pod test-69766f5bf5-n8whg:8080
INFO[16:27:11] update service default/test.
WARN[16:28:39] test-69766f5bf5-n8whg Pod deleted, restart the test service portforward.
WARN[16:28:39] Stopped forwarding test in default.
INFO[16:28:39] Forwarding: test:80 to pod test-74fc67685-zbptc:8080
```
Unfortunately, in some legacy applications it is not always possible to use `RollingUpdate`, so it would be nice if kubefwd supported waiting for pods to become ready instead of immediately aborting.
I don't know how skaffold works.
kubefwd supports waiting for pods to become ready for a service -> deployment, but if a service -> deployment is created with 0 replicas, monitoring for that service goes down.
Maybe `skaffold delete` followed by `skaffold run` creates a deployment -> service with 0 replicas and then scales it up, so kubefwd stops monitoring it, while a `RollingUpdate` uses a different mechanism. (This is just my guess.)
I'll check it later (maybe in 1 or 2 weeks, I'm on holiday). Perhaps we can create a fix PR for the 0-replica service -> deployment case that waits for the replica count to increase.
This issue is not particularly related to skaffold, but here is an explanation of what the commands do:
- `skaffold delete` - deletes the service, deployment, configmap, etc. (~ `kubectl delete -f`)
- `skaffold run` - creates the service, deployment, configmap, etc.; if already present, updates the objects (~ `kubectl apply -f`)
The catch is here: https://github.com/txn2/kubefwd/blob/be4b27f13e6e5be1eeb1d31607815bdbb6759649/cmd/kubefwd/services/services.go#L348-L351
The full code is here: https://github.com/txn2/kubefwd/blob/be4b27f13e6e5be1eeb1d31607815bdbb6759649/cmd/kubefwd/services/services.go#L318-L351
If kubefwd can't find a pod using the service's selector, it logs:

```
WARN[15:44:21] WARNING: No Running Pods returned for service test in default on cluster .
```

But I don't know why `skaffold run` causes this; we can test it later.
I think I'm hitting the same behavior without using skaffold.
- I start kubefwd
- then run `helm install confluentinc/cp-helm-charts`
If I do this in that order, I get a few `WARNING: No Running Pods returned for service X`
warnings for all the services deployed in that chart, but no other errors. No future kubefwd resync seems to pick them up either.
If I reverse the order and start kubefwd after installing the chart, it all works fine.
kubefwd was never intended to be more than a simple development utility, providing the ability to forward to multiple services with proper hostnames for local microservices development. Early on, some contributors tried to implement watchers, but this caused more instability and edge-case race conditions than it added value. I've left some of that code in, but it's not reliable.
If I am developing an application that connects to a few backend services, those services should not go down regularly, and if they do, a simple restart of kubefwd costs me a few seconds. Also, because I know what pods kubefwd is connecting to, I often monitor their logs while developing.