skaffold
skaffold copied to clipboard
Skaffold fails to log and sync
@veermuchandi has been encountering a very odd issue where Skaffold on one of his Cloud Shell instances is unable to report logs from pods, or sync changed files into a container. Digging into it a bit, the Kubernetes pod-watching seems somewhat broken: there are no pod-watching events being reported (you can run skaffold ... -vtrace and you should see a bunch of tracing-level podWatcher messages logged). Oddly, port-forwarding does seem to work, which uses the same pod-watching functionality.
I added some instrumentation to Skaffold to log when a pod-watching event comes in and we found that it is rejected by the podSelector. We further modified this https://github.com/briandealwis/skaffold/commit/62939cee7e7a6318a0552387082cb4f1322ff3ec#diff-dc46afb61b0c02fbc96e826c64ba48587c52475011740d97e8033568d9381a14R141
--- pkg/skaffold/kubernetes/watcher.go
+++ pkg/skaffold/kubernetes/watcher.go
@@ -133,8 +135,10 @@ func (w *podWatcher) Start(ctx context.Context, kubeContext string, namespaces [
if !ok {
continue
}
+ l.Tracef("podWatcher: pod/%s: event=%s", pod.Name, evt)
if !w.podSelector.Select(pod) {
+ l.Tracef("podWatcher: skipping event for pod/%s: not matched by selector: %+v", pod.Name, w.podSelector)
continue
}
and we see a puzzling result where the selector's names is empty and so isn't matching any image.
TRAC[0009] podWatcher: skipping event for pod/mynodejsapp-deployment-6dfc8c95b6-hkzdc: not matched by selector: &{RWMutex:{w:{state:0 sema:0} writerSem:0 readerSem:0 readerCount:0 readerWait:0} names:map[]} subtask=-1 task=DevLoop
The podSelector here is a kubernetes.ImageList:
https://github.com/GoogleContainerTools/skaffold/blob/1471f8af56a7977ce2621957b96e66598016bc1b/pkg/skaffold/kubernetes/image_list.go#L30-L34
These image names are populated from a callback to AddTagsToPodSelector()
https://github.com/GoogleContainerTools/skaffold/blob/1471f8af56a7977ce2621957b96e66598016bc1b/pkg/skaffold/deploy/util/util.go#L61-L73
which is called by our deployers, for example: https://github.com/GoogleContainerTools/skaffold/blob/1471f8af56a7977ce2621957b96e66598016bc1b/pkg/skaffold/deploy/kubectl/kubectl.go#L161-L164
There must be some kind of race condition happening.
@VeerMuchandi @briandealwis would you be able to attach repro steps for this as well? Have been trying in cloud shell with your nodejsdoesntwork repo and I can't seem to get this happening
Unfortunately it's hit-and-miss. I suspect if you introduce a sleep in TrackBuildArtifacts that you'd see the issue.
I have also encountered this strange behavior in my testing on v2. I will add it to the alpha 2 milestone.