skaffold icon indicating copy to clipboard operation
skaffold copied to clipboard

Skaffold fails to log and sync

Open briandealwis opened this issue 3 years ago • 3 comments
trafficstars

@veermuchandi has been encountering a very odd issue where Skaffold on one of his Cloud Shell instances is unable to report logs from pods, or sync changed files into a container. Digging into it a bit, the Kubernetes pod-watching seems somewhat broken: there are no pod-watching events being reported (you can run skaffold ... -vtrace and you should see a bunch of tracing-level podWatcher messages logged). Oddly, port-forwarding does seem to work, which uses the same pod-watching functionality.

I added some instrumentation to Skaffold to log when a pod-watching event comes in and we found that it is rejected by the podSelector. We further modified this https://github.com/briandealwis/skaffold/commit/62939cee7e7a6318a0552387082cb4f1322ff3ec#diff-dc46afb61b0c02fbc96e826c64ba48587c52475011740d97e8033568d9381a14R141

--- pkg/skaffold/kubernetes/watcher.go
+++ pkg/skaffold/kubernetes/watcher.go
@@ -133,8 +135,10 @@ func (w *podWatcher) Start(ctx context.Context, kubeContext string, namespaces [
                                        if !ok {
                                                continue
                                        }
+                                       l.Tracef("podWatcher: pod/%s: event=%s", pod.Name, evt)
 
                                        if !w.podSelector.Select(pod) {
+                                               l.Tracef("podWatcher: skipping event for pod/%s: not matched by selector: %+v", pod.Name, w.podSelector)
                                                continue
                                        }

and we see a puzzling result where the selector's names is empty and so isn't matching any image.

TRAC[0009] podWatcher: skipping event for pod/mynodejsapp-deployment-6dfc8c95b6-hkzdc: not matched by selector: &{RWMutex:{w:{state:0 sema:0} writerSem:0 readerSem:0 readerCount:0 readerWait:0} names:map[]}  subtask=-1 task=DevLoop

The podSelector here is a kubernetes.ImageList: https://github.com/GoogleContainerTools/skaffold/blob/1471f8af56a7977ce2621957b96e66598016bc1b/pkg/skaffold/kubernetes/image_list.go#L30-L34

These image names are populated from a callback to AddTagsToPodSelector() https://github.com/GoogleContainerTools/skaffold/blob/1471f8af56a7977ce2621957b96e66598016bc1b/pkg/skaffold/deploy/util/util.go#L61-L73

which is called by our deployers, for example: https://github.com/GoogleContainerTools/skaffold/blob/1471f8af56a7977ce2621957b96e66598016bc1b/pkg/skaffold/deploy/kubectl/kubectl.go#L161-L164

There must be some kind of race condition happening.

briandealwis avatar Mar 28 '22 16:03 briandealwis

@VeerMuchandi @briandealwis would you be able to attach repro steps for this as well? Have been trying in cloud shell with your nodejsdoesntwork repo and I can't seem to get this happening

MarlonGamez avatar Mar 30 '22 19:03 MarlonGamez

Unfortunately it's hit-and-miss. I suspect if you introduce a sleep in TrackBuildArtifacts that you'd see the issue.

briandealwis avatar Mar 31 '22 14:03 briandealwis

I have also encountered this strange behavior in my testing on v2. I will add it to the alpha 2 milestone.

tejal29 avatar May 02 '22 18:05 tejal29