nerdctl test makes buildg time out
See https://github.com/containerd/nerdctl/issues/4046 for details.
It is not clear what is going on: whether it is a buildg issue or an issue with the test in nerdctl.
The problem here is that there is not enough debugging information.
I suggest enhancing the logging / debugging output so that we can figure out what exactly is holding things up in https://github.com/ktock/buildg/blob/main/pkg/buildkit/client.go#L239
cc @ktock @AkihiroSuda
> Suggesting that log / debugging is enhanced so that we can figure out what exactly is holding up in https://github.com/ktock/buildg/blob/main/pkg/buildkit/client.go#L239
Sure, I'll work on this. Thanks for reporting this issue.
Hey @ktock.
Some data for you.
Here is my current fork with some quick-and-dirty printf debugging:
https://github.com/apostasie/buildg/commit/1b900d70378fea344b30eca9dd4a9edab5bd21cd#diff-3b2fc4f7ad4b1432a01e3bfa582e4736e0273352e8ad4a15aa00e72ab7e9340aR362
On this latest failure here:
https://github.com/containerd/nerdctl/actions/runs/14542345542/job/40802654500?pr=4103#step:8:433
We get to opt, err := runc.NewWorkerOpt(root, snFactory, rootless, oci.ProcessSandbox, nil, nil, nc, nil, "", "", nil, "", ""), and we get into the call in New: func(root string) (ctdsnapshots.Snapshotter, error).
So either we get stuck in the call to return overlay.NewSnapshotter(root, overlay.AsynchronousRemove), or otherwise somewhere else in runc.NewWorkerOpt.
Of course, it is possible that all of this is just slow and the 3-second timeout is too short.
@ktock
I have tested this repeatedly in https://github.com/containerd/nerdctl/pull/4103
At this point, it feels to me like the timeout is just too short (3*time.Second).
If there is something hanging up, I was not able to find it.
Do you think we could just increase the timeout and do a buildg release? Suggesting maybe 20*time.Second? If we still see timeouts happening with that extended delay, then we could re-investigate.
What do you think?
Yes, SGTM. I'm considering making the timeout configurable.