Nerdctl test makes buildg timeout

Open apostasie opened this issue 10 months ago • 4 comments

See https://github.com/containerd/nerdctl/issues/4046 for details.

It is not clear what is going on: whether this is a buildg issue or an issue with the test in nerdctl.

The problem here is that there is not enough debugging information.

Suggesting that log / debugging is enhanced so that we can figure out what exactly is holding up in https://github.com/ktock/buildg/blob/main/pkg/buildkit/client.go#L239

cc @ktock @AkihiroSuda

apostasie, Apr 16 '25 20:04

Suggesting that log / debugging is enhanced so that we can figure out what exactly is holding up in https://github.com/ktock/buildg/blob/main/pkg/buildkit/client.go#L239

Sure, I'll work on this. Thanks for reporting this issue.

ktock, Apr 17 '25 12:04

Hey @ktock.

Some data for you.

Here is my current fork with some crude printf debugging:

https://github.com/apostasie/buildg/commit/1b900d70378fea344b30eca9dd4a9edab5bd21cd#diff-3b2fc4f7ad4b1432a01e3bfa582e4736e0273352e8ad4a15aa00e72ab7e9340aR362

On this latest failure here:

https://github.com/containerd/nerdctl/actions/runs/14542345542/job/40802654500?pr=4103#step:8:433

We get as far as opt, err := runc.NewWorkerOpt(root, snFactory, rootless, oci.ProcessSandbox, nil, nil, nc, nil, "", "", nil, "", ""), and we get into the call in New: func(root string) (ctdsnapshots.Snapshotter, error).

So we get stuck either in the call to return overlay.NewSnapshotter(root, overlay.AsynchronousRemove), or elsewhere in runc.NewWorkerOpt.

Of course, it is possible that all of this is just slow and the 3-second timeout is too short.

apostasie, Apr 18 '25 22:04

@ktock

I have tested this repeatedly in https://github.com/containerd/nerdctl/pull/4103

At this point, it feels to me like the timeout is just too short (3*time.Second). If there is something hanging up, I was not able to find it.

Do you think we could just increase the timeout and do a buildg release? Suggesting maybe 20*time.Second? If we still see timeouts happening with that extended delay, then we could re-investigate.

What do you think?

apostasie, Apr 20 '25 17:04

Yes, SGTM. I'm considering making the timeout configurable.

ktock, Apr 21 '25 02:04