
CI: win32 worker exits with non-zero error code shortly after being started

Open d-netto opened this issue 1 year ago • 5 comments

Saw this on https://buildkite.com/julialang/julia-master/builds/35570#018ee359-5c7a-43fc-8309-a24a571e8a38 and https://buildkite.com/julialang/julia-master/builds/35570#018ee395-c4e2-4153-9db3-9fe9822e3bff.

Not sure if it's transient.

d-netto, Apr 15 '24

We've had that for months now.

giordano, Apr 15 '24

Any updates on this?

Saw it happening again on https://buildkite.com/julialang/julia-master/builds/37743#019056c0-be81-462a-8e83-bce634b93f28.

d-netto, Jun 26 '24

IIRC, @staticfloat and others have spent a lot of time looking into this, and so far we still don't know what the underlying problem is.

In the short term, the workaround is likely going to be to manually retry that job when it fails.

DilumAluthge, Jun 26 '24

Thanks for the clarification.

d-netto, Jun 26 '24

Another workaround that I think would be nice to implement:

If a Windows job fails, and the runtime of the job was <= 60 seconds, automatically retry the job, up to a maximum of N total tries (for a reasonable value of N). However, if a Windows job fails, and the runtime of the job was > 60 seconds, then don't retry the job.

The hard part (the part that I don't know how to implement) is gating the auto-retry on the job duration, because we don't want to unconditionally retry all failed Windows jobs, just the short ones.

DilumAluthge, Jun 27 '24
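The duration-gated retry proposed above boils down to a small decision rule. Buildkite's built-in automatic retry rules appear to key on exit status or signal rather than on runtime, so one option is a wrapper that runs inside the job step and retries there. Below is a minimal Python sketch of that idea; the function names, the callable-based job interface, and the default of 3 total attempts for N are hypothetical choices for illustration, not part of Julia's CI. Only the 60-second threshold comes from the proposal.

```python
import time
from typing import Callable

# Threshold from the proposal: failures within 60 s are treated as flaky startups.
SHORT_FAILURE_SECONDS = 60.0
# Hypothetical cap on total attempts; the proposal leaves N unspecified.
MAX_ATTEMPTS = 3


def should_retry(exit_code: int, runtime_s: float, attempt: int,
                 max_attempts: int = MAX_ATTEMPTS,
                 threshold_s: float = SHORT_FAILURE_SECONDS) -> bool:
    """Retry only a failed run that lasted at most threshold_s seconds,
    and only while fewer than max_attempts attempts have been made."""
    if exit_code == 0:
        return False  # success: nothing to retry
    return runtime_s <= threshold_s and attempt < max_attempts


def run_with_retry(job: Callable[[], int],
                   max_attempts: int = MAX_ATTEMPTS,
                   threshold_s: float = SHORT_FAILURE_SECONDS,
                   clock: Callable[[], float] = time.monotonic) -> tuple[int, int]:
    """Run job() (which returns an exit code) until it succeeds, fails slowly,
    or reaches max_attempts. Returns (final exit code, attempts used)."""
    attempt = 0
    while True:
        attempt += 1
        start = clock()
        code = job()
        elapsed = clock() - start
        if not should_retry(code, elapsed, attempt, max_attempts, threshold_s):
            return code, attempt
```

A slow failure is returned on the first attempt, so genuine test failures are never masked; only failures short enough to look like the startup crash described in this issue are retried.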

I don't know where this was written down, but the next step on this issue was to run peflags -v on bash.exe in our Windows images and check whether high-entropy-va is set.

Keno, Jul 26 '24
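The high-entropy-va flag mentioned above is bit 0x0020 (IMAGE_DLLCHARACTERISTICS_HIGH_ENTROPY_VA) of the DllCharacteristics field in a PE image's optional header. peflags is the tool named in the thread; as a cross-check where peflags is not installed, here is a minimal read-only Python sketch that reads the same bit. The function names are hypothetical, and the sketch only inspects the flag; it does not change it.

```python
import struct

IMAGE_DLLCHARACTERISTICS_HIGH_ENTROPY_VA = 0x0020
PE32_MAGIC = 0x10B       # 32-bit optional header
PE32_PLUS_MAGIC = 0x20B  # 64-bit optional header


def dll_characteristics(data: bytes) -> tuple[int, int]:
    """Return (optional-header magic, DllCharacteristics) for a PE image."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ/PE file")
    # e_lfanew at offset 0x3C points to the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    opt = pe_offset + 4 + 20  # skip the signature and the 20-byte COFF header
    (magic,) = struct.unpack_from("<H", data, opt)
    if magic not in (PE32_MAGIC, PE32_PLUS_MAGIC):
        raise ValueError(f"unknown optional header magic {magic:#x}")
    # DllCharacteristics sits at offset 70 in both PE32 and PE32+ headers.
    (flags,) = struct.unpack_from("<H", data, opt + 70)
    return magic, flags


def high_entropy_va_set(data: bytes) -> bool:
    _, flags = dll_characteristics(data)
    return bool(flags & IMAGE_DLLCHARACTERISTICS_HIGH_ENTROPY_VA)


def check_file(path: str) -> bool:
    """Read an executable such as bash.exe and report the flag."""
    with open(path, "rb") as f:
        return high_entropy_va_set(f.read())
```

For example, check_file("bash.exe") run inside a Windows image would return True if the flag is set; the flag is only meaningful for 64-bit (PE32+) images.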

Ah, we did look into it. Should have been fixed by https://github.com/JuliaCI/rootfs-images/pull/250.

Keno, Jul 26 '24

We still have other intermittent Windows issues, but let's open new issues for those to segregate the failure logs from after that change.

Keno, Jul 26 '24