
Windows CI began to fail on Oct 21

Open AkihiroSuda opened this issue 1 year ago • 3 comments

#2769 passed the CI, but its merge commit and later ones are failing

https://github.com/lima-vm/lima/actions/runs/11429806278/job/31800191430

[…]
time="2024-10-21T01:14:55Z" level=info msg="SSH Local Port: 22"
time="2024-10-21T01:14:55Z" level=info msg="[hostagent] Waiting for the essential requirement 1 of 2: \"ssh\""
time="2024-10-21T01:15:05Z" level=info msg="[hostagent] Waiting for the essential requirement 1 of 2: \"ssh\""
time="2024-10-21T01:15:15Z" level=info msg="[hostagent] Waiting for the essential requirement 1 of 2: \"ssh\""
time="2024-10-21T01:24:43Z" level=fatal msg="did not receive an event with the \"running\" status"

Something seems to have changed between https://github.com/actions/runner-images/releases/tag/win22%2F20241006.1 and https://github.com/actions/runner-images/releases/tag/win22%2F20241015.1

AkihiroSuda avatar Oct 21 '24 10:10 AkihiroSuda

In https://github.com/lima-vm/lima/actions/runs/11445753347/job/31843450765?pr=2778 I see:

System has not been booted with systemd as init system (PID 1). Can't operate.

jandubois avatar Oct 21 '24 19:10 jandubois

@pendo324 Do you have any idea what may be causing the Windows tests to fail now?

I can't find anything that seems related in https://github.com/actions/runner-images/commit/fcc4cdb1d095af1317859c4809364538953b3497 or https://github.com/actions/runner-images/commit/09ff567de6908096a96ace47eb3f41079993366d

The errors look like systemd is no longer enabled in your distro, but there has been no change to the distro.

I'm at a loss as to what might be causing this.

jandubois avatar Oct 22 '24 16:10 jandubois

Thanks for pinging me, taking a look now

pendo324 avatar Oct 22 '24 16:10 pendo324

I'm doing some experiments with Windows support. I managed to replicate this CI attempt in my rebuild workflow, and it succeeded on a default GH runner. Logs are available at https://github.com/arixmkii/qcw/actions/runs/13090314041/job/36526224725

The biggest difference in my setup is that I have to use the latest preview WSL build from https://github.com/microsoft/WSL/releases

arixmkii avatar Feb 01 '25 16:02 arixmkii

@jandubois I debugged this. There actually is a related change in the commits you linked: the Git version bump. OpenSSH is taken from the Git distribution.

The script "user session is ready for ssh" hangs indefinitely on Git 2.47 and newer releases. The same happens with the latest msys2 OpenSSH. I downgraded Git on my system and managed to run a WSL2 machine.

I also managed to run almost all integration tests with this WSL2 machine when using OpenSSH inside an Alpine companion distro in WSL2 (not using any of the Windows tools): https://github.com/arixmkii/qcw/actions/runs/13474629971/job/37652601743

Conclusion: the machine didn't break. The Windows tooling has some sort of issue/regression, which might or might not get fixed.

I tried to create an isolated reproducer by running the same script as cat script sh | <openssh command from lima> in parallel with the hanging one, but was not able to reproduce the hang outside of Lima.
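
For readers who want to try a similar isolated check, here is a minimal sketch of piping a script into a command and treating a timeout as a hang. It is not the reproducer used above: the function name pipe_script and the timeout values are hypothetical, and the actual OpenSSH command line Lima runs is not shown in this thread, so the command is left to the caller.

```python
import subprocess
import sys


def pipe_script(command, script_text, timeout_s):
    """Feed script_text to the command's stdin (like `cat script | command`).

    Returns (exit_code, stdout), or None if the command did not finish
    within timeout_s seconds, which is how a hang would show up here.
    """
    try:
        done = subprocess.run(
            command,
            input=script_text,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising on timeout.
        return None
    return done.returncode, done.stdout


if __name__ == "__main__":
    # Stand-in for the real SSH command line (an assumption: it simply
    # echoes stdin back). Replace it with the command under test.
    stand_in = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]
    print(pipe_script(stand_in, "echo ready\n", timeout_s=10))
```

A None result only says the command exceeded the timeout. As noted above, the hang did not reproduce outside of Lima, so a passing run of a check like this does not rule the problem out.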

arixmkii avatar Feb 23 '25 22:02 arixmkii

FWIW, https://github.com/git-for-windows/git/issues/5199 may be relevant. Note that git-for-Windows picked up the fixes, but as far as I know it's not in upstream cygwin/msys2 yet.

https://github.com/actions/runner-images/commit/fcc4cdb1d095af1317859c4809364538953b3497 linked above shows that Git for Windows was updated to 2.47.0.windows.1, which would be an affected version. But right now the runner image shows 2.47.1.windows.2, which should be fixed (so some runs may be succeeding).
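
To make the version reasoning concrete, here is a rough sketch of how a job could flag a possibly affected Git for Windows build from the output of git version. The helper names and the two boundary constants are assumptions built only from the data points in this thread (2.47.0.windows.1 affected, 2.47.1.windows.2 expected to be fixed). Builds in between are marked "suspect" because nothing here says which exact release picked up the fix.

```python
import re

# Boundaries inferred from this thread only; treat them as assumptions.
FIRST_AFFECTED = (2, 47, 0, 0)      # "Git 2.47 and newer" reportedly hang
FIRST_REPORTED_OK = (2, 47, 1, 2)   # 2.47.1.windows.2 is expected to be fixed

_VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)\.windows\.(\d+)")


def parse_git_for_windows_version(text):
    """Extract (major, minor, patch, windows_build) from `git version` output.

    Returns None if the text does not look like a Git for Windows version.
    """
    match = _VERSION_RE.search(text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def classify(version):
    """Return "ok" or "suspect" for a parsed version tuple."""
    if version < FIRST_AFFECTED or version >= FIRST_REPORTED_OK:
        return "ok"
    return "suspect"


if __name__ == "__main__":
    for line in ("git version 2.47.0.windows.1", "git version 2.47.1.windows.2"):
        print(line, "->", classify(parse_git_for_windows_version(line)))
```

Tuple comparison keeps the ordering logic short. A version string without the .windows.N suffix (for example from msys2) is deliberately not parsed, since the thread says the fix is not in upstream cygwin/msys2 yet.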

mook-as avatar Feb 26 '25 19:02 mook-as

@mook-as Thank you! I tested with the updated runner (I'm using Server 2025, but this should not really behave differently here) with git version 2.47.1.windows.2, and the tests passed: https://github.com/arixmkii/qcw/actions/runs/13552437877/job/37879648090

arixmkii avatar Feb 26 '25 20:02 arixmkii