
[WIP] Run CI on QEMU

Open fosslinux opened this issue 2 months ago • 13 comments

fosslinux avatar Oct 15 '25 00:10 fosslinux

OK, I'm confused :\

fosslinux avatar Oct 18 '25 10:10 fosslinux

hmm, that last run provides a clue: [ 21.213595] serial8250: too much work for irq4

Linux serial console bug?

Googulator avatar Oct 21 '25 17:10 Googulator

Maybe this will help?

fosslinux avatar Oct 22 '25 04:10 fosslinux

There's a 6h time limit on GitHub jobs. If it goes past that, it cancels the job. You guys probably already noticed that, but I thought it wouldn't hurt to say it.

alganet avatar Oct 22 '25 21:10 alganet

Indeed, that is something that we will deal with after this first issue is fixed.

fosslinux avatar Oct 22 '25 22:10 fosslinux

@fosslinux what is the current issue?

alganet avatar Oct 24 '25 03:10 alganet

This is a good example. https://github.com/fosslinux/live-bootstrap/actions/runs/18679236148/job/53256256615 Note how just midway through the first musl build after the new Linux kernel is run, QEMU suddenly dies? It's very unclear why this is happening. Note that the serial8250: too much work for irq4 messages only appear sometimes.

Sometimes QEMU won't die but just hang instead.

fosslinux avatar Oct 24 '25 21:10 fosslinux

@fosslinux do we need -serial for something or was it just an attempt at working out the problem or getting info on it?

I think these two runs show promise:

https://github.com/fosslinux/live-bootstrap/actions/runs/18630275708/job/53113976191 ("another test") https://github.com/fosslinux/live-bootstrap/actions/runs/18623287945/job/53097415539 ("jasdklfjasdklfj")

They were both cancelled by the time limit, which can be seen on their summary page:

image

GitHub recommends this approach instead of sudo. I don't think it's related to the instabilities, but it might be a better way to enable kvm than running as root.
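Whichever way access is granted, the job can check up front whether the current (non-root) user can actually open the KVM device. A minimal sketch in Python, assuming the standard Linux device node /dev/kvm; the helper name kvm_usable is made up for illustration and is not part of live-bootstrap:

```python
import os


def kvm_usable(path="/dev/kvm"):
    """Return True if the KVM device node exists and the current user
    may open it for reading and writing (which QEMU needs for -enable-kvm).

    The default path is the usual Linux location; this is an assumption,
    not something taken from the CI configuration.
    """
    return os.path.exists(path) and os.access(path, os.R_OK | os.W_OK)


if __name__ == "__main__":
    print("KVM usable without root:", kvm_usable())
```

If this prints False on the runner, QEMU would either fail to start with KVM or fall back to much slower emulation, depending on how it is invoked.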

I was attempting to do the exact same thing in my personal fork months ago. Here are some of my runs:

https://github.com/alganet/live-bootstrap/actions (look for the ones with qemu in the title).

At the time, I was consistently getting it to run up until the 6h limit.

alganet avatar Oct 24 '25 23:10 alganet

> @fosslinux do we need -serial for something or was it just an attempt at working out the problem or getting info on it?

Not needed, that was just a test.

> I think these two runs show promise:

> https://github.com/fosslinux/live-bootstrap/actions/runs/18630275708/job/53113976191 ("another test") https://github.com/fosslinux/live-bootstrap/actions/runs/18623287945/job/53097415539 ("jasdklfjasdklfj")

Unfortunately, they don't. That is the other case, where the build hangs. Look at the logs: the build process dies at a similar point. See the two-hour gap between the last command output and the job being terminated? Same problem.

> I was attempting to do the exact same thing in my personal fork months ago. Here are some of my runs:

> https://github.com/alganet/live-bootstrap/actions (look for the ones with qemu in the title).

> At the time, I was consistently getting it to run up until the 6h limit.

Thanks, that's helpful. I will inspect the differences.

fosslinux avatar Oct 26 '25 03:10 fosslinux

@fosslinux

I see, you're right. For the first run, the cancellation happens after it hangs for a while without any output:

image

However, for the second run "jasdklfjasdklfj", the timestamps on the GitHub raw logs indicate that qemu was producing output just before the process reached the time limit:

image
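To compare runs without eyeballing screenshots, the largest silent gap in a downloaded raw log can be computed. The sketch below is in Python and assumes each raw log line starts with a UTC timestamp such as 2025-10-24T21:00:00.0000000Z followed by the text; the function names are made up for illustration.

```python
import re
from datetime import datetime, timezone

# Assumed line shape: "<ISO timestamp with optional fraction>Z <text>".
_TS = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z\s?(.*)$")


def parse_line(line):
    """Return (timestamp, text) for a timestamped line, or None otherwise."""
    m = _TS.match(line)
    if m is None:
        return None
    base, frac, text = m.groups()
    # Keep at most microsecond precision; extra digits are truncated.
    micro = int((frac or "0")[:6].ljust(6, "0"))
    ts = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S").replace(
        microsecond=micro, tzinfo=timezone.utc
    )
    return ts, text


def largest_gap(lines):
    """Return (gap, text_before, text_after) for the longest pause
    between two consecutive timestamped lines, or None if fewer than two."""
    prev = None
    best = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip lines without a timestamp prefix
        ts, text = parsed
        if prev is not None:
            gap = ts - prev[0]
            if best is None or gap > best[0]:
                best = (gap, prev[1], text)
        prev = (ts, text)
    return best
```

Called as largest_gap(open("raw.log", encoding="utf-8", errors="replace")), where raw.log is a hypothetical file name, a gap of hours right before the cancellation message would point to a hang, while a small gap would mean output was still flowing when the limit hit.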

Unfortunately, the logs for my runs are not available anymore. One thing my runs and the "jasdklfjasdklfj" run here have in common is that both invoked qemu outside Python's subprocess.run. I don't know if that is related (it seems unlikely), but I think it is worth some job retries to see if it's consistent.

I'll probably do that test in my fork tomorrow, using the exact code from "jasdklfjasdklfj". If I find it to be consistent, I'll investigate some subprocess.run optional parameters or maybe an alternative to it. I will also report back if I discover that any of the runs hang.
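One possible alternative to subprocess.run, sketched here in Python, is subprocess.Popen with a reader thread, so that output is consumed as it arrives and a long silence is detected instead of waiting for the job limit. This is only a sketch under assumptions: run_with_stall_watchdog and stall_seconds are names invented for illustration, and the real qemu command line is not shown.

```python
import subprocess
import threading
import time


def run_with_stall_watchdog(cmd, stall_seconds):
    """Run cmd, collect its output lines, and kill it if it stays silent
    for longer than stall_seconds.

    Returns (returncode, stalled, lines).
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    last_output = [time.monotonic()]  # one-element list so the thread can update it
    lines = []

    def pump():
        # Read until EOF, recording when each line arrived.
        for raw in proc.stdout:
            last_output[0] = time.monotonic()
            lines.append(raw.decode(errors="replace").rstrip("\n"))

    reader = threading.Thread(target=pump, daemon=True)
    reader.start()

    stalled = False
    while proc.poll() is None:
        if time.monotonic() - last_output[0] > stall_seconds:
            stalled = True
            proc.kill()  # the child produced nothing for too long
            break
        time.sleep(0.1)

    proc.wait()
    reader.join(timeout=5)
    return proc.returncode, stalled, lines
```

With this shape, a child that dies shows up as stalled being False with a non-zero return code, and a child that hangs shows up as stalled being True, which would separate the two failure modes described earlier in the thread.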

alganet avatar Oct 26 '25 05:10 alganet

New clue: "cat: write error: Resource temporarily unavailable". And then the output of "cat" is seemingly truncated.

Unfortunately, the run time of qemu suggests that it still failed at the exact same point as in previous runs, so the serial console IRQ error is probably unrelated.
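For context, "Resource temporarily unavailable" is the standard message for the EAGAIN error, which a write returns when the target is in non-blocking mode and cannot accept more data right now. Whether cat's output in the CI run was actually non-blocking is an assumption. The Python sketch below only reproduces the error class on a pipe that nobody reads; the function name is invented for illustration.

```python
import errno
import os


def fill_nonblocking_pipe():
    """Write to a non-blocking pipe that is never read until the kernel
    refuses more data, then report what happened.

    Returns (bytes_written, errno_value, error_message).
    """
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)
    written = 0
    try:
        while True:
            # Each write either succeeds or raises once the pipe buffer is full.
            written += os.write(write_fd, b"x" * 4096)
    except BlockingIOError as exc:
        return written, exc.errno, os.strerror(exc.errno)
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    count, code, message = fill_nonblocking_pipe()
    print(count, errno.errorcode[code], message)
```

A program that does not retry after this error loses whatever it had not yet written, which would be consistent with the truncated cat output, though that connection is a guess.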

Googulator avatar Oct 26 '25 18:10 Googulator

"jasdklfjasdklfj" has a bug: the build process runs twice. That is why we see output right up until the time limit; it comes from the second run of the build process.

fosslinux avatar Oct 27 '25 02:10 fosslinux

I noticed some mounts are failing. This doesn't seem to happen on the bubblewrap version.

image image

I am not familiar with that part. If it's not related and an issue for later, just ignore me :D

alganet avatar Oct 27 '25 09:10 alganet