
[CI][Packaging][Release] Jobs that run on ARM self-hosted runners are flaky and failing with communication lost

Open raulcd opened this issue 1 year ago • 7 comments

Describe the bug, including details regarding any error messages, version, and platform.

The k8s self-hosted runners solution has been slightly flaky lately. See for example:

The error:

The self-hosted runner: k8s-runners-linux-arm-8g6tn-gpmc7 lost communication with the server.

I am also seeing this on the maintenance branch for the release.

Component(s)

Continuous Integration, Packaging, Release

raulcd, Oct 15 '24 12:10

cc @assignUser

raulcd, Oct 15 '24 12:10

Will investigate

assignUser, Oct 15 '24 13:10

This type of error usually happens when the runner pod gets OOM-killed or CPU-killed. Did we increase the feature set that's built, or something like that, that might increase memory or CPU use?
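One way to test this hypothesis is to look at the status Kubernetes records for the runner pod. The sketch below is a hypothetical helper (`kill_signals` is not part of any existing tool) that takes the parsed JSON from `kubectl get pod <name> -o json` and flags a pod-level `Evicted` reason, or containers whose current or previous state is `terminated` with reason `OOMKilled` or exit code 137 (128 + SIGKILL). It assumes the pod still exists when queried. Exceeding a CPU limit normally throttles a container rather than killing it, so this check mainly catches memory-related kills and evictions.

```python
from typing import Any


def kill_signals(pod: dict[str, Any]) -> list[str]:
    """Hypothetical helper: describe signs that a pod or its containers were killed.

    `pod` is assumed to be the parsed output of `kubectl get pod <name> -o json`.
    """
    findings: list[str] = []
    status = pod.get("status", {})

    # Pods evicted under node resource pressure report this at the pod level.
    if status.get("reason") == "Evicted":
        findings.append(f"pod evicted: {status.get('message', '')}".strip())

    for cs in status.get("containerStatuses", []):
        # Check the current state and the state before the last restart.
        for key in ("state", "lastState"):
            term = (cs.get(key) or {}).get("terminated")
            if not term:
                continue
            reason = term.get("reason", "")
            code = term.get("exitCode")
            # 137 = 128 + 9 (SIGKILL): sent by the OOM killer, but also by any forced kill.
            if reason == "OOMKilled" or code == 137:
                findings.append(f"{cs['name']} ({key}): reason={reason!r} exitCode={code}")
    return findings
```

For example, piping `kubectl get pod k8s-runners-linux-arm-8g6tn-gpmc7 -o json` into `json.load` and passing the result to `kill_signals` would print any such findings, assuming the pod had not yet been cleaned up.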

assignUser, Oct 15 '24 18:10

https://github.com/apache/arrow/pull/44348 may be related. It enables the Azure filesystem.

kou, Oct 16 '24 00:10

Can we increase assigned resources for the runner?

kou, Oct 16 '24 00:10

Ah, yeah, that could do it. I'll see what I can do.

assignUser, Oct 17 '24 00:10

The runner resources were increased; the change should take effect soon!

assignUser, Oct 21 '24 17:10

We have moved from self-hosted ARM runners to GitHub-hosted runners, so we can close this issue now. Thanks for working on this in the past, @assignUser.

raulcd, Feb 13 '25 10:02