node-test-commit-arm is broken
It seems that node-test-commit-arm jobs never complete. This affects all PR CIs:

Unfortunately, the "Auto start CI" GitHub action is unaware of the issue and just silently removes the request-ci label while queuing even more jobs.
I have brought one of the arm machines back online and it seems to be processing jobs.
FWIW it looks like the bottleneck was ubuntu2004-armv7l. In addition to the machine that was offline, the container on the other machine appears to have gone down on 15 Sept:
root@test-equinix-ubuntu2004-docker-arm64-3:~# systemctl status jenkins-test-equinix-ubuntu2004_container-armv7l-2.service
● jenkins-test-equinix-ubuntu2004_container-armv7l-2.service - Jenkins Slave in Docker for test-equinix-ubuntu2004_container-armv7l-2
Loaded: loaded (/lib/systemd/system/jenkins-test-equinix-ubuntu2004_container-armv7l-2.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2022-07-26 12:27:04 UTC; 2 months 7 days ago
Main PID: 3048135 (docker)
Tasks: 113 (limit: 616176)
Memory: 38.3M
CGroup: /system.slice/jenkins-test-equinix-ubuntu2004_container-armv7l-2.service
└─3048135 /usr/bin/docker run --init --rm -v /home/iojs/test-equinix-ubuntu2004_container-armv7l-2/:/home/iojs -v /home/iojs/.ccache/:/home/io>
Sep 15 00:46:45 test-equinix-ubuntu2004-docker-arm64-3 docker[3048135]: Sep 15, 2022 12:46:45 AM hudson.slaves.ChannelPinger$1 onDead
Sep 15 00:46:45 test-equinix-ubuntu2004-docker-arm64-3 docker[3048135]: INFO: Ping failed. Terminating the channel JNLP4-connect connection to ci.nodejs.or>
Sep 15 00:46:45 test-equinix-ubuntu2004-docker-arm64-3 docker[3048135]: java.util.concurrent.TimeoutException: Ping started at 1663199205593 hasn't complet>
Sep 15 00:46:45 test-equinix-ubuntu2004-docker-arm64-3 docker[3048135]: at hudson.remoting.PingThread.ping(PingThread.java:134)
Sep 15 00:46:45 test-equinix-ubuntu2004-docker-arm64-3 docker[3048135]: at hudson.remoting.PingThread.run(PingThread.java:90)
Sep 15 01:46:45 test-equinix-ubuntu2004-docker-arm64-3 docker[3048135]: Sep 15, 2022 1:46:45 AM hudson.slaves.ChannelPinger$1 onDead
Sep 15 01:46:45 test-equinix-ubuntu2004-docker-arm64-3 docker[3048135]: INFO: Ping failed. Terminating the channel JNLP4-connect connection to ci.nodejs.or>
Sep 15 01:46:45 test-equinix-ubuntu2004-docker-arm64-3 docker[3048135]: java.util.concurrent.TimeoutException: Ping started at 1663202805597 hasn't complet>
Sep 15 01:46:45 test-equinix-ubuntu2004-docker-arm64-3 docker[3048135]: at hudson.remoting.PingThread.ping(PingThread.java:134)
Sep 15 01:46:45 test-equinix-ubuntu2004-docker-arm64-3 docker[3048135]: at hudson.remoting.PingThread.run(PingThread.java:90)
I've restarted the container, so we now have the two back online.
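For reference, a restart like this can be done on the Docker host by bouncing the container's systemd unit (unit name taken from the `systemctl status` output above; this is a sketch of the general procedure, not a transcript of the exact commands run):

```shell
# Restart the Jenkins agent container's systemd unit and confirm it came back.
# The --init --rm docker run inside the unit means a restart recreates the
# container and lets the agent reconnect to ci.nodejs.org.
systemctl restart jenkins-test-equinix-ubuntu2004_container-armv7l-2.service
systemctl is-active jenkins-test-equinix-ubuntu2004_container-armv7l-2.service
```

After the restart, the node should show as back online on the Jenkins computer page.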
> I have brought one of the arm machines back online and it seems to be processing jobs.
I'm guessing this was another occurrence of https://github.com/nodejs/build/issues/2894?
Thank you @mhdawson and @richardlau!
> the container on the other machine appears to have gone down on 15 Sept:
Hmmm that's interesting if it was just the container that failed.
Looks like all ubuntu2004-armv7l containers are offline:

root@test-equinix-ubuntu2004-docker-arm64-3:~# systemctl status jenkins-test-equinix-ubuntu2004_container-armv7l-2.service
● jenkins-test-equinix-ubuntu2004_container-armv7l-2.service - Jenkins Slave in Docker for test-equinix-ubuntu2004_container-armv7l-2
Loaded: loaded (/lib/systemd/system/jenkins-test-equinix-ubuntu2004_container-armv7l-2.service; enabled; vendor preset: enabled)
Active: active (running) since Sun 2022-10-02 11:55:32 UTC; 1 day 20h ago
Main PID: 2538938 (docker)
Tasks: 104 (limit: 616176)
Memory: 28.9M
CGroup: /system.slice/jenkins-test-equinix-ubuntu2004_container-armv7l-2.service
└─2538938 /usr/bin/docker run --init --rm -v /home/iojs/test-equinix-ubuntu2004_container-armv7l-2/:/home/iojs -v /home/iojs/.ccache/:/home/io>
Oct 03 09:40:34 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: Oct 03, 2022 9:40:34 AM hudson.slaves.ChannelPinger$1 onDead
Oct 03 09:40:34 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: INFO: Ping failed. Terminating the channel JNLP4-connect connection to ci.nodejs.org/107.170.240.62:41913.
Oct 03 09:40:34 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: java.util.concurrent.TimeoutException: Ping started at 1664786434284 hasn't completed by 1664790034284
Oct 03 09:40:34 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: at hudson.remoting.PingThread.ping(PingThread.java:132)
Oct 03 09:40:34 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: at hudson.remoting.PingThread.run(PingThread.java:88)
Oct 03 10:40:34 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: Oct 03, 2022 10:40:34 AM hudson.slaves.ChannelPinger$1 onDead
Oct 03 10:40:34 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: INFO: Ping failed. Terminating the channel JNLP4-connect connection to ci.nodejs.org/107.170.240.62:41913.
Oct 03 10:40:34 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: java.util.concurrent.TimeoutException: Ping started at 1664790034286 hasn't completed by 1664793634287
Oct 03 10:40:34 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: at hudson.remoting.PingThread.ping(PingThread.java:132)
Oct 03 10:40:34 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: at hudson.remoting.PingThread.run(PingThread.java:88)
root@test-equinix-ubuntu2004-docker-arm64-3:~# journalctl -u jenkins-test-equinix-ubuntu2004_container-armv7l-2.service
Oct 03 07:40:10 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: WARNING: [JNLP4-connect connection to ci.nodejs.org/107.170.240.62:41913]
Oct 03 07:40:10 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: java.lang.NullPointerException
Oct 03 07:40:10 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: at org.jenkinsci.remoting.util.DirectByteBufferPool.acquire(DirectByteBuffe>
Oct 03 07:40:10 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: at org.jenkinsci.remoting.protocol.IOHub.acquire(IOHub.java:164)
Oct 03 07:40:10 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: at org.jenkinsci.remoting.protocol.ProtocolStack.acquire(ProtocolStack.java>
Oct 03 07:40:10 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processRead(SS>
Oct 03 07:40:10 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.onRecv(SSLEngi>
Oct 03 07:40:10 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecv(ProtocolStack.j>
Oct 03 07:40:10 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: at org.jenkinsci.remoting.protocol.NetworkLayer.onRead(NetworkLayer.java:13>
Oct 03 07:40:10 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer.access$2200(BIONetw>
Oct 03 07:40:10 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer$Reader.run(BIONetwo>
Oct 03 07:40:10 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.jav>
Oct 03 07:40:10 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.ja>
Oct 03 07:40:10 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:126)
Oct 03 07:40:10 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: at java.lang.Thread.run(Thread.java:748)
Oct 03 07:40:10 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: Oct 03, 2022 7:40:10 AM org.jenkinsci.remoting.protocol.impl.BIONetworkLayer$Reader>
Oct 03 07:40:10 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: SEVERE: [JNLP4-connect connection to ci.nodejs.org/107.170.240.62:41913] Reader thr>
Oct 03 07:40:10 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: java.lang.NullPointerException
Oct 03 07:40:10 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: at org.jenkinsci.remoting.util.DirectByteBufferPool.acquire(DirectByteBuffe>
Oct 03 07:40:10 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: at org.jenkinsci.remoting.protocol.IOHub.acquire(IOHub.java:164)
Oct 03 07:40:10 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: at org.jenkinsci.remoting.protocol.ProtocolStack.acquire(ProtocolStack.java>
Oct 03 07:40:10 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processRead(SS>
Oct 03 07:40:10 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.onRecv(SSLEngi>
Oct 03 07:40:10 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecv(ProtocolStack.j>
Oct 03 07:40:10 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: at org.jenkinsci.remoting.protocol.NetworkLayer.onRead(NetworkLayer.java:13>
Oct 03 07:40:10 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer.access$2200(BIONetw>
Oct 03 07:40:10 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer$Reader.run(BIONetwo>
Oct 03 07:40:10 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.jav>
Oct 03 07:40:10 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.ja>
Oct 03 07:40:10 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:126)
Oct 03 07:40:10 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: at java.lang.Thread.run(Thread.java:748)
Oct 03 08:40:34 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: Oct 03, 2022 8:40:34 AM hudson.slaves.ChannelPinger$1 onDead
Oct 03 08:40:34 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: INFO: Ping failed. Terminating the channel JNLP4-connect connection to ci.nodejs.or>
Oct 03 08:40:34 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: java.util.concurrent.TimeoutException: Ping started at 1664782834280 hasn't complet>
Oct 03 08:40:34 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: at hudson.remoting.PingThread.ping(PingThread.java:132)
Oct 03 08:40:34 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: at hudson.remoting.PingThread.run(PingThread.java:88)
Oct 03 09:40:34 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: Oct 03, 2022 9:40:34 AM hudson.slaves.ChannelPinger$1 onDead
Oct 03 09:40:34 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: INFO: Ping failed. Terminating the channel JNLP4-connect connection to ci.nodejs.or>
Oct 03 09:40:34 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: java.util.concurrent.TimeoutException: Ping started at 1664786434284 hasn't complet>
Oct 03 09:40:34 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: at hudson.remoting.PingThread.ping(PingThread.java:132)
Oct 03 09:40:34 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: at hudson.remoting.PingThread.run(PingThread.java:88)
Oct 03 10:40:34 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: Oct 03, 2022 10:40:34 AM hudson.slaves.ChannelPinger$1 onDead
Oct 03 10:40:34 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: INFO: Ping failed. Terminating the channel JNLP4-connect connection to ci.nodejs.or>
Oct 03 10:40:34 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: java.util.concurrent.TimeoutException: Ping started at 1664790034286 hasn't complet>
Oct 03 10:40:34 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: at hudson.remoting.PingThread.ping(PingThread.java:132)
Oct 03 10:40:34 test-equinix-ubuntu2004-docker-arm64-3 docker[2538938]: at hudson.remoting.PingThread.run(PingThread.java:88)
I've restarted the ubuntu2004-armv7l container on test-equinix-ubuntu2004-docker-arm64-3.
Hmmm - so this is ONLY affecting the armv7l container on the host?
AFAICT at the moment just the ubuntu2004 armv7l containers. The debian10 armv7l containers were still working.
It looks like the NullPointerException has happened again and I'm not sure if the agent on that container is now broken again (https://ci.nodejs.org/job/node-test-commit-arm/nodes=ubuntu2004-armv7l/44015/console looks stuck).
I've restarted the container on the other Altra -- will see if the problem reoccurs while watching (virtually) the rest of this morning's NodeConf EU talks. Might try updating the containers this afternoon to Java 11/17 (currently on 8).
I'm seeing this immediately after resuming CI:
fatal: Couldn't find remote ref refs/heads/jenkins-node-test-commit-arm-fanned-feaab78d411c319cbbe153db7d04e9b46b2b48eb-binary-pi1p/cc-armv7
@tniessen jenkins-node-test-commit-arm-fanned-feaab78d411c319cbbe153db7d04e9b46b2b48eb-binary-pi1p/cc-armv7 was deleted by https://ci.nodejs.org/job/git-delete-branches/58878/ because the https://ci.nodejs.org/job/node-test-commit-arm-fanned/27642/console job from https://ci.nodejs.org/job/node-test-pull-request/46975/ succeeded.
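One way to confirm this failure mode is to ask the remote whether the branch still exists before resuming. A minimal sketch with `git ls-remote` (the `check_ref` helper and its arguments are hypothetical, not part of the actual CI tooling):

```shell
# check_ref prints whether a given ref still exists on a remote.
# git ls-remote --exit-code exits non-zero when no refs match, so a missing
# branch (as in the "Couldn't find remote ref" error above) is easy to detect.
check_ref() {
  repo=$1
  ref=$2
  if git ls-remote --exit-code "$repo" "$ref" >/dev/null 2>&1; then
    echo "ref exists"
  else
    echo "ref missing"
  fi
}
```

Run against the binary repo with the `refs/heads/jenkins-node-test-commit-arm-fanned-...` branch name from the error, this would report "ref missing" once git-delete-branches has cleaned it up.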
@richardlau I am not sure I understand what I did wrong; this is happening in two PRs and I am pretty sure I only resumed/restarted failing builds. I guess I'll start new builds and see if that works.
@tniessen Arm builds were backlogged heavily over the weekend due to the machines being offline and then catching up the already queued jobs. https://github.com/nodejs/node/pull/44849 appears to have at least two builds which ended up racing each other:
- https://ci.nodejs.org/job/node-test-pull-request/46975/
- https://ci.nodejs.org/job/node-test-pull-request/46992/

The arm build from https://ci.nodejs.org/job/node-test-pull-request/46975/ succeeded and caused the branch in the binary repo to be deleted while https://ci.nodejs.org/job/node-test-pull-request/46992/ was still in progress, causing the arm build there to fail. You've been resuming builds from https://ci.nodejs.org/job/node-test-pull-request/46992/, so the resumed builds think they need to resume the failed arm build, but the branch no longer exists.
I could easily see this happening in other PRs during the backlog if builds had started for the same commit while earlier builds for that commit were still waiting for the offline arm machines to come back online.
Ah, thanks for the clarification @richardlau! That makes a lot of sense :)
> I've restarted the container on the other Altra -- will see if the problem reoccurs while watching (virtually) the rest of this morning's NodeConf EU talks. Might try updating the containers this afternoon to Java 11/17 (currently on 8).
Haven't seen the problem again. FWIW I've updated the containers to run on Java 17 now.
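For anyone verifying the same thing later, the running Java version inside an agent container can be checked from the host (container name here is a placeholder for whatever `docker ps` shows for the agent):

```shell
# Print the Java version the Jenkins agent is actually running with.
# java -version writes to stderr, hence the 2>&1.
docker exec test-equinix-ubuntu2004_container-armv7l-2 java -version 2>&1 | head -1
```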