Release builds stuck on rhel8-s390x-release
https://ci-release.nodejs.org/job/iojs+release/
The reason is that https://ci-release.nodejs.org/job/iojs+release/9088/nodes=rhel8-s390x-release/ had been running for more than 3 days (almost four), tying up the release machine. I've just canceled it.
Happened again today on https://ci-release.nodejs.org/job/iojs+release/9146/nodes=rhel8-s390x-release/ -- stuck for 7+ hours. I've canceled the job.
Happened during the 19.6.1 build: https://ci-release.nodejs.org/job/iojs+release/9153/nodes=rhel8-s390x-release/
I just cancelled https://ci-release.nodejs.org/job/iojs+release/9175/nodes=rhel8-s390x-release/
Canceled https://ci-release.nodejs.org/job/iojs+release/9214/nodes=rhel8-s390x-release/ (running for 10hrs).
And again https://ci-release.nodejs.org/job/iojs+release/9249/nodes=rhel8-s390x-release/
And again: https://ci-release.nodejs.org/job/iojs+release/9270/nodes=rhel8-s390x-release/
Again: https://ci-release.nodejs.org/job/iojs+release/9290/nodes=rhel8-s390x-release/
https://ci-release.nodejs.org/job/iojs+release/9297/nodes=rhel8-s390x-release/ was another one.
The really odd thing about this is that, AFAIK, this hasn't happened on the test machines.
https://ci-release.nodejs.org/job/iojs+release/9332/ https://ci-release.nodejs.org/job/iojs+release/9334/
Again - https://ci-release.nodejs.org/job/iojs+release/nodes=rhel8-s390x-release/9384/
https://ci-release.nodejs.org/job/iojs+release/9406/nodes=rhel8-s390x-release/
https://ci-release.nodejs.org/job/iojs+release/9411/nodes=rhel8-s390x-release/
https://ci-release.nodejs.org/job/iojs+release/nodes=rhel8-s390x-release/9452/
https://ci-release.nodejs.org/job/iojs+release/9455/nodes=rhel8-s390x-release/
https://ci-release.nodejs.org/job/iojs+release/nodes=rhel8-s390x-release/9468/
https://ci-release.nodejs.org/job/iojs+release/nodes=rhel8-s390x-release/9471/console
https://ci-release.nodejs.org/job/iojs+release/nodes=rhel8-s390x-release/9486/
https://ci-release.nodejs.org/job/iojs+release/9487/nodes=rhel8-s390x-release/
https://ci-release.nodejs.org/job/iojs+release/nodes=rhel8-s390x-release/9500/
https://ci-release.nodejs.org/job/iojs+release/nodes=rhel8-s390x-release/9508
Again, it seems more frequent atm: https://ci-release.nodejs.org/job/iojs+release/nodes=rhel8-s390x-release/9515/
FWIW these are the running processes on the release machine:
$ ps -afu iojs
UID PID PPID C STIME TTY TIME CMD
iojs 2144511 1 0 Jul11 ? 00:57:17 /usr/lib/jvm/java-17-openjdk-17.0.7.0.7-3.el8.s390x/bin/java -Xmx128m -jar /home/iojs/slave.jar -jnlpUrl https://ci-release.nodejs.org/computer/release-ibm-r
iojs 3765790 2144511 0 Jul21 ? 00:00:00 /bin/bash -ex /tmp/jenkins15187592510881137259.sh
iojs 3765814 3765790 0 Jul21 ? 00:00:00 bash -c gcc --version; make -j 2 binary-upload DESTCPU="s390x" ARCH="s390x" DISTTYPE="custom" DATESTRING="20230721" COMMIT="4cb3751743" CUSTOMTAG
iojs 3765816 3765814 0 Jul21 ? 00:00:00 make -j 2 binary-upload DESTCPU=s390x ARCH=s390x DISTTYPE=custom DATESTRING=20230721 COMMIT=4cb3751743 CUSTOMTAG=v8-canary202307214cb3751743 RELEASE_URLBASE=
iojs 3765965 3765816 0 Jul21 ? 00:00:00 make install DESTDIR=node-v21.0.0-v8-canary202307214cb3751743-linux-s390x V=0 PORTABLE=1
iojs 3766012 3765965 0 Jul21 ? 00:00:00 make -C out BUILDTYPE=Release V=0
iojs 3774158 3766012 0 Jul21 ? 00:00:00 [touch] <defunct>
iojs 3774159 3766012 0 Jul21 ? 00:00:00 [sh] <defunct>
linux1 3803507 3803443 0 07:51 pts/0 00:00:00 ps -afu iojs
$
I think the issue is the <defunct> processes, which are zombie processes: processes that have exited but that the parent process (in this case make) hasn't reaped yet. Maybe it's a bug in make, but we're running the same version on both the release and test machines and AFAIK we've never seen the test builds hang in this way.
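In case it helps when checking the next hang, here is a rough sketch for pulling the <defunct> entries and their parent PIDs out of `ps -f` style output. The function name `find_defunct` and the `Zombie` tuple are made up for illustration, and it assumes the eight-column layout shown in the listing above (UID PID PPID C STIME TTY TIME CMD); it is not part of any existing tooling.

```python
from typing import List, NamedTuple


class Zombie(NamedTuple):
    """One <defunct> row from `ps -f` output (hypothetical helper type)."""
    pid: int
    ppid: int
    name: str


def find_defunct(ps_output: str) -> List[Zombie]:
    """Return the <defunct> entries found in `ps -f` style output.

    Assumes the column layout: UID PID PPID C STIME TTY TIME CMD.
    """
    zombies = []
    for line in ps_output.splitlines():
        # Split into at most 8 fields so CMD keeps its embedded spaces.
        fields = line.split(None, 7)
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # header row, shell prompt, or malformed line
        cmd = fields[7]
        if cmd.endswith("<defunct>"):
            # ps shows exited processes as "[name] <defunct>".
            name = cmd.split()[0].strip("[]")
            zombies.append(Zombie(int(fields[1]), int(fields[2]), name))
    return zombies


# Rows copied from the listing above.
SAMPLE = """\
UID PID PPID C STIME TTY TIME CMD
iojs 3766012 3765965 0 Jul21 ? 00:00:00 make -C out BUILDTYPE=Release V=0
iojs 3774158 3766012 0 Jul21 ? 00:00:00 [touch] <defunct>
iojs 3774159 3766012 0 Jul21 ? 00:00:00 [sh] <defunct>
"""

if __name__ == "__main__":
    for z in find_defunct(SAMPLE):
        print(z.pid, z.ppid, z.name)
```

Run against the rows above, both zombies report PPID 3766012, which is the `make -C out` process, so that is the process that would need to reap them.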
FWIW I saw there were some Java updates available and a new kernel (I don't think they should affect these hangs 🤷) so I ran the Ansible playbook against the release machine and then rebooted it.
Just to keep track, again on https://ci-release.nodejs.org/job/iojs+release/nodes=rhel8-s390x-release/9530/
https://ci-release.nodejs.org/job/iojs+release/nodes=rhel8-s390x-release/9565/
https://ci-release.nodejs.org/job/iojs+release/9579/nodes=rhel8-s390x-release/
https://ci-release.nodejs.org/job/iojs+release/nodes=rhel8-s390x-release/9586/ https://ci-release.nodejs.org/job/iojs+release/nodes=rhel8-s390x-release/9592/
https://ci-release.nodejs.org/job/iojs+release/9599/nodes=rhel8-s390x-release/
https://ci-release.nodejs.org/job/iojs+release/nodes=rhel8-s390x-release/9606/