
Release builds stuck on rhel8-s390x-release

Open targos opened this issue 3 years ago • 46 comments

https://ci-release.nodejs.org/job/iojs+release/

targos avatar Jan 24 '23 12:01 targos

The reason is that https://ci-release.nodejs.org/job/iojs+release/9088/nodes=rhel8-s390x-release/ has been running for more than three days (almost four), tying up the release machine. I've just canceled it.

richardlau avatar Jan 24 '23 12:01 richardlau

Happened again today on https://ci-release.nodejs.org/job/iojs+release/9146/nodes=rhel8-s390x-release/ -- stuck for 7+ hours. I've canceled the job.

richardlau avatar Feb 14 '23 17:02 richardlau

Happened during the 19.6.1 build: https://ci-release.nodejs.org/job/iojs+release/9153/nodes=rhel8-s390x-release/

richardlau avatar Feb 16 '23 20:02 richardlau

I just cancelled https://ci-release.nodejs.org/job/iojs+release/9175/nodes=rhel8-s390x-release/

targos avatar Feb 25 '23 09:02 targos

Canceled https://ci-release.nodejs.org/job/iojs+release/9214/nodes=rhel8-s390x-release/ (running for 10hrs).

richardlau avatar Mar 17 '23 16:03 richardlau

And again https://ci-release.nodejs.org/job/iojs+release/9249/nodes=rhel8-s390x-release/

richardlau avatar Apr 03 '23 15:04 richardlau

And again: https://ci-release.nodejs.org/job/iojs+release/9270/nodes=rhel8-s390x-release/

targos avatar Apr 09 '23 07:04 targos

Again: https://ci-release.nodejs.org/job/iojs+release/9290/nodes=rhel8-s390x-release/

targos avatar Apr 17 '23 06:04 targos

https://ci-release.nodejs.org/job/iojs+release/9297/nodes=rhel8-s390x-release/ was another one.

The really odd thing about this is that, AFAIK, this hasn't happened on the test machines.

richardlau avatar Apr 18 '23 11:04 richardlau

https://ci-release.nodejs.org/job/iojs+release/9332/ https://ci-release.nodejs.org/job/iojs+release/9334/

targos avatar May 03 '23 15:05 targos

Again - https://ci-release.nodejs.org/job/iojs+release/nodes=rhel8-s390x-release/9384/

mhdawson avatar May 31 '23 16:05 mhdawson

https://ci-release.nodejs.org/job/iojs+release/9406/nodes=rhel8-s390x-release/

richardlau avatar Jun 08 '23 12:06 richardlau

https://ci-release.nodejs.org/job/iojs+release/9411/nodes=rhel8-s390x-release/

richardlau avatar Jun 10 '23 15:06 richardlau

https://ci-release.nodejs.org/job/iojs+release/nodes=rhel8-s390x-release/9452/

richardlau avatar Jun 26 '23 12:06 richardlau

https://ci-release.nodejs.org/job/iojs+release/9455/nodes=rhel8-s390x-release/

richardlau avatar Jun 26 '23 16:06 richardlau

https://ci-release.nodejs.org/job/iojs+release/nodes=rhel8-s390x-release/9468/

richardlau avatar Jul 03 '23 14:07 richardlau

https://ci-release.nodejs.org/job/iojs+release/nodes=rhel8-s390x-release/9471/console

richardlau avatar Jul 04 '23 13:07 richardlau

https://ci-release.nodejs.org/job/iojs+release/nodes=rhel8-s390x-release/9486/

richardlau avatar Jul 10 '23 17:07 richardlau

https://ci-release.nodejs.org/job/iojs+release/9487/nodes=rhel8-s390x-release/

richardlau avatar Jul 11 '23 12:07 richardlau

https://ci-release.nodejs.org/job/iojs+release/nodes=rhel8-s390x-release/9500/

richardlau avatar Jul 17 '23 14:07 richardlau

https://ci-release.nodejs.org/job/iojs+release/nodes=rhel8-s390x-release/9508

BethGriggs avatar Jul 20 '23 17:07 BethGriggs

Again, it seems more frequent atm: https://ci-release.nodejs.org/job/iojs+release/nodes=rhel8-s390x-release/9515/

BethGriggs avatar Jul 25 '23 10:07 BethGriggs

FWIW these are the running processes on the release machine:

$ ps -afu iojs
UID          PID    PPID  C STIME TTY          TIME CMD
iojs     2144511       1  0 Jul11 ?        00:57:17 /usr/lib/jvm/java-17-openjdk-17.0.7.0.7-3.el8.s390x/bin/java -Xmx128m -jar /home/iojs/slave.jar -jnlpUrl https://ci-release.nodejs.org/computer/release-ibm-r
iojs     3765790 2144511  0 Jul21 ?        00:00:00 /bin/bash -ex /tmp/jenkins15187592510881137259.sh
iojs     3765814 3765790  0 Jul21 ?        00:00:00 bash -c gcc --version; make -j 2 binary-upload   DESTCPU="s390x"   ARCH="s390x"   DISTTYPE="custom"   DATESTRING="20230721"   COMMIT="4cb3751743"   CUSTOMTAG
iojs     3765816 3765814  0 Jul21 ?        00:00:00 make -j 2 binary-upload DESTCPU=s390x ARCH=s390x DISTTYPE=custom DATESTRING=20230721 COMMIT=4cb3751743 CUSTOMTAG=v8-canary202307214cb3751743 RELEASE_URLBASE=
iojs     3765965 3765816  0 Jul21 ?        00:00:00 make install DESTDIR=node-v21.0.0-v8-canary202307214cb3751743-linux-s390x V=0 PORTABLE=1
iojs     3766012 3765965  0 Jul21 ?        00:00:00 make -C out BUILDTYPE=Release V=0
iojs     3774158 3766012  0 Jul21 ?        00:00:00 [touch] <defunct>
iojs     3774159 3766012  0 Jul21 ?        00:00:00 [sh] <defunct>
linux1   3803507 3803443  0 07:51 pts/0    00:00:00 ps -afu iojs
$

I think the issue is the <defunct> processes, which are zombie processes: processes that have exited but that the parent process (in this case make) hasn't reaped. Maybe it's a bug in make, but we're running the same version on both the release and test machines and, AFAIK, we've never seen the test builds hang in this way.
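
For anyone unfamiliar with zombie processes, here is a minimal Python sketch of the mechanism (Linux only, since it reads /proc). It is not taken from the release job or from make; the function names `proc_state` and `demo_zombie` are made up for illustration. A child exits, stays in state `Z` (what ps shows as <defunct>) until the parent calls wait, and then disappears once it has been reaped.

```python
import os
import time


def proc_state(pid: int) -> str:
    """Return the one-letter process state from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The state field is the first field after the parenthesised command name.
    return data.rsplit(")", 1)[1].split()[0]


def demo_zombie() -> tuple:
    """Fork a child that exits at once, observe it as a zombie, then reap it."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately without running cleanup handlers

    # Parent: poll (without calling wait) until the child shows up as 'Z'.
    deadline = time.monotonic() + 5
    state = proc_state(pid)
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        state = proc_state(pid)

    # Reaping the child removes the <defunct> entry from the process table.
    _, status = os.waitpid(pid, 0)
    return state, os.WEXITSTATUS(status)


if __name__ == "__main__":
    state, exit_code = demo_zombie()
    print(f"state before reaping: {state}, exit code: {exit_code}")
```

In the ps output above, the [touch] and [sh] entries are in the same state as the child in this sketch before `os.waitpid` is called, which suggests the parent make (PID 3766012) has not collected their exit status.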

richardlau avatar Jul 25 '23 12:07 richardlau

FWIW I saw there were some Java updates available and a new kernel (I don't think they should affect these hangs 🤷) so I ran the Ansible playbook against the release machine and then rebooted it.

richardlau avatar Jul 25 '23 12:07 richardlau

Just to keep track, again on https://ci-release.nodejs.org/job/iojs+release/nodes=rhel8-s390x-release/9530/

BethGriggs avatar Jul 31 '23 21:07 BethGriggs

https://ci-release.nodejs.org/job/iojs+release/nodes=rhel8-s390x-release/9565/

richardlau avatar Aug 22 '23 03:08 richardlau

https://ci-release.nodejs.org/job/iojs+release/9579/nodes=rhel8-s390x-release/

richardlau avatar Aug 23 '23 15:08 richardlau

https://ci-release.nodejs.org/job/iojs+release/nodes=rhel8-s390x-release/9586/ https://ci-release.nodejs.org/job/iojs+release/nodes=rhel8-s390x-release/9592/

richardlau avatar Aug 29 '23 11:08 richardlau

https://ci-release.nodejs.org/job/iojs+release/9599/nodes=rhel8-s390x-release/

richardlau avatar Aug 30 '23 12:08 richardlau

https://ci-release.nodejs.org/job/iojs+release/nodes=rhel8-s390x-release/9606/

richardlau avatar Aug 31 '23 16:08 richardlau