Release CI centos7-x64 machines

BethGriggs opened this issue 4 years ago • 8 comments

At the moment, we appear to have just one machine (release-digitalocean-centos7-x64-1) that can service release jobs with the labels centos7-64-gcc6, centos7-64-gcc8, centos7-release-64, and centos7-release-sources.

The centos7-64-gcc* jobs are pretty quick (under 5 minutes), but the centos7-release-sources trend shows some releases taking 2+ hours. Looking through all of the 2+ hour builds, they are all v8-canary; other release lines are more like ~40 minutes.

Raising this because we hit a bit of a bottleneck with the last security releases. The total job times across all platforms were:

  • #6476 | 4 hr 4 min (v12.19.1)
  • #6475 | 3 hr 27 min (v14.15.1)
  • #6474 | 2 hr 49 min (v15.2.1)

(Although on typical release days, when we do just the one release, or multiple releases spread across timezones, this is less likely to be an issue.)

In hindsight, I think the problem may have been that the v15.x release was waiting behind the in-progress v8-canary release-sources job (~2 hours), which had a knock-on effect on the other release lines. Nonetheless, I thought I'd ask whether there is any capacity to add another machine? It still appears to me that we have a lot of release jobs tied to that one machine, and adding another for redundancy may be appropriate.

BethGriggs avatar Nov 19 '20 00:11 BethGriggs

OK, so the easiest initial answer to this is that you, as a releaser of "official" versions, should have permission to just cancel the automatic builds if they get in the way. Nobody is going to die if a nightly doesn't go out. The v8-canary builds are especially costly because they have trouble leveraging ccache, so those in particular are cancellable if they get in your way; I doubt anyone regularly uses those binaries anyway! I would think that nightlies should be pretty quick, so maybe they deserve a bit more patience.

We were discussing on IRC that introducing a multi-tier ccache might be a good idea here: put v8-canary and nightly on a different ccache (maybe each gets its own) and the regular release lines on another. With so many v8-canary builds going through, the cache is going to have a hard time keeping the actually important stuff around for quick builds. But of course this is going to take some effort to get working... by someone who has the time.
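
A minimal sketch of how that tiering might be wired into the job scripts, assuming we switch CCACHE_DIR based on the $disttype the job already passes around (the directory names and the pattern match are illustrative, not what the jobs currently do):

# hypothetical sketch only: keep throwaway builds out of the release cache
case "$disttype" in
  nightly|v8-canary)
    # nightly/canary builds churn constantly, so give them their own cache
    export CCACHE_DIR="$HOME/.ccache-canary"
    ;;
  *)
    # official release lines share a cache that canary builds cannot evict
    export CCACHE_DIR="$HOME/.ccache-release"
    ;;
esac
export CCACHE_MAXSIZE=5G   # cap each tier separately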

rvagg avatar Nov 20 '20 04:11 rvagg

Was just looking at the https://ci-release.nodejs.org/job/iojs+release/nodes=centos7-release-sources/ job and I think we're not actually using ccache. The job follows this pattern:

# we need ./configure run and it's not in tar-upload (yet) so let's make
# a binary tarball to get all set up
exec_cmd="gcc --version; make -j $JOBS binary \
  DESTCPU=\"$DESTCPU\" \
  ARCH=\"$ARCH\" \
  DISTTYPE=\"$disttype\" \
  DATESTRING=\"$datestring\" \
  COMMIT=\"$commit\" \
  CUSTOMTAG=\"$CUSTOMTAG\" \
  RELEASE_URLBASE=\"$RELEASE_URLBASE\" \
  CONFIG_FLAGS=\"$CONFIG_FLAGS\" \
"
if [[ "$nodes" =~ centos7 ]]; then
  exec_cmd=". /opt/rh/devtoolset-6/enable; $exec_cmd"
fi
bash -c "$exec_cmd"

. /opt/rh/devtoolset-6/enable will prepend the path to the gcc 6 binaries to the PATH environment variable so that it comes before the ccache path.
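
For illustration only (the devtoolset install path below is the standard Software Collections layout; I haven't verified it on this machine), the effect is roughly:

# before: the ccache symlink directory sits ahead of the real compilers
which gcc      # e.g. /usr/lib64/ccache/gcc
. /opt/rh/devtoolset-6/enable
# after: the devtoolset bin directory is prepended, shadowing the ccache symlinks
which gcc      # e.g. /opt/rh/devtoolset-6/root/usr/bin/gcc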

Perhaps the job should be using select-compiler.sh instead of enabling the devtoolset directly? We'd have to add a case for nodes=centos7-release-sources. Also, it looks like the current script only enables ccache on the IBM platforms (i.e. it does not on x64 or arm): https://github.com/nodejs/build/blob/c6d76d0bdf38b0f5997c002fa72bc21a7ee02ca6/jenkins/scripts/select-compiler.sh#L130-L154
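
If we do go the select-compiler.sh route, the addition might look roughly like this (a hypothetical sketch only: the pattern match and the re-prepend of /usr/lib64/ccache are assumptions, and the real change would need to follow the script's existing case structure):

# hypothetical addition, keyed off the same node-name value the job script checks
case "$nodes" in
  *centos7-release-sources*)
    # pick up gcc 6 from devtoolset, then put the ccache symlinks back in front
    . /opt/rh/devtoolset-6/enable
    export PATH="/usr/lib64/ccache:$PATH"
    ;;
esac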

I'll think this through some more.

richardlau avatar Dec 07 '20 12:12 richardlau

We now use select-compiler.sh (which enables ccache) for centos7-release-sources.

Now that Node.js 10 has gone End-of-Life, I believe the CentOS 6 release machines are no longer being used, so we could repurpose one of them as an additional centos7-x64 release machine.

richardlau avatar Jun 11 '21 00:06 richardlau

This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.

github-actions[bot] avatar Apr 08 '22 00:04 github-actions[bot]

We're now building master/Node.js 18 on RHEL 8 instead of CentOS 7. We currently have three release machines capable of building x64 and the release sources (one at Digital Ocean and two at IBM Cloud). This might be a slight overprovision (I wasn't initially sure how difficult it would be to get RHEL 8 onto a DO droplet, but it turned out to be very simple), so we could repurpose one of the IBM Cloud RHEL 8 x64 release instances as an additional CentOS 7 x64 builder. However, we're now only using the CentOS 7 machines for Node.js 17, 16, 14 and 12, with 12 and 17 soon to be End-of-Life. V8 canary builds are now building on the RHEL 8 machines.

Enabling ccache has also reduced build times on the existing CentOS 7 release machine.

richardlau avatar Apr 08 '22 14:04 richardlau

Is it expected/intentional that the rhel8-release-sources job is not using ccache?

BethGriggs avatar Apr 11 '22 19:04 BethGriggs

Is it expected/intentional that the rhel8-release-sources job is not using ccache?

It is using ccache. I'm logged in right now to release-ibm-rhel8-x64-1 and can see the cache misses going up. On the RHEL 8 machines we're invoking ccache via the gcc/g++ symlinks in /usr/lib64/ccache rather than explicitly setting CC/CXX to ccache ....

[iojs@release-ibm-rhel8-x64-1 ~]$ ccache -s
cache directory                     /home/iojs/.ccache
primary config                      /home/iojs/.ccache/ccache.conf
secondary config      (readonly)    /etc/ccache.conf
stats updated                       Mon Apr 11 14:41:37 2022
cache hit (direct)                126305
cache hit (preprocessed)            8330
cache miss                         28002
cache hit rate                     82.78 %
called for link                      699
called for preprocessing             207
compiler produced empty output        69
ccache internal error                704
cache file missing                   704
no input file                        238
cleanups performed                     0
files in cache                     74739
cache size                           3.5 GB
max cache size                       5.0 GB
[iojs@release-ibm-rhel8-x64-1 ~]$ ccache -s
cache directory                     /home/iojs/.ccache
primary config                      /home/iojs/.ccache/ccache.conf
secondary config      (readonly)    /etc/ccache.conf
stats updated                       Mon Apr 11 14:42:00 2022
cache hit (direct)                126305
cache hit (preprocessed)            8330
cache miss                         28005
cache hit rate                     82.78 %
called for link                      699
called for preprocessing             207
compiler produced empty output        69
ccache internal error                704
cache file missing                   704
no input file                        238
cleanups performed                     0
files in cache                     74745
cache size                           3.5 GB
max cache size                       5.0 GB
[iojs@release-ibm-rhel8-x64-1 ~]$
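
To make the symlink route more visible in the job output, something like the following could be added to the build step (illustrative only; the expected paths assume the symlink directory mentioned above):

# illustrative check: confirm gcc resolves to the ccache symlink directory
which gcc                     # expected here: /usr/lib64/ccache/gcc
readlink -f "$(which gcc)"    # expected to resolve to the ccache binary itself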

richardlau avatar Apr 11 '22 19:04 richardlau

Oh, it wasn't obvious from the job output (compared to how it's called in other jobs). I was just suspicious because all the other platforms were in the 10-30 minute range whereas that one took almost 2 hours.

BethGriggs avatar Apr 11 '22 21:04 BethGriggs

This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.

github-actions[bot] avatar Feb 06 '23 00:02 github-actions[bot]