build-image
build-image copied to clipboard
Better error handling of `nvm` and `nodejs.org/dist`
We run nvm install to check what the latest matching Node.js version is (e.g. NODE_VERSION 12 would match 12.18.3), and download it unless it is available locally:
https://github.com/netlify/build-image/blob/450c24b893036c2813a0b0d0a2a05bcdd81935a4/run-build-functions.sh#L221
Even if the desired Node.js version is already available locally, we still need to use nvm install (and make an HTTP request) to check whether a newer matching Node.js version is available. However, if the command fails because nodejs.org is down, we should default to using a local Node.js version instead, providing it matches the desired Node.js version. At the moment, we fail instead, which created an outage on 2020-08-30.
https://github.com/netlify/build-image/blob/450c24b893036c2813a0b0d0a2a05bcdd81935a4/run-build-functions.sh#L233
Doing exactly what @ehmicky describes would have greatly reduced (perhaps to the extent that nobody noticed - us OR customers) the impact of our incident last night. It exactly describes the root cause of the "outage" which paged several people and wasted a half hour of their night since it was neither caused by us nor actionable by us.
cf #incident-2021-02-19-high-percentage-of-build-failures and https://www.notion.so/netlify/2021-02-18-High-percentage-of-build-failures-880aebbdffa549d28d7316e6ad062487
cc @verythorough this may get upped in priority as a followup from the outage, not sure yet.
From discussing with @fool and @keiko713, we are proposing tweaking the solution above. Instead of always running the default Node.js version, we would instead try to run the same major Node.js version that users are choosing.
From fairly recent data collection, it appears that:
- Almost all of our users are not specifying the Node.js minor/patch version, only the major version
- Most of our users are using the currently supported Node.js major releases, plus the last major release before that.
Based on this, we could:
- [ ] Download the latest version of Node.js 10, 12, 14 and 15 using
nvm, whenDockerfileis built. I.e. those versions would be cached in the build image. - [ ] When a build is using
nvm installthat fails, check the error message (alternative: pinghttps://nodejs.org/dist. Note: I have a library that does just that, which includes automatic retries) to see ifnodejs.org/distis down. - [ ] If it is down, then use the cached Node.js version (or the closest to it).
This would increase the build image uncompressed size by 100MB to 150MB.
Linking for context https://github.com/actions/setup-node#v2.
In v2 setup-node:
- Keep images up to date with latest version of each LTS
- Use their own repository to fetch versions
- Fallback to fetch from
https://nodejs.org/dist - Cache versions locally too
So it seems the suggested solution is to do the same except 2. which sounds good to me
Another option could also be to use the NVM_NODEJS_ORG_MIRROR environment variable which allows adding mirrors.
This would remove any need to cache the Node.js binaries ourselves, by relying on mirrors instead.
Note: this problem also happens ~20 times per day due to transient network errors when nvm is fetching https://nodejs.org/dist/index.json, which would also be solved by this issue.
Draft PR at https://github.com/netlify/build-image/pull/750 for adding a mirror.
#750 was closed due to security concerns with using external mirrors. This means we should instead go with caching the binaries, as explained in the above comments.