setup-node
Question: How to properly cache node-gyp builds?
Hello,
Description:
I'm struggling to figure out how to properly cache node-gyp builds. They are rebuilt every time yarn install runs, which adds about 30s to each install.
Even after manually caching and restoring the build paths, every yarn install still triggers a new gyp build. The paths I cached:
# Cache location of node headers
~/.cache/node-gyp
# Explicit caching of the affected packages
node_modules/cpu-features/build
node_modules/unix-dgram/build
My previous question in the node-gyp repo led me here. The suggestion was to look into node-gyp-build.
Any advice would be appreciated.
Justification:
Save myself time and the planet by not wasting electricity.
Are you willing to submit a PR?
Of course.
Hello @arminrosu. I think in that case it is better to use actions/cache, because setup-node only saves the global cache. With actions/cache you can specify the primary key, the restore keys and the paths to cache.
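For reference, a minimal sketch of what such a step could look like, using the paths from the original question. The key layout, the `yarn.lock` hash and the action version are assumptions, not something confirmed in this thread, and as reported above, restoring these paths alone did not stop yarn from rebuilding.

```yaml
# Sketch only: key names and the action version are assumptions.
- name: Cache node-gyp headers and native build output
  uses: actions/cache@v4
  with:
    path: |
      ~/.cache/node-gyp
      node_modules/cpu-features/build
      node_modules/unix-dgram/build
    # Primary key: exact match on OS and lockfile contents
    key: node-gyp-${{ runner.os }}-${{ hashFiles('yarn.lock') }}
    # Fallback: any earlier cache for the same OS
    restore-keys: |
      node-gyp-${{ runner.os }}-
- name: Install dependencies
  run: yarn install --immutable
```

Native build output also depends on the Node version, so adding it to the key may be worth considering.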
@dmitry-shibanov thanks, that's how I did it. I asked the question here because this repo is concerned with node, whereas actions/cache is a generic action. I was hoping the team behind the actions might be able to share some knowledge.
It's quite disappointing and surprising to find that https://github.com/marketplace/actions/yarn-install-cache was deprecated in favour of this (actions/setup-node) and yet this basic functionality of caching node_modules is not covered out of the box. Surely every single developer using yarn with CI needs this?!
So what's the recommended best practice for GitHub actions which caches everything which might need to be cached? https://yarnpkg.com/features/caching#github-actions says:
We're still investigating the exact set of defaults that make GH Action caching more efficient. It's likely that we'll provide an official yarn-cache action mid-term for this purpose.
My best guess is that currently a combination of actions/setup-node and actions/cache is required, but this really should be documented at the very least, if not automated, so that literally millions of developers don't have to reinvent the same wheel.
I found https://github.com/yarnpkg/berry/discussions/5924 which says that caching .yarn/install-state.gz turns YN0007 errors into YN0008. I guess our work is still not done... :disappointed:
Also there seems to be no way to configure this action to cache .yarn/install-state.gz, short of forking it and extending the functionality :disappointed:
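A hedged sketch of the combination described above: setup-node handles yarn's global cache, and a separate actions/cache step covers `.yarn/install-state.gz` and `node_modules`. The Node version, key name and action versions are assumptions, and nothing in this thread confirms that this avoids the YN0007/YN0008 rebuilds.

```yaml
# Sketch only: versions and key names are assumptions.
steps:
  - uses: actions/checkout@v4
  - uses: actions/setup-node@v4
    with:
      node-version: 18
      # Built-in caching covers yarn's cache folder, not node_modules
      cache: yarn
  - name: Cache yarn install state and node_modules
    uses: actions/cache@v4
    with:
      path: |
        .yarn/install-state.gz
        node_modules
      key: yarn-state-${{ runner.os }}-${{ hashFiles('yarn.lock') }}
  - name: Install dependencies
    run: yarn install --immutable
```

In a monorepo the `node_modules` entry would likely need to become a glob such as `**/node_modules`, which increases the cache size accordingly.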
I went down the rabbit hole of caching node_modules with the GitHub Actions cache. The issue is that node_modules was massive in my repo (around 900 MB cached?) and GitHub Actions has a cache limit of 10 GB per repo. This meant that after 10-12 CI runs, older entries would be LRU-evicted from the cache, which broke other crucial caches that CI relied on. It's not really worth it.
The best I ever did was to pre-build a Docker image with a fairly recent yarn install:
# ╭──────────────────────────────────────────────────────────╮
# │ Stage 1: Create a base image from node18 + alpine │
# │ with the packages we need. │
# ╰──────────────────────────────────────────────────────────╯
# The --platform flag is required when building on Apple Silicon for the Github
# Action runners.
FROM --platform=linux/amd64 node:18.17.0-alpine3.18 AS base
# Set up some core ENV variables for yarn install + playwright
ENV PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD=1 \
    YARN_ENABLE_GLOBAL_CACHE=false \
    YARN_NM_MODE=hardlinks-local \
    YARN_CACHE_FOLDER=.yarn/cache
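# Install the tools the later steps rely on. Assumption: the original post does
# not show this step, but fd, GNU tar and zstd are not part of the stock
# node:alpine image, and BusyBox tar has no --zstd flag.
RUN apk add --no-cache fd tar zstd
# Create a blank directory to store artifacts in.
RUN mkdir -p /build-artifacts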
# ╭──────────────────────────────────────────────────────────╮
# │ Stage 2: Build node_modules + yarn in an initial │
# │ builder container │
# ╰──────────────────────────────────────────────────────────╯
FROM base AS builder
RUN mkdir -p /__w/my-repo/my-repo
WORKDIR /__w/my-repo/my-repo
# Copy the whole git repo over and yarn install, it's the easiest option.
COPY . ./
RUN yarn install --immutable
# To reduce yarn install times in CI, we need to copy over the following:
# - all the node_modules folders
# - .yarn/install-state.gz
# - the NPM global cache folder (whatever `npm config get cache` is)
#
# We _don't_ need to copy over `.yarn/cache`, because we already commit that to
# the Git repo as part of yarn zero-installs.
# Recursively tar up the node_modules directory
RUN fd -0 -t d node_modules | tar --zstd -cf /build-artifacts/node_modules_archive.tar.zst --null -T -
# Copy over the yarn install state
RUN cp .yarn/install-state.gz /build-artifacts/yarn-install-state.gz
# Copy over the NPM global cache folder
RUN cd "$(npm config get cache)" && tar --zstd -cf /build-artifacts/npm_global_cache.tar.zst *
# ╭──────────────────────────────────────────────────────────╮
# │ Stage 3: Redo the image with just the build │
# │ artifacts, to keep the size down │
# ╰──────────────────────────────────────────────────────────╯
FROM base
RUN mkdir -p /build-artifacts
COPY --from=builder /build-artifacts/* /build-artifacts/
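The post does not say how this image gets built and published. One possible setup, sketched under assumptions, is a scheduled workflow that pushes it to GitHub Container Registry. The file name, the schedule and the image name `ghcr.io/my-org/my-repo-ci-base` are all hypothetical.

```yaml
# .github/workflows/ci-base-image.yml (hypothetical file name)
name: Rebuild CI base image
on:
  schedule:
    - cron: "0 3 * * 1" # weekly, Mondays at 03:00 UTC
  workflow_dispatch:
permissions:
  contents: read
  packages: write
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v5
        with:
          # Build from the checked-out repo so COPY . ./ sees the full tree
          context: .
          push: true
          # Hypothetical image name
          tags: ghcr.io/my-org/my-repo-ci-base:latest
```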
Then I created a GitHub action that I could reuse in jobs, in .github/actions/fast-yarn-install/action.yml:
name: "fast monorepo yarn install"
description: |
  Our base CI image contains prebaked npm + yarn + node_modules caches inside
  an artifacts directory.

  This shared action will:
  - Set up all the caches from the artifacts directory
  - Run `yarn install --immutable` to resolve any drift

  This action _will_ get slower over time as we add more packages to yarn, so
  rebuilding the base CI image every so often to resolve package drift is
  advised.
runs:
  using: composite
  steps:
    - name: Find the NPM global cache directory
      id: npm-config
      shell: bash
      run: |
        echo "NPM_GLOBAL_CACHE_FOLDER=$(npm config get cache)" >> $GITHUB_OUTPUT
    - name: Move yarn install state into place
      shell: bash
      run: |
        mv /build-artifacts/yarn-install-state.gz .yarn/install-state.gz
    - name: Unpack npm global cache
      shell: bash
      run: |
        mkdir -p "${{ steps.npm-config.outputs.NPM_GLOBAL_CACHE_FOLDER }}"
        tar xf /build-artifacts/npm_global_cache.tar.zst -C "${{ steps.npm-config.outputs.NPM_GLOBAL_CACHE_FOLDER }}"
    - name: Unpack recursive node_modules cache directly into the monorepo
      shell: bash
      run: |
        tar xf /build-artifacts/node_modules_archive.tar.zst -C .
    - name: Run yarn install
      shell: bash
      run: |
        yarn install --immutable --inline-builds
      env:
        # Use local cache folder to keep downloaded archives
        YARN_ENABLE_GLOBAL_CACHE: "false"
        # Reduce node_modules size
        YARN_NM_MODE: "hardlinks-local"
        # Ensure we're using the local monologue cache
        YARN_CACHE_FOLDER: ".yarn/cache"
and referenced it in jobs like:
- name: yarn install
  uses: ./.github/actions/fast-yarn-install
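For context, a sketch of how a complete job might tie the pieces together. It assumes the job runs inside the prebuilt image (the action reads from /build-artifacts, which only exists there). The image name and the `yarn test` script are hypothetical.

```yaml
# Sketch only: the image name and the test script are hypothetical.
jobs:
  test:
    runs-on: ubuntu-latest
    container:
      # Prebuilt base image containing /build-artifacts
      image: ghcr.io/my-org/my-repo-ci-base:latest
    steps:
      # The local action lives in the repo, so check out first
      - uses: actions/checkout@v4
      - name: yarn install
        uses: ./.github/actions/fast-yarn-install
      - name: Run tests
        run: yarn test
```

In container jobs the workspace is mounted under /__w/<repo>/<repo>, which appears to be why the Dockerfile builds in /__w/my-repo/my-repo: the paths baked into the archived install match the paths seen in CI.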