[RRFC] Create dedicated lockfile for a specific workspace
Motivation ("The Why")
I'm working on a large monorepo with a few dozen packages, using npm workspaces. Many of these packages are frontend applications and backend services, each of which is deployed as a Docker container. The build process builds a Docker image for each of them and has to install the required dependencies along the way: we copy the root-level package-lock.json and the individual package.json of that package into the filesystem of the container and perform an npm installation afterwards.
The interesting part of each Dockerfile looks like this (simplified):
COPY package-lock.json .
COPY service/package.json .
RUN npm ci --audit false --fund false
COPY service/ .
RUN npm run build
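For the COPY paths above to resolve, the image is built with the repository root as the Docker build context; run from inside the workspace folder, the invocation looks roughly like this (it appears as an npm script further down):
# ".." makes the repository root the build context; the Dockerfile lives in the workspace folder
docker build .. -f Dockerfile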
This works fine and will only install the dependencies required for this exact package 👍🏻
However, I have more than three dozen workspaces in the monorepo and update all dependencies in all workspaces automatically (think Renovate, Dependabot etc.). This means the root package-lock.json is updated multiple times a day - which leads to a problem.
Because COPY package-lock.json is one of the first instructions in every Dockerfile, this file changing multiple times a day prevents us from using Docker's build cache efficiently: every time any workspace updates a dependency, all cached build layers after this instruction are invalidated and have to be re-executed from scratch for every workspace package.
Both yarn (userland) and pnpm have tools to generate a specific lockfile for a single workspace out of the hoisted root lockfile.
It should be possible to do this with npm as well - but I couldn't find any ready-made solutions for it.
What do you think?
Example
It would be great if there were a CLI command that could generate a package-lock.json for a specific workspace out of the hoisted root package-lock.json. In the absence of a better name, let's call it "make-dedicated-lockfile". Running npm make-dedicated-lockfile in the workspace "service", or running npm make-dedicated-lockfile -w service in the project root, would generate a dedicated package-lock.json in the "service" workspace that reflects the reproducible dependency tree of that specific workspace.
How
Current Behaviour
I need to use package-lock.json and run npm ci to ensure that installations remain identical and reproducible throughout the entire dependency tree across all systems and pipelines, one of them being building a Docker image for each workspace.
Right now, this is done with an npm script in each workspace:
docker build .. -f Dockerfile
Desired Behaviour
Before building a Docker image for a workspace, generate a dedicated lockfile specific to that workspace. Build the Docker image, then delete the dedicated lockfile.
This could be done with an npm script like this:
npm make-dedicated-lockfile && docker build . -f Dockerfile && rm package-lock.json
The Dockerfile would then copy over the dedicated lockfile instead of the root-level one. Please take note of the context switch in the "docker" npm script: the context was previously set to .. (repository/workspace root) and can now be set to . (workspace folder).
COPY package-lock.json .
COPY package.json .
RUN npm ci --audit false --fund false
COPY . .
RUN npm run build
Additional thoughts
It is possible that you might never want multiple package-lock.json files in a workspaces project, because it might confuse people. If that's the case, one could think of a few ways to solve this - for example, generating a dedicated-lock.json file instead and adding a command line flag to npm ci that would specifically point to this dedicated lockfile:
npm make-dedicated-lockfile && docker build . -f Dockerfile && rm dedicated-lock.json
COPY dedicated-lock.json .
COPY package.json .
RUN npm ci --audit false --fund false --use-workspace-lockfile
COPY . .
RUN npm run build
I am going to +1 this with my own use-case. I searched for this hoping the issue would already be addressed, but alas.
I think it's relevant to talk about GitHub Actions here: since GitHub owns npm, one would assume that tight integration between the two services and toolchains is one of their goals. So I'm going to share my use-case, which I think is entirely reasonable.
You have a frontend, backend, and process-tooling directory in your repo's root. These are walled off from one another on a dependency basis, and each has a package.json as the manifest for its particular dependencies. There is some overlap, but frontend has ~30 dependencies, backend has ~50, and process-tooling has ~5. Ideally - for the parsimony of both download size and sub-project complexity, and for speed of execution - those are separate dependency lists, and those lists generate separate lockfiles which you can use for well-scoped, reproducible installs that hit GitHub's cache.
Doing this, you save both GitHub and npm bandwidth.
This is possible with manual management of package.json-containing directories, but not with npm's current implementation of workspaces:
# ... preamble stuff
steps:
  - name: Checkout Branch w/ repo history
    uses: actions/checkout@v2
    with:
      fetch-depth: 0
  - name: Cache node modules
    uses: actions/cache@v2
    with:
      path: ~/.npm
      key: ${{ runner.os }}-node-modules-cache-process-tooling-${{ hashFiles('process-tooling/package-lock.json') }}
      restore-keys: |
        ${{ runner.os }}-node-modules-cache-process-tooling-
  - name: Install (hopefully efficiently)
    run: |
      npm --prefix process-tooling/ install
      # cache will almost certainly be hit since these dependencies rarely change
      npm --prefix process-tooling/ run cleanup-pr-comment
      # ... execute your tooling & subsequent handling
This is possible with manual management of package.json-containing directories, but not with npm's current implementation of workspaces:
I'm not sure I understand. Why would you like to implement this using workspaces?
@codepunkt why do you have to rebuild your workspace (Docker image) if nothing changed inside that specific workspace?
Whenever a feature PR (pull request) updates a specific workspace, that workspace is rebuilt in CI/CD, including its Docker image. Building the Docker image installs dependencies, which uses the lockfile from the repository root.
With workspaces enabled, we only have that single lockfile at the repository root, which gets continuously updated by package version update PRs created by tools like Renovate, Greenkeeper or Dependabot.
When a feature PR updating a specific workspace didn't change any package dependencies, the dependency installation step when building the Docker image could be skipped and read from the Docker cache instead.
Unfortunately, this doesn't work: we need to copy the lockfile for dependency installation. Even if the feature PR that triggered the build didn't update any dependencies for this package, most of the time some other PR will have updated one or more dependencies of another workspace in the meantime. This means we have a different lockfile than the last time this specific workspace was built, so dependency installation is performed again, even though it could and should be skipped.
+1 on this. I have a monorepo layout with Frontend (Vue SPA), Backend (AWS Lambdas in TypeScript) and Shared (type definitions and utilities shared between Frontend and Backend).
For consistency, Lambdas are built in Docker containers provided by AWS that match the Lambda runtime.
This means exposing only the Backend and Shared projects inside the containers for transpilation.
But as all projects generally share the same dependency versions (a goal of using workspaces), the only package-lock.json is at the repo root level and not available in the subprojects.
I can run npm i in the subprojects (once exposed inside Docker), but I want to use package locks and npm ci to guarantee reproducibility of package versions.
Manually copying the top-level package-lock.json into the subprojects breaks if (as they sometimes do) the subprojects depend on a specific pinned version of a package that differs from the top-level package-lock. It would also lead to installing unneeded packages in the Lambdas.
Given that yarn seems to support the needed behavior, maybe the solution is to switch?
Same problem here: for Firebase Functions (which live in a subfolder/package), one also needs a lockfile for just that package to be present there.
Same problem here with serverless lambda functions - any updates or workarounds?
My current workaround, in the root package.json:
{
  "workspaces": [
    "api",
    "client"
  ],
  "scripts": {
    "api:lock": "cd api && npm i --package-lock-only --workspaces=false",
    "client:lock": "cd client && npm i --package-lock-only --workspaces=false",
    "genlock": "concurrently \"npm run client:lock\" \"npm run api:lock\"",
    "prerelease": "npm run genlock && git add api/package-lock.json client/package-lock.json",
    "release": "standard-version -t '' -a"
  }
}
So if you run npm run release, a dedicated package-lock.json is created for each workspace.
My workspaces do not depend on each other and are standalone. I'm not sure if this will work if one workspace depends on another workspace.
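For reference, each of the *:lock scripts above is just a lock-only install scoped to a single workspace; done manually for the "api" workspace from the snippet, it would be:
# generate api/package-lock.json without installing node_modules
cd api && npm install --package-lock-only --workspaces=false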
I currently use @npmcli/arborist to achieve this; it is very straightforward:
import fs from "node:fs"
import path from "node:path"
import Arborist from "@npmcli/arborist"

async function predeploy(packagePath) {
  // Build the ideal dependency tree for just this package
  const arb = new Arborist({
    path: packagePath,
  })
  const { meta } = await arb.buildIdealTree()
  meta?.commit()

  // Serialize the tree as a dedicated package-lock.json inside the package
  const packageLockFile = path.join(packagePath, "package-lock.json")
  await fs.promises.writeFile(packageLockFile, String(meta))
}

// usage (example): await predeploy("./path/to/workspace")
Copying the lock file did not work for us because of conflicts, but this works well, so thanks for posting it. It would be good to have a supported solution for this. I tried yarn and pnpm, but generating the lockfile is the lowest-touch option compared with customising Dockerfiles, running extra commands from the root in build containers, etc.
I have created isolate-package as a way to isolate a package from a monorepo. During the process, it generates an isolated lockfile depending on the package manager. For npm it uses Arborist. pnpm and Yarn v1 lockfiles are also supported, and I will soon add a fallback for modern Yarn versions by generating an npm lockfile.
To make it work I had to move the node_modules folder to and from the isolate output, but luckily running Arborist is fast enough that it feels instant, so I don't expect it to cause issues with IDEs and such. It would be great, though, if it were possible to pass Arborist the location of the node_modules folder at the root of the monorepo, so the move workaround is no longer necessary.
I created it for deployments to Firebase but the solution is generic.
In case you are interested in the Firebase part, it is showcased in this monorepo boilerplate. The main branch uses pnpm, but there's also an npm branch called use-npm.
Folks following this might also be interested in Turborepo's implementation:
https://turbo.build/repo/docs/reference/command-line-reference/prune
https://turbo.build/repo/docs/handbook/deploying-with-docker
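For a rough idea of the linked approach: turbo prune copies a single package, the packages it depends on, and a pruned lockfile into an output folder. A sketch of the invocation (the workspace name below is a placeholder, and the exact flag syntax varies between Turborepo versions, so check the linked reference):
# prune the monorepo down to one package, with Docker-friendly output in ./out
turbo prune --scope=my-app --docker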
Turborepo's implementation seems very promising, and very fast. Last time I checked, it appeared to be fixed to outputting to "out" in the root of the monorepo, which wasn't so practical for me since I isolate multiple packages for deployment within the same monorepo.
Their lockfile pruning implementation is likely more sophisticated. Since I am not happy with my npm workaround (moving node_modules), I was looking at "borrowing" some parts of the code, but I haven't been able to figure out how it works yet, and I'm not a Rust programmer :)
Yeah, I would also like to just get a package-lock.json generated into each individual workspace; that's why I kept looking around and ended up here with y'all like-minded folk.