
[RRFC] Create dedicated lockfile for a specific workspace

Open codepunkt opened this issue 2 years ago • 15 comments

Motivation ("The Why")

I'm working on a large monorepo with a few dozen packages, using npm workspaces. Many of these packages are frontend applications and backend services, each of which is deployed as a Docker container. The build process builds a Docker image for each of them and has to install the required dependencies in the process. To do so, we copy the root-level package-lock.json and the individual package.json of that package into the filesystem of the container and then perform an npm installation.

The interesting part of each Dockerfile looks like this (simplified):

COPY package-lock.json .
COPY service/package.json .
RUN npm ci --audit false --fund false

COPY service/ .
RUN npm run build

This works fine and will only install the dependencies required for this exact package 👍🏻

However, I have more than three dozen workspaces in the monorepo and update all dependencies in all workspaces automatically (think Renovate, Dependabot, etc.). This means the root package-lock.json is updated multiple times a day - which leads to a problem.

Because COPY package-lock.json is one of the first instructions in every Dockerfile, this file changing multiple times a day prevents us from using Docker's build cache efficiently: every time any workspace updates a dependency, all cached Docker build layers after this instruction are invalidated and have to be re-executed from scratch for every workspace package.

Both yarn (userland) and pnpm have tools to generate a specific lockfile for a single workspace out of the hoisted root lockfile.

It should be possible to do this with npm as well - but I couldn't find any ready-made solutions for it.

What do you think?

Example

It would be great if there was a CLI command that generated a package-lock.json for a specific workspace out of the hoisted root package-lock.json. In the absence of a better name, let's call it "make-dedicated-lockfile". Running npm make-dedicated-lockfile in the "service" workspace, or npm make-dedicated-lockfile -w service in the project root, would generate a dedicated package-lock.json in the "service" workspace, reflecting the reproducible dependency tree of that specific workspace.
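The core of what such a command would do can be sketched: starting from one workspace's entry in the root lockfile's "packages" map (lockfileVersion 2/3), keep only the entries reachable from it. Everything below - the function names, the resolution rules - is a simplified assumption; real npm resolution also handles peer/optional deps, workspace symlink entries, and more.

```javascript
// npm's nearest-ancestor lookup, simplified: try the nested node_modules
// path first, then walk up toward the top-level node_modules.
function resolveDep(packages, fromKey, dep) {
  let base = fromKey;
  for (;;) {
    const key = base === "" ? `node_modules/${dep}` : `${base}/node_modules/${dep}`;
    if (packages[key]) return key;
    if (base === "") return null;
    const i = base.lastIndexOf("/node_modules/");
    base = i === -1 ? "" : base.slice(0, i);
  }
}

// Keep only the "packages" entries reachable from one workspace,
// following (only) the "dependencies" edges.
function pruneLockfile(rootLock, workspace) {
  const packages = rootLock.packages ?? {};
  const keep = new Set([""]); // "" is the root project entry
  const queue = [workspace];
  while (queue.length > 0) {
    const key = queue.pop();
    if (keep.has(key) || !packages[key]) continue;
    keep.add(key);
    for (const dep of Object.keys(packages[key].dependencies ?? {})) {
      const resolved = resolveDep(packages, key, dep);
      if (resolved) queue.push(resolved);
    }
  }
  const pruned = {};
  for (const key of Object.keys(packages)) {
    if (keep.has(key)) pruned[key] = packages[key];
  }
  return { ...rootLock, packages: pruned };
}
```

A real implementation would also rewrite the root entry of the emitted lockfile and handle devDependencies; this only illustrates the pruning idea.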

How

Current Behaviour

I need to use package-lock.json and run npm ci to ensure that installations remain identical and reproducible throughout the entire dependency tree, across all systems and pipelines - one of which is building a Docker image for each workspace.

Right now, this is done with an npm script in each workspace:

docker build .. -f Dockerfile

Desired Behaviour

Before building a docker image for each workspace, generate a dedicated lockfile specific to that workspace. Build the docker image, then delete the specific lockfile.

This could be done with an npm script like this:

npm make-dedicated-lockfile && docker build . -f Dockerfile && rm package-lock.json

The Dockerfile would then copy the dedicated lockfile instead of the root-level one (please note the context switch in the docker npm script: the context was previously set to .. (the repository/workspace root) and can now be set to . (the workspace folder)).

COPY package-lock.json .
COPY package.json .
RUN npm ci --audit false --fund false

COPY . .
RUN npm run build

Additional thoughts

It's possible that you'd never want multiple package-lock.json files in a workspaces project, because it might confuse people. If that's the case, there are a few ways to solve this - for example, generating a dedicated-lock.json file instead and adding a command-line flag that tells npm ci to specifically use this dedicated lockfile:

npm make-dedicated-lockfile && docker build . -f Dockerfile && rm dedicated-lock.json
COPY dedicated-lock.json .
COPY package.json .
RUN npm ci --audit false --fund false --use-workspace-lockfile

COPY . .
RUN npm run build

codepunkt avatar Mar 15 '22 09:03 codepunkt

I am going to +1 this with my own use-case. I searched for this hoping the issue would already be addressed, but alas.

I think it's relevant to talk about GitHub Actions here since GitHub owns npm, and one would assume that one of their goals is tight integration between the two services and toolchains. So I'm going to share my use case, which I think is entirely reasonable.

You have a frontend, backend, and process-tooling directory in your repo's root. These are walled off from one another on a dependency basis, and each has a package.json as the manifest for its particular dependencies. There is some overlap, but frontend has ~30 dependencies, backend has ~50, and process-tooling has ~5. Ideally - for download size, for the complexity of each sub-project, and for speed of execution - those are separate dependency lists, and those lists generate separate lockfiles which you can use for well-scoped reproducible installs that hit GitHub's cache.

Doing this, you'll save both github & npm bandwidth.

This is possible with manual management of package.json-containing directories, but not with npm's current implementation of workspaces:

# ... preamble stuff

steps:
  - name: Checkout Branch w/ repo history
    uses: actions/checkout@v2
    with:
      fetch-depth: 0

  - name: Cache node modules
    uses: actions/cache@v2
    with:
      path: ~/.npm
      key: ${{ runner.os }}-node-modules-cache-process-tooling-${{ hashFiles('process-tooling/package-lock.json') }}
      restore-keys: |
        ${{ runner.os }}-node-modules-cache-process-tooling-

  - name: Install (hopefully efficiently)
    run: |
      npm --prefix process-tooling/ install
      # cache will almost certainly be hit since these dependencies rarely change
      npm --prefix process-tooling/ run cleanup-pr-comment

# ... execute your tooling & subsequent handling

dougpagani avatar Mar 18 '22 12:03 dougpagani

This is possible with manual management of package.json-containing directories, but not with npm's current implementation of workspaces:

I'm not sure I understand. Why would you like to implement this using workspaces?

codepunkt avatar Mar 20 '22 17:03 codepunkt

@codepunkt why do you have to build your workspace (docker) if nothing changed inside your specific workspace?

raphaelboukara avatar Jul 08 '22 16:07 raphaelboukara

Whenever a feature PR (pull request) updates a specific workspace, that workspace is rebuilt in CI/CD, including its Docker images. Building the Docker image installs dependencies, which uses the lockfile from the repository root.

With workspaces enabled, we only have that single lockfile at repository root, which gets continuously updated by package version update PRs created by tools like renovate, greenkeeper or dependabot.

When a feature PR updating a specific workspace didn't change any package dependencies, the dependency installation step when building the Docker image could be skipped and read from the Docker cache instead.

Unfortunately, this doesn't work: we need to copy the lockfile for dependency installation. Even if the feature PR that triggered the build didn't update any dependencies for this package, most of the time some other PR will have updated one or more dependencies of some other workspace in the meantime. This means we have a different lockfile than the last time this specific workspace was built, so dependency installation is performed again, even though it could and should be skipped.

codepunkt avatar Jul 08 '22 17:07 codepunkt

+1 on this. I have a monorepo layout with Frontend (Vue SPA), Backend (AWS Lambdas in TypeScript) and Shared (type definitions and utilities shared between Frontend and Backend).

For consistency Lambdas are built in docker containers provided by AWS that match the Lambda runtime.

This means exposing only the Backend and Shared projects inside the containers for transpilation.

But as all projects generally share the same dependency versions (a goal of using workspaces), the only package-lock.json is at the repo root level, and not available in the subprojects.

I can run 'npm i' in the subprojects (once exposed inside Docker), but I want to use package locks and npm ci to guarantee reproducibility of package versions.

Manually copying the top-level package-lock.json into the subprojects breaks if (as they sometimes do) the subprojects depend on a specific pinned version of a package that differs from the top-level package-lock.json.

It would also lead to installing unneeded packages in the lambdas.

Given that yarn seems to support the needed behavior, maybe the solution is to switch?

jeff-wishnie avatar Oct 03 '22 16:10 jeff-wishnie

Same problem here: for Firebase functions (which live in a subfolder/package), one also needs a lockfile for just that package to be present.

michi88 avatar Mar 31 '23 11:03 michi88

Same problem here with serverless lambda functions - any updates or workarounds?

KarnerTh avatar Oct 04 '23 09:10 KarnerTh

My current workaround, in package.json:

{
  "workspaces": [
    "api",
    "client"
  ],
  "scripts": {
    "api:lock": "cd api && npm i --package-lock-only --workspaces=false",
    "client:lock": "cd client && npm i --package-lock-only --workspaces=false",
    "genlock": "concurrently \"npm run client:lock\" \"npm run api:lock\"",
    "prerelease": "npm run genlock && git add api/package-lock.json client/package-lock.json",
    "release": "standard-version -t '' -a"
  }
}

So when you run npm run release, a dedicated package-lock.json is created for each workspace. My workspaces have no dependencies on one another and are standalone; I'm not sure if this will work if one workspace depends on another workspace.
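The workaround above hard-codes one ":lock" script per workspace. As a sketch, the same commands could instead be derived from the root package.json's "workspaces" field (the function name here is made up, and it assumes the field lists plain directories, no globs):

```javascript
// Hypothetical helper: build the same lockfile-only install commands as the
// "api:lock" / "client:lock" scripts above, but for every workspace listed
// in the root package.json. Assumes "workspaces" contains plain directory
// names (no glob patterns).
function lockfileCommands(pkg) {
  const dirs = Array.isArray(pkg.workspaces)
    ? pkg.workspaces
    : pkg.workspaces?.packages ?? [];
  return dirs.map(
    (dir) => `cd ${dir} && npm i --package-lock-only --workspaces=false`
  );
}
```

Each generated command mirrors the api:lock/client:lock entries above; you'd still wire them into your script runner yourself.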

dantio avatar Oct 04 '23 11:10 dantio

npm run release.

ljharb avatar Oct 04 '23 11:10 ljharb

I currently use @npmcli/arborist to achieve this; it's very straightforward:

import fs from "node:fs"
import path from "node:path"

import Arborist from "@npmcli/arborist"

async function predeploy(packagePath) {
  const arb = new Arborist({
    path: packagePath,
  })

  // Compute the ideal dependency tree for just this package,
  // then commit its metadata (the shrinkwrap/lockfile data).
  const { meta } = await arb.buildIdealTree()
  meta?.commit()

  // Serializing the metadata yields the package-lock.json contents.
  // Usage: await predeploy("./service")
  const packageLockFile = path.join(packagePath, "package-lock.json")
  await fs.promises.writeFile(packageLockFile, String(meta))
}

mpsq avatar Oct 04 '23 14:10 mpsq

I currently use @npmcli/arborist to achieve this, this is v straightforward: [...]

Copying the lockfile did not work for us because of conflicts, but this works well, so thanks for posting it. It would be good to have a supported solution for this. I tried yarn and pnpm, but generating the lockfile is the lowest-touch option compared with customizing Dockerfiles, running extra commands from the root in build containers, etc.

gunzy83 avatar Oct 25 '23 23:10 gunzy83

I have created isolate-package as a way to isolate a package from a monorepo. During the process, it generates an isolated lockfile depending on the package manager. For NPM it uses Arborist. PNPM and Yarn v1 lockfiles are also supported. I will soon make a fallback for modern Yarn versions, by generating an NPM lockfile.

To make it work I had to move the node_modules folder to and from the isolate output, but luckily running Arborist is fast enough that it feels instant, so I don't expect it to cause issues with IDEs and such. It would be great, though, if it were possible to pass Arborist the location of the monorepo root's node_modules folder, so the move workaround would no longer be necessary.
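The move workaround can be sketched as a small helper that relocates a directory for the duration of a task and restores it afterwards, even if the task throws. The names here are hypothetical; isolate-package's actual implementation differs.

```javascript
import fs from "node:fs";
import os from "node:os";
import path from "node:path";

// Temporarily move a directory (e.g. the monorepo root's node_modules) out
// of the way while a task runs - such as invoking Arborist against the
// isolate output - then move it back, even if the task throws.
function withDirMoved(src, dst, task) {
  fs.renameSync(src, dst);
  try {
    return task();
  } finally {
    fs.renameSync(dst, src);
  }
}
```

The try/finally guarantees the directory is restored, which matters because leaving node_modules displaced would break the IDE and any concurrent tooling.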

I created it for deployments to Firebase but the solution is generic.

In case you are interested in the Firebase part, it is showcased in this monorepo boilerplate. The main branch uses PNPM but there's also an NPM branch called use-npm.

0x80 avatar Dec 17 '23 21:12 0x80

Folks following this might also be interested in Turborepo's implementation

https://turbo.build/repo/docs/reference/command-line-reference/prune https://turbo.build/repo/docs/handbook/deploying-with-docker

lkraav avatar Mar 03 '24 11:03 lkraav

Turborepo's implementation seems very promising, and very fast. Last time I checked it appeared fixed to outputting to "out" in the root of the monorepo, which wasn't so practical for me since I isolate multiple packages for deployment within the same monorepo.

Their lockfile pruning implementation is likely more sophisticated. Since I'm not happy with my npm workaround (moving node_modules), I was looking at "borrowing" some parts of the code, but I haven't been able to figure out how it works yet, and I'm not a Rust programmer :)

0x80 avatar Mar 04 '24 15:03 0x80

Turborepo's implementation seems very promising, and very fast. Last time I checked it appeared fixed to outputting to "out" in the root of the monorepo, which wasn't so practical for me since I isolate multiple packages for deployment within the same monorepo.

Yeah, I would also like to just get a package-lock.json generated into each individual workspace; that's why I kept looking around and ended up here with y'all like-minded folk.

lkraav avatar Mar 04 '24 15:03 lkraav