Individual lockfile per workspace
- [x] I'd be willing to implement this feature
- [ ] This feature can already be implemented through a plugin
Describe the user story
I believe the comments in this issue are the best user story: https://github.com/yarnpkg/yarn/issues/5428
Describe the solution you'd like
Add support for generating a lockfile for each workspace. It could be enabled via a configuration option, e.g. lockfilePerWorkspace.
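For illustration, such an option could look something like this in .yarnrc.yml. Note that lockfilePerWorkspace is purely hypothetical and not an existing Yarn setting:

```yaml
# Hypothetical setting (not part of Yarn today): when enabled, Yarn would emit a
# yarn.lock inside each workspace in addition to the top-level one.
lockfilePerWorkspace: true
```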
Describe the drawbacks of your solution
As this would be opt-in and follow well-known semantics, I'm not sure there are any drawbacks to implementing the feature. One could argue about whether it should be in the core or implemented as a plugin; I personally don't have a preference for one over the other.
Describe alternatives you've considered
This could be implemented in the existing workspace-tools plugin. I'm unsure if the hooks provided by Yarn would allow for this, though.
Additional context
https://github.com/yarnpkg/yarn/issues/5428
I believe the comments in this issue are the best user story: yarnpkg/yarn#5428
It's best to summarize it - there are a lot of discussions there 🙂
I've seen your comment about cache layers, but I wonder if what you're looking for isn't just a way to compute a "cache key" for a given workspace?
I'll gladly do that 😄
Current state of affairs
Yarn workspaces have a lot of benefits for monorepos, one of them being the ability to hoist third-party dependencies to reduce installation times and disk space consumed. This works by picking the dependency version that satisfies the most dependency requirements as specified by the package manifest files. If a single version can't be found that matches all requirements, that's OK: the dependency is kept in that package's node_modules folder instead of at the top level, and the Node.js resolution algorithm takes care of the rest. With PnP, I'm assuming the Node.js resolution algorithm is patched in a similar way to make it work with multiple versions of dependencies.
I'm assuming that, because all of these dependencies are managed by a single yarn install, the Yarn team opted to have a single lock file at the top level, where the workspace root is defined.
What's the problem?
In various monorepos, it is desirable to treat a workspace as an independent deployable entity. Most deployment solutions out there will look for manifest and lock files to set up required dependencies. In addition to this some tools, like Docker, can leverage the fact that versions are immutable to implement caching and reduce build and deployment times.
Here's the problem: because there is a single lock file at the top-level, one can't just take a package (i.e., workspace) and deploy it as one would when not using Yarn workspaces. If there was a lock file at the package level then this would not be an issue.
I've seen your comment about cache layers, but I wonder if what you're looking for isn't just a way to compute a "cache key" for a given workspace?
It's not just about computing a "cache key" for caching, but also about having a lock file to pin versions. For example, if you're deploying a workspace as a Google Cloud Function, you would want the lock file to be there so that dependencies are installed pinned to exactly what the lock file specifies. One could copy the entire lock file to pin versions, but then the caching mechanism breaks. So the underlying thing we're working with here is that deployment platforms use lock files as a cache key for the third-party dependencies.
Let's see if I understand this properly (consider that I don't have a lot of experience with Docker - I've played with docker-compose before, but there are many subtleties I'm still missing):
- You create a Docker image by essentially passing it a folder path (your project path)
- Docker will compare the content of the folder with what's on the latest image, and if it didn't change it won't create a new layer
- So when deploying a monorepo, the layer cache for each workspace gets busted at each modification in each workspace
- To solve that, you'd like to deploy a single folder from the monorepo, independent from its neighbours. This way, Docker wouldn't see its content change unless the files inside it actually change.
- But since there is no lockfile in this folder, you cannot run an install there.
Did I understand correctly? If so, a few questions:
- Workspaces typically cross-reference each others. For example, frontend and backend will both have a dependency on common. How does this fit in your vision? Even if you only deploy frontend, you'll still need common to be available as well.
- Do you run yarn install before building the image, or from within the image? I'd have thought that the image was compiled with the install artifacts (by which point you don't really need to have the lockfile at all?), but maybe that's an incorrect assumption.
- Specifically, what prevents you from copying the global lockfile into your workspace, then run a yarn install to prune the unused entries? You'd end up with a deterministic lockfile that would only change when the workspace dependencies actually change.
Did I understand correctly?
I think so, but let me add a bit more context to how the layer caching mechanism works in Docker.
When building a docker image (i.e., docker build) a path to a folder is provided to Docker. This folder is called the build context. While the image builds, Docker can only access files from the build context.
In the Dockerfile, the specification used to build the Docker image, there are various commands available. The most common one is COPY: it copies files from the build context to the image's filesystem, excluding patterns from .dockerignore. This is where caching comes in. Every time a Docker command is run, an image layer is created. These layers are identified by a hash of the filesystem's content. What this means is that you'll usually see a Dockerfile somewhat like this:
# layer 1
FROM node
# layer 2
COPY package.json yarn.lock ./
# layer 3
RUN yarn install
# layer 4
COPY . .
Again, this might be an oversimplification but it gets the point across: we first copy package.json and the lockfile, and only then run yarn install. If the next time we build this image package.json and yarn.lock didn't change, only layer 4 has to be rebuilt. If nothing changed, then nothing gets rebuilt, of course. One could make the build context the entire monorepo, but the lock file will still change a lot even when the dependencies of the package we're trying to build have not changed.
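To make that concrete, here's a sketch of the same pattern in a monorepo with a single root lockfile. The frontend/backend/common layout is assumed for illustration, and details like the .yarnrc.yml and Yarn release files are omitted:

```dockerfile
FROM node
WORKDIR /app
# With workspaces, the install layer needs the root manifest, the root lockfile,
# and every workspace's package.json...
COPY package.json yarn.lock ./
COPY frontend/package.json frontend/
COPY backend/package.json backend/
COPY common/package.json common/
# ...so this layer is invalidated whenever the root yarn.lock changes, even when the
# change came from a workspace this image doesn't use.
RUN yarn install
# Only now do the sources come in.
COPY . .
```

A per-workspace lockfile would let the install layer depend only on files that change when that workspace's own dependency tree changes.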
Workspaces typically cross-reference each others. For example, frontend and backend will both have a dependency on common. How does this fit in your vision? Even if you only deploy frontend, you'll still need common to be available as well.
This is a good question. After much thought, our solution is going to be a private npm registry. This will not only work for building Docker images but also for using tools like GCP Cloud Functions or AWS Lambda. If Docker were the only tool we were using, we could use the entire monorepo as the build context but still just COPY the dependencies, and Docker layer caching would still work. This time, instead of the cache key being a single lock file, it would be the lock file of the package and all its transitive dependencies that live in the monorepo. That's still not the entire repo's lock file. But since Docker isn't the only deployment platform we use that expects a yarn.lock to be there, this solution doesn't work for us.
Do you run yarn install before building the image, or from within the image? I'd have thought that the image was compiled with the install artifacts (by which point you don't really need to have the lockfile at all?), but maybe that's an incorrect assumption.
It's a best practice to do it within the image. This guarantees native dependencies are built on the appropriate OS and has the benefit of caching to reduce build times in CI. In our current workaround we actually have to build everything outside, move it into Docker, and run npm rebuild. It's very, very hacky though, and we're now at a point where the lack of caching is slowing us down a lot.
Specifically, what prevents you from copying the global lockfile into your workspace, then run a yarn install to prune the unused entries? You'd end up with a deterministic lockfile that would only change when the workspace dependencies actually change.
This might be a good workaround for now, perhaps in a postinstall script. Would this keep the hoisting benefits of Yarn workspaces?
Specifically, what prevents you from copying the global lockfile into your workspace, then run a yarn install to prune the unused entries? You'd end up with a deterministic lockfile that would only change when the workspace dependencies actually change.
I tried this out and unfortunately it's not as straightforward, since running yarn install anywhere within the repository uses the workspaces feature.
Linking a comment to a related issue here: https://github.com/yarnpkg/yarn/issues/4521#issuecomment-478255917
If having an independently deployable entity is the main reason for this, I currently have a plugin that is able to do this. I need to work with my employer to get it released, however. It's quite simple in how it does what it does and could likely be made better: it copies the lockfile and the workspace into a new location, edits devDependencies out of the workspace, and runs a normal install. It keeps everything pinned where it was. It reuses the Yarn cache, and keeps the yarnrc, plugins, and Yarn version installed in the output.
@Larry1123 I think your plugin could be very useful to quite a few folks, will your employer allow you to share it?
I got the OK to release it; I'll have to do it when I have the time.
@Larry1123 Wondering how you are handling Yarn workspaces. Does your plugin create a yarn.lock for each package in the workspace?
In a way, yes: it takes the project's lockfile and reruns the install of the workspace in a new folder, as if it were the only workspace in the project, after also removing devDependencies. That way the resulting lockfile matches the project but only contains what is needed for that workspace. It also currently hardlinks the cache, and copies what it can keep from the project's .yarn files.
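For readers who want to approximate this by hand, here is a rough sketch of the steps as described above. The paths are assumptions, and it glosses over details the plugin presumably handles (for example workspace: cross-references to sibling workspaces):

```sh
# Sketch only: a manual approximation of the described workflow, not the plugin's code.
# Assumes the workspace lives in ./frontend and the output folder sits outside the
# repository, so Yarn doesn't treat it as part of the existing project.
OUT=../frontend-deploy
mkdir -p "$OUT"
cp yarn.lock .yarnrc.yml "$OUT"/
cp -r .yarn "$OUT"/.yarn                   # keep the Yarn release, plugins, and cache
cp frontend/package.json "$OUT"/package.json
# Strip devDependencies from the copied manifest before installing.
node -e "
  const fs = require('fs');
  const pkg = JSON.parse(fs.readFileSync('$OUT/package.json', 'utf8'));
  delete pkg.devDependencies;
  fs.writeFileSync('$OUT/package.json', JSON.stringify(pkg, null, 2));
"
# Run a normal, non-workspace install; versions stay pinned by the copied lockfile.
(cd "$OUT" && yarn install)
```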
The backend + frontend + common scenario is a good one, we have something similar and it took me a while to realize that we sort of want two sets of workspaces. Let's say the repo looked like this:
.
├── common/
│   └── package.json
│
├── frontend/
│   └── package.json
│
├── backend/
│   └── package.json
│
├── package.json
└── yarn.lock
We're building two Docker images from it:
- frontend-app, where the Docker build context contains: common/, frontend/, yarn.lock
- backend-app, where the Docker build context contains: common/, backend/, yarn.lock
This can be done, and is nicely described in https://github.com/yarnpkg/yarn/issues/5428#issuecomment-403722271 (we furthermore utilize tarball context as a performance optimization), but the issue with a single lockfile stays: a change in frontend dependencies also affects the backend build.
(We also have other tooling that is affected by this, for example, we compute the versions of frontend-app and backend-app from Git revisions of the relevant paths, and a change to yarn.lock currently affects both apps.)
I don't know what the best solution would be, but one idea I had was that workspaces should actually be a two-dimensional construct in package.json, like this:
{
  "workspaces": {
    "frontend-app": ["frontend", "common"],
    "backend-app": ["backend", "common"]
  }
}
For the purposes of module resolution and installation, Yarn would still see this as three "flat" workspaces, frontend, backend and common, and the resulting node_modules structure (I don't know how PnP does this) would be identical to today, but Yarn would understand how these sets of workspaces are intended to be used together and it would maintain two additional files, yarn.frontend-app.lock and yarn.backend-app.lock (I'm not sure if the central yarn.lock would be necessary or not but that's a relative detail for this argument's sake).
When we'd be building a Docker image for frontend-app (or calculating a version number), we'd involve these files:
common/, frontend/, yarn.frontend-app.lock
It would be awesome if this could work but I'm not sure if it's feasible...
As a side note, I previously thought that I wanted to have yarn.lock files in our workspaces, i.e., backend/yarn.lock and frontend/yarn.lock, but I now mostly agree with this comment:
I think the idea of Yarn 1.x monorepo is a little bit different. It isn't about independent projects under one roof, it is more about a singular big project having some of its components exposed (called workspaces).
In our case, the frontend and backend workspaces are not standalone – they require common to work. Yarn workspaces are a great mechanism to link them together, to de-duplicate dependencies, etc.; we "just" need to have multiple sets of workspaces at Docker build time.
I've changed where I stand on this issue and shared my thoughts here: https://github.com/yarnpkg/yarn/issues/5428#issuecomment-650329224.
@arcanis I'm reading your Yarn 2.1 blog post and there's a section on Focused Workspaces there. I don't have experience with this from either 2.x or 1.x Yarn but is it possibly solving the backend + frontend + common scenario & Docker builds?
Like, could I create a build context that contains the main yarn.lock file and then just packages/frontend + packages/common (omitting packages/backend), then focus the workspace on frontend and run the Docker build from there?
Or is it still not enough and something like named sets of workspaces would be necessary?
I think it would, yes. The idea would be to run yarn workspaces focus inside frontend, which will install frontend+common, then to mount your whole repo inside the Docker image.
I encourage you to try it out and see whether there are blockers we can solve by improving this workflow. I'm not sold about this named workspace set idea, because I would prefer Yarn to deduce which workspaces are needed based on the main ones you want. It's too easy to make a mistake otherwise.
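For reference, here's a sketch of what that could look like in a Dockerfile, assuming the workspace-tools plugin is enabled, the workspaces are named frontend and common, backend is deliberately left out of the build context, and frontend defines a build script:

```dockerfile
FROM node
WORKDIR /app
# Manifests and Yarn configuration only, to keep the install layer cacheable.
COPY package.json yarn.lock .yarnrc.yml ./
COPY .yarn/ .yarn/
COPY frontend/package.json frontend/
COPY common/package.json common/
# Install only what frontend needs (frontend + common), skipping backend.
RUN yarn workspaces focus frontend
# Then bring in the sources and build.
COPY frontend/ frontend/
COPY common/ common/
RUN yarn workspace frontend build
```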
I'm not sold about this named workspace set idea, because I would prefer Yarn to deduce which workspaces are needed based on the main ones you want. It's too easy to make a mistake otherwise.
Agree; if the focus mode works, then it's probably better.
Do you have a suggestion on how to construct the common/frontend/backend dependencies to make it the most tricky for Yarn? Like, request some dependency at @1.x from common, @2.x from frontend and @3.x from backend? The harder the scenario, the better 😄.
I don't know if this makes a difference in your reasoning @arcanis but I thought it would be worth mentioning in case there's something about Yarn's design that would lend itself for this... this issue could also be solved by having a lockfile per worktree instead of per workspace. For example, each deployable workspace can itself be a worktree and specify which workspaces from the project it depends on.
Here's an example repo: https://github.com/migueloller/yarn-workspaces
It would be fine to have a lockfile for app1 and app2.
That being said, based on what I had commented before (https://github.com/yarnpkg/berry/issues/1223#issuecomment-650329692), one could just have multiple yarn projects in the same repo and have them all share the same Yarn cache. While it wouldn't be as nice as running yarn at the top of the repo if it were a single project, it will help with disk size if Yarn PnP is being used.
I'm taking the definition of project > worktree > workspace from here.
Another thought is that yarn workspaces focus app1 could be called with an option so that it modified the top-level lockfile, perhaps this could be used to generate the "trimmed-down" lockfile for the Docker image.
I also wanted to add another use case in addition to Docker images. If one has a large monorepo where CI jobs are started depending on whether a certain "package" changed, having a shared lockfile makes that a bit hard for the same reasons it's hard on Docker's cache. If we want to check if a workspace changed, including its dependencies, we would also want to check the lockfile. For example, some security update could've been added that changed the patch version being used but not the version range in package.json. If the top-level lockfile is used, then the CI job would run for every change on any package. Having a single lockfile per workspace would alleviate this issue by simply using that lockfile instead of the top-level one.
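As a sketch of that CI use case, assuming a per-workspace lockfile existed at frontend/yarn.lock (the frontend/ path and the $BASE_SHA variable are placeholders):

```sh
# Trigger the frontend job only when frontend's sources or its own lockfile changed.
# With only a root yarn.lock, the equivalent check fires on every dependency change
# anywhere in the monorepo.
if git diff --quiet "$BASE_SHA"...HEAD -- frontend/ ; then
  echo "frontend (including frontend/yarn.lock) unchanged; skipping job"
else
  echo "frontend changed; running build and tests"
fi
```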
If one has a large monorepo where CI jobs are started depending on whether a certain "package" changed, having a shared lockfile makes that a bit hard for the same reasons it's hard on Docker's cache.
That is a good point, and we have a similar use case. Not just for CI: we also, e.g., calculate the app versions ("apps" are e.g. frontend and backend) from their respective paths and "youngest" Git commits; a single shared yarn.lock makes this problematic.
yarn workspaces focus is a great command/plugin 👍
I'm currently using it within our Dockerfile - one question about determinism (which may expose my misunderstanding of yarn's install process):
Is it possible to run focus such that it should fail if the yarn.lock would be modified (i.e. --frozen-lockfile or --immutable but allow .pnp.js to be modified)?
No (because the lockfile would effectively be pruned of extraneous entries, should it be persisted, so it wouldn't pass the immutable check) - I'd recommend running the full yarn install --immutable as a CI validation step.
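For example, a CI pipeline might keep those two steps separate (a sketch; the image name and Dockerfile path are arbitrary):

```sh
# 1. Validate that the committed root lockfile is up to date.
yarn install --immutable
# 2. Build the image; inside the Dockerfile, yarn workspaces focus prunes the install
#    to the target workspace, so no immutability check is expected there.
docker build -t frontend-app -f frontend/Dockerfile .
```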
@tabroughton @samarpanda I have gotten the plugin I was working on public https://gitlab.com/Larry1123/yarn-contrib/-/tree/master/packages/plugin-production-install. I hope it works for your needs.
I have a slightly different use case in mind for this feature. Originally wrote it on Discord, but copying it here for posterity:
One of the downsides of monorepos seems to be that once you add new developers, you have to give them access to the whole code base, while with single repos you could partition things in a way so they have access to smaller bits and pieces.
Now, this could probably be solved with git submodules, putting each workspace in its own git repo. Only certain trusted/senior devs could then have access to the root monorepo, and work with it as one.
The only problem holding this back seems to be the lack of a dedicated yarn.lock per workspace.
With a yarn.lock per workspace it seems that the following workflow would be possible:
- Add new dev to team, give them access to only a limited set of workspaces (separate git repos)
- They can run yarn install, and it would install any workspace dependencies from a private package repository (verdaccio, or github packages, private npm, etc)
- They can just start developing on their own little part of the project, and commit changes to it in isolation. The top level monorepo root yarn.lock is not impacted.
- CI can still be set up to test everything before merging
Seems like there would also be a need to isolate workspace dependencies to separate .yarn/cache in workspace subdirs if this approach was supported.
I'm not concerned about pushing, more concerned about pulling. I don't want any junior dev to simply pull all the company intellectual property as one simple command.
How do you guys partition projects with newer, junior (not yet established/trusted) devs, now that everyone works from home?
This is something that has been a pain point for my work also. I have been wanting to work out a solution to this, just have not had the time to truly work it out. Once I understand Yarn better I had intended to try to work out a plan of action. A holistic approach, I feel, would have various integrations into things like identity providers, git, GitHub/GitLab/Bitbucket, Yarn, and tooling, for zero-trust coordination of internal dependencies and resolutions throughout the super-repo. The integration into the git host would be there to handle cross-project things, but I'm not sure what level it would need. I feel that a tool like this is sorely needed, however hard to get right and time-consuming to produce. I also feel that a larger scope could be covered by creating something meant for cross-organization cooperation, as it would have open-source uses too. It would likely take an RFC style of drafting and planning to build, as current tooling just doesn't support such workflows well. With how things go now, my work tends to lean towards not trusting new/junior devs with wide access; if they work on a project, it has to be in its own scoped repos and projects.
I have created a pretty simple Yarn 2 plugin that will create a separate yarn.lock-workspace for each workspace in a monorepo:
https://github.com/andreialecu/yarn-plugin-workspace-lockfile
I haven't yet fully tested it, but it seems to create working lockfiles.
I would still recommend @Larry1123's plugin above for production deployment scenarios: https://github.com/yarnpkg/berry/issues/1223#issuecomment-705094984, but perhaps someone will find this useful as well.
I'll mirror my comment from https://github.com/yarnpkg/yarn/issues/5428#issuecomment-712481010 here:
My need for this behavior (versioning per workspace, but still have lockfiles in each package) is that I have a nested monorepo, where a subtree is exported to another repo entirely, so must remain independent. Right now I'm stuck with lerna/npm and some custom logic to attempt to even out versions. It would be nice if yarn could manage all of them at once, but leave the correct subset of the "entire workspace pinning" in each. (Though, I'm really not sure how this nested workspace will play out if I were to switch to berry, when berry needs to be committed to the repo, so needs to be committed twice?)
@andreialecu That plugin looks interesting; it's almost what I'm looking for, though appears to be directed towards deployment (and not just general development). But it does give me hope that what I'm looking for might be prototype-able in a plugin.
@jakebailey do note that there are two plugins:
- for deployment: https://gitlab.com/Larry1123/yarn-contrib/-/tree/master/packages/plugin-production-install
- for development: https://github.com/andreialecu/yarn-plugin-workspace-lockfile
Feel free to take either of them and fork them. If you end up testing mine and improving it, feel free to contribute changes back as well.
@andreialecu your plugin looks pretty good and kind of what we're personally looking for, any chance you'd be willing to open a PR to the official workspace-tools package? It'd be nice if such a feature would be supported officially.
I haven't had time to try it out in my project (the above reminded me it exists, oops), but one thing about that plugin is that it appears to always write out to some special file, which then requires the dev to rename it if you are exporting the whole repo; it'd be nice if the plugin (and future support) didn't require that and just maintained regular yarn.lock files.
I would also need this functionality with a public git submodule for a workspace: if someone is working on the public part, they don't have a lockfile. I found this discussion, which explains my situation well. I see that as an option like nmHoistingLimits, not sure if it's relevant.
I would also like my .yarnrc.yml to be in the public part and the root .yarnrc.yml to use it.