rules_nodejs
Support yarn/npm7 workspaces
Would yarn workspaces work with this rule? I was playing around with 0.10.0 recently but didn't get very far in trying to set up yarn_install. Would a future version support this?
@clydin from our team has been looking at yarn workspaces.
As I understand it, we have the same features as yarn workspaces without actually using it. You can have multiple directories with a package.json file, run a command to install all their dependencies, and publish individual directories to npm. But to have one package depend on another, you don't need package.json at all: the deps of the consuming target can point directly to the output of the providing target.
@alexeagle I think you'll still run into problems supporting both yarn workspaces (e.g. for interactive development with livereload) and Bazel. To support a local yarn install, you need the subpackages in your package.json, but this will break the yarn_install in WORKSPACE since those private subpackages won't be published to npm.
Do you have advice on how to support both a native yarn workflow and Bazel workflow from a monorepo with multiple packages?
@tristanz I think the idea here is that you don't install your subpackages with a yarn install; you just reference their outputs directly.
If bazel supports yarn workspaces, could that be used to solve the following issue:
Assume a monorepo has two top-level projects that are separately deployable, e.g. 'app1' and 'app2', and both depend on a library 'lib1'. lib1 is a TypeScript library with a package.json that depends on a third-party library, e.g. lodash. Each application also has a package.json with lodash, plus another dependency, e.g. angular. The two apps must use a consistent version of lodash, but at times they may want different versions of angular, e.g. so they can upgrade independently. Bazel supports multiple yarn_install/npm_install calls to create the workspaces, but a ts_library in app1 can't depend on lib1, because it will throw the "All npm dependencies need to come from a single workspace" error. And I don't think a single npm_install/yarn_install workspace can contain two versions of angular, so we can't use one package.json?
Is there some other solution to this or would we need the yarn workspace capability? Thanks.
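To make the constraint above concrete, here is a small TypeScript sketch of the kind of check behind the "All npm dependencies need to come from a single workspace" error. It is a hypothetical model, not rules_nodejs's actual implementation: the helpers repoOf and npmReposIn and the repository names npm_app1, npm_app2 and npm_lib1 (one yarn_install per package.json) are made up for illustration.

```typescript
// Hypothetical model of the "single npm workspace" constraint; not rules_nodejs code.

// Extracts the external repository name from a label such as "@npm_app1//lodash".
function repoOf(label: string): string | undefined {
  if (label.charAt(0) !== "@") return undefined; // first-party label, e.g. "//lib1"
  const end = label.indexOf("//");
  return end === -1 ? undefined : label.substring(1, end);
}

// Distinct npm repositories referenced by a target's transitive deps, in first-seen order.
function npmReposIn(transitiveDeps: string[], npmRepos: string[]): string[] {
  const found: string[] = [];
  for (const dep of transitiveDeps) {
    const repo = repoOf(dep);
    if (repo !== undefined && npmRepos.indexOf(repo) !== -1 && found.indexOf(repo) === -1) {
      found.push(repo);
    }
  }
  return found;
}

// app1 pulls angular from its own yarn_install, while lodash arrives through lib1's.
const app1Deps = ["@npm_app1//@angular/core", "//lib1", "@npm_lib1//lodash"];
const repos = npmReposIn(app1Deps, ["npm_app1", "npm_app2", "npm_lib1"]);
if (repos.length > 1) {
  console.log(`error: npm deps come from ${repos.join(" and ")}; only one workspace is allowed`);
}
```

Under this model, any target whose transitive closure mixes labels from two npm repositories is rejected, which is exactly what happens once app1 depends on lib1.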
There are probably a lot of people using Lerna / Yarn workspaces today who would be interested in Bazel but need to keep workspaces working while using Bazel for some things.
I think you'll still run into problems supporting both yarn workspaces (e.g. for interactive development with livereload) and Bazel. To support a local yarn install, you need the subpackages in your package.json, but this will break the yarn_install in WORKSPACE since those private subpackages won't be published to npm.
Assuming Bazel delegates to Yarn the way it says it does, Yarn handles symlinking packages that live in the monorepo instead of trying to download them.
It looks like Bazel currently doesn't blow up when using Yarn workspaces, but the generated @npm repository only includes the packages in the top-level node_modules. If there's a nested node_modules folder in a Yarn workspace package, it is ignored. For example, given the directory structure below, if app and lib depend on different versions of some external library, Yarn will correctly install both versions, one in the top-level node_modules and one inside app or lib. The runfiles set up with the @npm repository, though, always point to the top-level node_modules, regardless of whether the runfiles are declared in the packages/app Bazel package or the packages/lib Bazel package. If each Bazel package got its runfiles set up the same way Yarn lays them out, and Node module resolution took care of the rest, it should work as expected.
project
├── BUILD
├── WORKSPACE
├── package.json
├── packages
│   ├── app
│   │   ├── BUILD
│   │   ├── index.ts
│   │   └── package.json
│   └── lib
│       ├── BUILD
│       ├── index.ts
│       └── package.json
└── yarn.lock
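The behavior described above relies on Node's standard lookup: starting from the requiring file's directory, Node tries each ancestor's node_modules in turn and takes the first match. Below is a simplified TypeScript model of that walk (it ignores file extensions, package.json "main"/"exports", NODE_PATH and symlink resolution), applied to the layout above under the assumption that Yarn hoisted lodash@4 to the root and nested lodash@3 under packages/lib. The installed paths and function names are illustrative.

```typescript
// Simplified model of Node's node_modules lookup. It ignores file extensions,
// package.json "main"/"exports", NODE_PATH and symlink resolution.

// Directories from `dir` up to the filesystem root, nearest first.
function ancestorDirs(dir: string): string[] {
  const parts = dir.split("/").filter((p) => p.length > 0);
  const dirs: string[] = [];
  for (let i = parts.length; i >= 0; i--) {
    dirs.push("/" + parts.slice(0, i).join("/"));
  }
  return dirs;
}

// Candidate package locations, nearest first; directories that are themselves node_modules are skipped.
function candidatePaths(fromDir: string, name: string): string[] {
  const out: string[] = [];
  for (const d of ancestorDirs(fromDir)) {
    const base = d.split("/").pop();
    if (base === "node_modules") continue;
    out.push((d === "/" ? "" : d) + "/node_modules/" + name);
  }
  return out;
}

// First candidate present in the installed layout, or undefined if none is.
function resolvePackage(fromDir: string, name: string, installed: string[]): string | undefined {
  for (const p of candidatePaths(fromDir, name)) {
    if (installed.indexOf(p) !== -1) return p;
  }
  return undefined;
}

// Assumed Yarn layout: lodash@4 hoisted to the root, lodash@3 nested under packages/lib.
const installed = [
  "/project/node_modules/lodash",
  "/project/packages/lib/node_modules/lodash",
];
console.log(resolvePackage("/project/packages/lib", "lodash", installed)); // /project/packages/lib/node_modules/lodash
console.log(resolvePackage("/project/packages/app", "lodash", installed)); // /project/node_modules/lodash
```

If the runfiles only contain the top-level node_modules, both lookups land on the root copy; recreating Yarn's nested layout is what would let this ordinary walk pick the right version for each package.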
Could the Node linker be reworked so that instead of having a single node_modules in the runfiles, it has one per package? I guess this is what "add Yarn workspaces support" would mean. It would be nice if rules_nodejs could do this by itself, without requiring Yarn workspaces. Delegating to the package manager, in this case Yarn, seems more Bazel "native" though, since Bazel itself stays away from having multiple versions of dependencies.
@alexeagle, if there's a need for contributors I'll gladly help with getting Yarn workspaces support. My team is trying to move from Lerna + Yarn workspaces to Bazel, but it's imperative that if two different apps in our repo depend on two different versions of some transitive dependency, each resolves properly. There are two reasons for this. First, the application breaks if the wrong version is used. Second, while keeping a shared library at the same version across a monorepo is definitely ideal, especially if the library lives in the monorepo itself, it's sometimes not possible to make a change across the entire repo.
I more or less understand the source so I might be able to experiment with it if I'm pointed in the right direction.
I looked more into this. It seems the current rules are pretty close.
# WORKSPACE
# ... prior setup snipped
yarn_install(
    name = "npm",
    package_json = "//:package.json",
    yarn_lock = "//:yarn.lock",
)
When using the repository rule above, dependencies can be declared as follows:
# packages/lib/BUILD.bazel
ts_library(
    name = "lib",
    srcs = ["index.ts"],
    module_name = "lib",
    deps = [
        "@npm//lodash",
    ],
)
# packages/app/BUILD.bazel
nodejs_binary(
    name = "bin",
    # nodejs_binary takes its runtime dependencies via `data`
    data = [
        "//packages/lib",
        "@npm//lodash",
    ],
    entry_point = "index.ts",
)
If the Yarn workspace "app" depends on lodash@4 but "lib" depends on lodash@3, both packages get v4. That being said, the runfiles at bin.sh.runfiles do contain both versions, buried under the @npm repository. All that's left is either for bin.sh.runfiles/packages/lib/node_modules to be symlinked to bin.sh.runfiles/npm/node_modules/packages/lib/node_modules, or for the Node.js linker's require patch to do the proper mapping depending on whether the import comes from "app" or "lib".
Before coming to this conclusion, I tried some things.
I tried making the following change to BUILD.bazel, but had no luck.
# packages/app/BUILD.bazel
nodejs_binary(
    name = "bin",
    data = [
+       "@npm//lib",
        "@npm//lodash",
    ],
    entry_point = "index.ts",
)
Also, I changed the visibility in the generated BUILD file to expose the @npm//lib:lib__contents target, since it includes the @npm//lib:lib__nested_node_modules target, and then tried something like this:
# packages/lib/BUILD.bazel
ts_library(
    name = "lib",
    srcs = ["index.ts"],
    module_name = "lib",
    deps = [
-       "@npm//lodash",
+       "@npm//lib:lib__contents",
    ],
)
This will properly add the desired contents to @npm but they still won't be available in bin.sh.runfiles.
One more thing: you can always reach the right lodash version from "lib" with something like this:
const path = require('path')
console.log(path.resolve(__dirname, '../../..')) // the bin.sh.runfiles directory
require('../../../npm/node_modules/lib/node_modules/lodash')
@alexeagle @clydin do you have any news on this topic? I have a use case very similar to the one @migueloller mentioned, and judging by his findings, what we currently have is only a few steps away from what is needed. It would be really nice to finally have support for this 😃
Sorry, no update. We support multiple nested package.json files, but they each need to be self-contained right now. We don't even have any design thoughts yet for how we would do yarn workspaces, and it might need changes in Bazel to support properly (like with the managed_directories feature).
(In reply to Tiago Costa, Mon, Jun 15, 2020 at 1:47 PM.)
Sorry, no update. We support multiple nested package.json but they each need to be self-contained right now.
@alexeagle do you have a link to an example? I'm looking to solve a similar issue and I'd love to see how smarter people solved it
Sure, this repo is an example. Multiple calls to yarn_install in the WORKSPACE file.
I've spent a very long time attempting to implement a workspace rule for Yarn workspaces. My conclusion is this: with Bazel we want to be explicit about dependencies (i.e., fine-grained deps), and a yarn install with workspaces can result in multiple versions of the same package (i.e., require('lodash') could resolve to a different version depending on which file calls it), so there potentially have to be multiple labels for each dependency. For example, if there are two Yarn workspaces, foo and bar:
# //foo/BUILD
js_library(
    name = "foo",
    srcs = ["index.js", "package.json"],
    entry_point = "index.js",
    deps = ["@npm//foo/lodash"],
)

# //bar/BUILD
js_library(
    name = "bar",
    srcs = ["index.js", "package.json"],
    entry_point = "index.js",
    deps = ["@npm//bar/lodash"],
)
It is necessary to namespace lodash because foo and bar might depend on different versions of it. Each label can then reference the correct files, which would not be possible otherwise (i.e., @npm//lodash isn't specific enough if there are two versions of lodash).
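The ambiguity can be made concrete with a short TypeScript sketch. Given the versions each workspace resolved (hypothetical input here; in practice it would come from yarn.lock), it finds packages that resolve to more than one version and namespaces only those labels, falling back to a plain @npm//pkg label otherwise. Namespacing only the conflicting packages is a variation on the scheme above, which namespaces every dependency; the function names and version numbers are illustrative.

```typescript
// workspace -> package -> resolved version (hypothetical input).
type Resolutions = { [workspace: string]: { [pkg: string]: string } };

// Packages that resolve to more than one version across workspaces, sorted by name.
function conflictingPackages(res: Resolutions): string[] {
  const seen: { [pkg: string]: string[] } = Object.create(null);
  for (const ws of Object.keys(res)) {
    for (const pkg of Object.keys(res[ws])) {
      const version = res[ws][pkg];
      if (!seen[pkg]) seen[pkg] = [];
      if (seen[pkg].indexOf(version) === -1) seen[pkg].push(version);
    }
  }
  return Object.keys(seen).filter((pkg) => seen[pkg].length > 1).sort();
}

// Label a workspace should depend on: namespaced only when a plain label would be ambiguous.
function labelFor(workspace: string, pkg: string, conflicts: string[]): string {
  return conflicts.indexOf(pkg) !== -1 ? `@npm//${workspace}/${pkg}` : `@npm//${pkg}`;
}

const resolutions: Resolutions = {
  foo: { lodash: "4.17.21", react: "17.0.2" },
  bar: { lodash: "3.10.1", react: "17.0.2" },
};
const conflicts = conflictingPackages(resolutions); // ["lodash"]
console.log(labelFor("foo", "lodash", conflicts)); // @npm//foo/lodash
console.log(labelFor("bar", "react", conflicts)); // @npm//react
```

The trade-off is that a label can change from @npm//pkg to @npm//ws/pkg when a second version appears, whereas always namespacing keeps labels stable at the cost of more targets.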
~~Now here's the problem, though. When using js_binary, the Node.js linker that patches require would have to determine which lodash to provide, and this is where I get stuck. For example, Go Modules requires the import path to include the major version to solve this diamond dependency issue, and Rust's name mangling includes crate versions to do the same. Node.js gives you nothing, just "lodash". And unless the require patch script can somehow detect which file is calling require and then provide the appropriate version, I'm not sure how this issue can be solved. It is 100% solvable, though, because Yarn PnP does it. Perhaps the answer is in the RFC or white paper.~~
~~It might be worth mentioning that this isn't an issue for Node.js itself, because it relies on the filesystem for module resolution. That wouldn't be idiomatic in Bazel, though, so it doesn't seem like a viable alternative.~~
EDIT: Module._resolveFilename receives the parent module, and that can be used to determine which version to provide. So if the module map used by require_patch.js keeps track of which paths require which versions, it's possible to solve the diamond dependency issue and make multiple versions work!
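A sketch of the idea in the EDIT above: if each module-map entry records which directory it applies to, the parent's filename is enough to choose between the hoisted and the nested lodash by longest-prefix match. In a real patch this lookup would run inside an override of Module._resolveFilename, which is given the parent module; here the parent is just a filename string. The ModuleMapEntry shape and the runfiles-relative targets (borrowed from the require() workaround earlier in the thread) are assumptions, not the actual format used by require_patch.js.

```typescript
// Hypothetical module-map entry: files under `scope` that request `name` get `target`.
interface ModuleMapEntry {
  scope: string;  // directory prefix of the requiring files; "" matches everything
  name: string;   // bare specifier, e.g. "lodash"
  target: string; // where that specifier should be loaded from
}

// Picks the matching entry with the longest scope, so a workspace-specific
// (nested) version wins over the hoisted one.
function resolveWithParent(request: string, parentFilename: string, map: ModuleMapEntry[]): string | undefined {
  let best: ModuleMapEntry | undefined = undefined;
  for (const entry of map) {
    if (entry.name !== request || parentFilename.indexOf(entry.scope) !== 0) continue;
    if (best === undefined || entry.scope.length > best.scope.length) best = entry;
  }
  return best === undefined ? undefined : best.target;
}

const moduleMap: ModuleMapEntry[] = [
  { scope: "", name: "lodash", target: "npm/node_modules/lodash" },
  { scope: "packages/lib/", name: "lodash", target: "npm/node_modules/lib/node_modules/lodash" },
];
console.log(resolveWithParent("lodash", "packages/app/index.js", moduleMap)); // npm/node_modules/lodash
console.log(resolveWithParent("lodash", "packages/lib/index.js", moduleMap)); // npm/node_modules/lib/node_modules/lodash
```

The trailing slash in scopes like "packages/lib/" keeps a sibling such as packages/library from matching by accident.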
@alexeagle, did you have any ideas yet for the design of a potential implementation?
Hi @migueloller, I was actually thinking about this last night and wrote a bit about it in #1977.
I don't think we want to get into the business of understanding multiple versions nor the node APIs needed to resolve them. Rather, we want to be 100% mechanical about:
- let the user run the package manager under Bazel
- with a light touch, control the package manager environment so the stuff it installs goes into an external repository
- provide BUILD files for that repository so you can transport a subset of the files into action's execroot / test's runfiles
- make sure that by the time the node program runs, the node_modules directories on disk look just like they would have if there was no Bazel
So this means our responsibility should just be about locations on disk where the packages go, first making them different from node idiom (external repo rather than in the source tree) and then making them the same again (linker).
My theory is that the linker should not always link a single node_modules tree into $pwd/node_modules. Instead, we should remember which subdirectory of your project had the package.json file and link the node_modules there. Then we are free to have multiple npm/yarn-installed external repositories in deps[], because each will get linked to a different place. And when node runs, it does the same thing under Bazel as it would outside of Bazel.
Does that make sense? It feels doable but it's a pretty tricky area of the code.
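A rough sketch of that proposal: for every installed npm repository, remember the directory of the package.json it came from and plan a node_modules symlink at that location in the runfiles tree, refusing two repositories that claim the same spot. In runfiles the main workspace lives under its workspace name and each external repository under its own name; the repository names, the workspace name my_ws and the planLinks helper are hypothetical, and this is not the linker's actual code.

```typescript
interface InstalledRepo {
  repo: string;           // external repository created by yarn_install/npm_install
  packageJsonDir: string; // directory of its package.json in the main workspace; "" = root
}

interface PlannedLink {
  link: string;   // path of the node_modules symlink, relative to the runfiles root
  target: string; // directory it should point at, relative to the runfiles root
}

// One node_modules link per package.json directory; two repos may not share a directory.
function planLinks(workspaceName: string, repos: InstalledRepo[]): PlannedLink[] {
  const owner: { [link: string]: string } = Object.create(null);
  const planned: PlannedLink[] = [];
  for (const r of repos) {
    const dir = r.packageJsonDir === "" ? workspaceName : `${workspaceName}/${r.packageJsonDir}`;
    const link = `${dir}/node_modules`;
    if (owner[link] !== undefined) {
      throw new Error(`${r.repo} and ${owner[link]} both want to link ${link}`);
    }
    owner[link] = r.repo;
    planned.push({ link: link, target: `${r.repo}/node_modules` });
  }
  return planned;
}

// Root package.json installed as @npm, packages/app's package.json as @npm_app.
const plannedLinks = planLinks("my_ws", [
  { repo: "npm", packageJsonDir: "" },
  { repo: "npm_app", packageJsonDir: "packages/app" },
]);
for (const l of plannedLinks) console.log(`${l.link} -> ${l.target}`);
```

With these inputs the plan is my_ws/node_modules -> npm/node_modules and my_ws/packages/app/node_modules -> npm_app/node_modules, so Node's ordinary upward lookup from packages/app finds npm_app's packages first and falls back to the root ones.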
@alexeagle, yeah that makes sense. If I'm understanding correctly: option 1 would be to patch Node.js require so that nodejs_binary can find the right dependencies at runtime. Isn't there some of that going on right now? Option 2 would be to instead put the node_modules folders in the correct places in the filesystem — likely under runfiles — to achieve the same result.
I agree with you that option 2 is probably the best way to go about this. I took another look at the linker, and if I'm understanding it correctly, does it do the filesystem changes when the binary is executed? Or said a different way, when running a nodejs_binary, before Node.js starts, there's filesystem I/O to either move or symlink node_modules into the right places?
Also, regarding the ability to reference npm/yarn dependencies from multiple external repositories, this would in a way solve the same problem that Yarn workspaces solves and is something that I personally prefer even over Yarn workspaces. It makes sense that if support for multiple package.json is added to the linker, then the node_modules folder could be put in the right place, thus allowing references to different external repositories created by npm_install/yarn_install to be deterministic.
Would you say that these are two separate features, though? One is adding support for Yarn workspaces and the other is adding support for referencing different external repositories for the npm_install and yarn_install rules. Although probably the implementations will overlap.
You understand the linker right. It adapts from Bazel semantics of separate sources/outputs/external repo back to node semantics of those all being together.
Yarn workspaces is a special case of multiple repositories and should be done second. The hard part is calling yarn install and then getting the layout into the right places for Bazel.
We should discuss this in the next team meeting; ping me on Slack if you want to attend.
This issue has been automatically marked as stale because it has not had any activity for 90 days. It will be closed if no further activity occurs in two weeks. Collaborators can add a "cleanup" or "need: discussion" label to keep it open indefinitely. Thanks for your contributions to rules_nodejs!
Now that Workspaces in npm has shipped it's more clear that this is a standard and we should understand it better, probably implement support via a new *_install repository rule.
Any update for this? 🙏
Hey @ajanuar, the first blocker was for our linker to understand the multiple node_modules directories created when installing a workspace. This is nearly done in https://github.com/bazelbuild/rules_nodejs/pull/2389. After that lands and ships, we should be able to tackle this one once we get some more time.
Now that https://github.com/bazelbuild/rules_nodejs/pull/2389 has merged.
What phase of the work is this issue in?
Is this item actively being worked on?
I'm working on a related prototype for how Bazel could fetch only the npm tarballs needed for a build. We're not directly working on supporting workspaces, but I think it will happen soon. We may need it to support Next.js moving to Bazel.
(In reply to Jin Huang, Mon, Feb 15, 2021 at 12:49 PM.)
I find it odd that Bazel, which is a tool for builds in monorepos, doesn't play nice with other monorepo management tools like yarn workspaces or Nx. What's the ideal 'Bazel' way to handle deps across many apps, then? One big package.json file in the root of the repo? A package.json per app/lib? No package.json at all?
I'd love to hear some strong and reasoned opinions about this...
Is there any level of compatibility with yarn workspaces?
I'm playing around with trying to build in a monorepo managed by yarn workspaces, and it seems like the workspaces dirs being symlinked into node_modules cause problems for bazel:
For example, the following structure:
/WORKSPACE
/BUILD
/package.json     (defines workspaces as /apps/*)
/node_modules     (contains a symlink to /apps/www)
/yarn.lock
/apps
  /www            (symlinked from /node_modules)
    /BUILD
    /package.json
Causes the following error:
ERROR: Traceback (most recent call last):
File "/private/var/tmp/_bazel_martaver/226f0bf14f02eb207aab42d5e85425cf/external/npm/@cleric/www/BUILD", line 868, column 10, in <toplevel>
filegroup(
Error in filegroup: filegroup rule 'www' in package '@cleric/www' conflicts with existing _js_library rule, defined at /private/var/tmp/_bazel_martaver/226f0bf14f02eb207aab42d5e85425cf/external/npm/@cleric/www/BUILD:571:11
ERROR: /private/var/tmp/_bazel_martaver/226f0bf14f02eb207aab42d5e85425cf/external/npm/BUILD.bazel:7883:11: Target '@npm//@cleric/www:www__contents' contains an error and its package is in error and referenced by '@npm//:node_modules'
ERROR: /private/var/tmp/_bazel_martaver/226f0bf14f02eb207aab42d5e85425cf/external/npm/BUILD.bazel:7883:11: Target '@npm//@cleric/www:www__files' contains an error and its package is in error and referenced by '@npm//:node_modules'
ERROR: /private/var/tmp/_bazel_martaver/226f0bf14f02eb207aab42d5e85425cf/external/npm/BUILD.bazel:7883:11: Target '@npm//@cleric/www:www__nested_node_modules' contains an error and its package is in error and referenced by '@npm//:node_modules'
ERROR: Analysis of target '//apps/www:build' failed; build aborted: Analysis failed
repro here: https://github.com/cleric-sh/experiments-bazel
Seems like almost all monorepo management tools use some kind of symlinks to resolve dependencies between packages... if these symlinks don't work, then what's the 'bazel' way to manage a monorepo and its deps?
@martaver I've also attempted this and hit a dead end.
There seem to be only two options, neither very good:
- Have one lockfile per package and define the node_modules within every package as its own managed directory. Bazel would then manage the deps between your own packages. Fundamentally, you then don't get the hoisting that npm/yarn workspaces give you with one canonical lockfile, so version management would be very difficult with all those lockfiles constantly changing and having to be kept in sync manually. You'd also have problems with anything that requires a single copy (singletons), since Node will instantiate multiple instances of the same module if it is located in multiple places on the filesystem. That probably means you'd need to maintain a bunch of aliases in your bundler. You'd probably have to forgo quality-of-life dev tooling that's monorepo compatible, like Vite, and develop only with production builds, without hot reloading. I suspect the IDE would hate this as well. It's not really a dev experience you could realistically expect a developer to work with.
- Break your logical dependency tree and declare every dep in the top-level workspace. This makes your packages unconsumable from the outside if you publish them anywhere. It also only works if your whole tree has zero cases of "same package, different versions". But since so many npm dependencies are transitive (with a typically very large dep tree), that isn't easy to control long term and may be difficult to enforce. It also feels logically wrong to define all dependencies at the top. At least with this solution you could keep an npm-workspace-like dev environment, since you'd still use npm/yarn workspaces; you've basically just coerced them into installing node_modules a certain way.
- Variation of 2): if you set strict_visibility = False in your yarn_install rule, you might be able to keep your dependencies in the relevant package.json and rely on npm/yarn hoisting to ensure all the deps end up in the top-level node_modules. You still need the "no multiple copies of any dep" rule in place. I'm not sure Bazel will correctly rebuild when a deeply nested package.json changes if you do this.
OK, I guess there's a third option: don't have multiple packages, and build your entire app as one big monolithic module. Not a good idea for obvious reasons.
Yarn/npm workspaces were designed to solve these exact problems and these are industry standard tools that are in most enterprise projects. It therefore works widely with other tooling in the npm ecosystem. I eagerly await compatibility with bazel.
@martaver You could define one or multiple package.json files in the repo, and they could depend on one another. But the easiest might be a single package.json. The main downside is that a lot of npm dev tooling, e.g. the IDE, might break.
Here is an example: https://github.com/jinfwhuang/playmono-public/tree/master/nodejs
@jinfwhuang I think the biggest problem with this is you lose all the good positives of a real workspace like one canonical yarn.lock.
How do you sync versions of things used in more than one package?
What is the dev experience?
What about Node's module resolution in development, where each module is symlinked? If packages a and b both require package x, a depends on b, and we use node to execute a, then there are two copies of x in Node's memory, so singletons break. This is a killer for things like React projects, which rely on there being one copy of context providers etc. Essentially, it rules this approach out.
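The singleton problem follows from how Node caches modules: the require cache is keyed by the resolved filename (after following symlinks, unless --preserve-symlinks is used), so two installed copies of x at different paths are evaluated twice and yield two distinct module objects. The class below is a toy model of that cache, not Node's implementation; the paths are illustrative.

```typescript
// Toy model of Node's require cache: one module instance per resolved filename.
class ModuleCache {
  private instances: { [filename: string]: { loadedFrom: string } } = Object.create(null);
  evaluations = 0;

  load(resolvedFilename: string): { loadedFrom: string } {
    if (this.instances[resolvedFilename] === undefined) {
      this.evaluations++;
      // Evaluating the module again creates fresh state, e.g. a new React context object.
      this.instances[resolvedFilename] = { loadedFrom: resolvedFilename };
    }
    return this.instances[resolvedFilename];
  }
}

const moduleCache = new ModuleCache();
// a and b were installed separately, so each has its own copy of x on disk.
const xFromA = moduleCache.load("/repo/packages/a/node_modules/x/index.js");
const xFromB = moduleCache.load("/repo/packages/b/node_modules/x/index.js");
console.log(xFromA === xFromB); // false: two "singletons"
console.log(moduleCache.evaluations); // 2
```

Hoisting x to one shared node_modules (or symlinking both locations to the same real path) collapses this back to a single cache entry, which is why workspace hoisting matters for React-style singletons.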
That is not ideal. The most attractive solution is to implement support for yarn/npm workspace.
That would preserve much of the bazel and npm/yarn tooling interoperability, which is kind of required if we want a smooth developer experience.
I'd like to add a few points to this topic, as there seem to be differing viewpoints on the necessary solution.
Yarn/NPM Workspace Functionality
- Single Module Guarantee: These workspace setups are not perfect and have many pitfalls, even though they hide a lot from the developer. For example, you are not guaranteed one version of React; the lockfile can nest different versions if it determines that to be necessary. On a sufficiently large project, I have hit this exact problem. So workspaces don't solve this situation, because the bundler or build tool can crawl the node_modules tree and find a version of React that is not the one you specified. It is unlikely, but it can happen. In this case you can still use the resolutions field to enforce a version all the way down the module tree, but then you are fighting what nested modules specify and hoping you don't break them. Fun fact: this is enough of a problem that Create React App has a webpack plugin fighting the module resolution system so that it works (see https://github.com/facebook/create-react-app/blob/master/packages/react-scripts/config/webpack.config.js#L354-L362).
- Linking Modules: Workspaces link local modules, so you are always building off HEAD, so to speak. Under Bazel you get the same guarantee if you specify the first-party package you need in your BUILD file, using the multi-linker rules @gregmagolan has been working on, so this ability is preserved. However, the benefit you get under Bazel that you don't get under Yarn/NPM Workspaces is the dependency graph guarantee (if A depends on B and B changes, both B and A rebuild).
- Syncing Versions of Shared Tools: This is the same problem as the single module guarantee, so it isn't something you had before. If you hoisted all dev deps to the root of the workspace, you still have to make sure nested packages don't end up bringing in a tool. The tool I have used to solve this for us is SyncPack. You can use it with husky to enforce it in a pre-commit hook. Alternatively, you can have a package under Bazel that acts like a CLI and can be invoked to run commands from a set of shared tools, instead of having your packages maintain the build tools themselves. Again, this is your preference.
- Single Yarn.lock: While it may seem like a benefit, it means that under Bazel your entire world changes whenever the main lockfile in a workspace changes, since the lockfile can be used as the key for installing and setting up the repo. Multiple lockfiles are not only a more accurate reflection of your project setup, but also faster on CI in the long term, because touching just package A out of A, B, and C would only run a yarn install for A (assuming B and C are independent of A). This is not how workspaces are traditionally used in the NPM ecosystem today, since they are usually used to build tools and frameworks. However, in a company setting this is very much the case, and it is a benefit, not a drawback.
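For the "Syncing Versions of Shared Tools" point, the core of such a check is small. The sketch below is not SyncPack's implementation or API, just a hypothetical TypeScript function that collects the version ranges each package.json declares (dependencies and devDependencies) and reports packages declared with more than one distinct range. It does not catch transitive duplicates, which is the harder Single Module Guarantee problem described above.

```typescript
// The parts of a package.json this check looks at.
interface Manifest {
  name: string;
  dependencies?: { [pkg: string]: string };
  devDependencies?: { [pkg: string]: string };
}

// Reports each dependency declared with more than one distinct range, sorted by name.
function mismatchedRanges(manifests: Manifest[]): string[] {
  const ranges: { [pkg: string]: string[] } = Object.create(null);
  for (const m of manifests) {
    for (const deps of [m.dependencies || {}, m.devDependencies || {}]) {
      for (const pkg of Object.keys(deps)) {
        if (!ranges[pkg]) ranges[pkg] = [];
        if (ranges[pkg].indexOf(deps[pkg]) === -1) ranges[pkg].push(deps[pkg]);
      }
    }
  }
  return Object.keys(ranges)
    .filter((pkg) => ranges[pkg].length > 1)
    .sort()
    .map((pkg) => `${pkg}: ${ranges[pkg].join(", ")}`);
}

const report = mismatchedRanges([
  { name: "app", dependencies: { react: "^17.0.0", lodash: "^4.17.0" }, devDependencies: { typescript: "4.1.3" } },
  { name: "lib", dependencies: { react: "^16.14.0", lodash: "^4.17.0" }, devDependencies: { typescript: "4.1.3" } },
]);
console.log(report); // one entry: "react: ^17.0.0, ^16.14.0"
```

Run from a pre-commit hook, a non-empty report would fail the commit, which is the workflow described above with SyncPack and husky.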
This being said, I think we still fall short on things like IDE integration and local node_modules expectations. That said, I haven't seen either of those scale well at a large company, because custom tooling and build systems never seem to take them into consideration. I agree it's not great and should be improved, but arguably the plugin and IDE ecosystem should also be more flexible in its assumptions instead of treating everything as a small personal project.
Long term, I expect that most things should move towards a PNPM/Yarn V2 style install to help with speed, so things like a single workspace package won't be terrible and will help more with the single version everywhere problem.