Implement optimal JavaScript module design
As discussed in the June 2025 GraphQL.js Working Group meeting (see the YouTube recording, and also https://github.com/graphql/graphql-js-wg/issues/161), along with publishing (ideally pure) ESM, optimizing the published GraphQL.js JavaScript module structure will make the GraphQL ecosystem more performant at build time and runtime, and will greatly improve performance when consuming the graphql package modules via ESM HTTP imports in servers (such as Deno) and browsers, by reducing loading waterfalls and avoiding downloading and parsing code redundant to the app.
A detailed explanation of optimal JavaScript module design and its benefits can be found in this article:
https://jaydenseric.com/blog/optimal-javascript-module-design
The core principle is that each public export of the package should have its own deep-importable module, where it can be accessed without bundling/downloading/parsing any more code than is necessary to power the specific thing being imported. So, one default export per module file. A great convention is to name the file exactly after the name of the export, e.g. the scalar GraphQLInt would be a default export in the module GraphQLInt.mjs. Internal things should also have their own separate modules and be directly imported via relative deep imports, so that when a public export is imported, only the absolutely necessary internal dependencies get loaded in the module graph.
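A rough sketch of what a module following this convention could look like (the file names, imports, and implementation details below are hypothetical and only illustrate the pattern; they are not GraphQL.js's actual source):

```js
// GraphQLInt.mjs (hypothetical sketch). Internal dependencies are relative
// deep imports, so importing this module loads only what it actually needs.
import GraphQLError from "./GraphQLError.mjs";
import GraphQLScalarType from "./GraphQLScalarType.mjs";

/** Maximum and minimum safe 32-bit signed integers. */
const MAX_INT = 2147483647;
const MIN_INT = -2147483648;

/** One default export per module, named exactly after the module file. */
export default new GraphQLScalarType({
  name: "Int",
  description: "The `Int` scalar type represents 32-bit signed integers.",
  serialize(value) {
    const int = Number(value);
    if (!Number.isInteger(int) || int > MAX_INT || int < MIN_INT)
      throw new GraphQLError(`Int cannot represent value: ${String(value)}`);
    return int;
  },
});
```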
The antithesis of optimal module design is index/"barrel" modules that have a bunch of named exports. While we can offer deep imports for the entire API as well as sub-optimal index modules, in practice this is an anti-pattern because the community will not always understand that the correct way is to always do deep imports and never ever import from the main index, and they will publish a mix of deep and non deep imports in GraphQL ecosystem packages. Even if you know what you're doing and want to deep import, your editor may notice both ways are available and annoyingly bias towards auto-importing things from the main index. It takes just one dependency importing from the main index to pull every GraphQL.js module into your app and negate all the best practice deep imports everywhere else in your app and dependencies.
If users should never ever import from the index modules, instead of fighting a losing educational battle, just don't publish the index modules! The entire GraphQL JS ecosystem will fall effortlessly into the pit of success. GraphQL.js "glues" the ecosystem together, so it's important to get this right in the graphql package.
Here is an example where importing from the main index module in a browser or Deno will undesirably cause the entire GraphQL.js library to be loaded, with extra waterfall loading steps as first the index module is fetched and cached, and then its nested imports are recursively fetched and cached down the module graph:
import { GraphQLInt } from "https://unpkg.com/[email protected]/index.js";
For a Node.js project, an import like this will cause the entire GraphQL.js library to be read from the filesystem and be parsed by the runtime:
import { GraphQLInt } from "graphql";
If you use a tree-shaking bundler to process the above import, it will have to work hard to attempt to figure out what library code your app doesn't actually use and eliminate it from the final bundle, increasing build times.
All the above issues can be avoided if it's possible to import like this instead:
import GraphQLInt from "graphql/GraphQLInt.mjs";
If we want, we can arbitrarily group related modules in directories (e.g. graphql/scalars/GraphQLInt.mjs), but that has downsides:
- It introduces subjective decisions about how to group things under what names, making it more difficult to coordinate decisions and contributions.
- Extra import path segments (e.g. `/scalars/`) bloat the file size of modules that import a lot from GraphQL.js, increasing the size of `node_modules` on disk, and the number of bytes web apps using native ESM and HTTP imports have to download and cache at runtime.
- Users manually typing deep import paths are likely to remember the name of the thing they want to import (e.g. `GraphQLInt`), but they might forget how it is arbitrarily nested under a path like `/scalars/`. If everything is available from the package root (i.e. `graphql/` + the export name) then it's easy to remember, and there is discoverability as IntelliSense provides suggestions while you begin typing the path.
- The extra import path segments add visual noise at the top of files. When lots of things are deep imported, it's nice to avoid the repetition in vertical columns of text.
- It adds (small) friction for maintainers/contributors working on GraphQL.js modules, as you need to write relative import paths that reach up out of the current directory and back down into other directories. When moving code or your attention around different parts of the codebase, it's a breath of fresh air to be able to always just write `./` + the name of the thing you want.
There are upsides though:
- Imports of related things will sort together in projects that have linters/formatters sorting imports. This is not a big factor in my own open source packages because they usually have an inventory style of naming things, so the names naturally sort together. The GraphQL.js public exports, unfortunately, sometimes name related things with a common suffix instead of a prefix. For example, validation rules are named name + `Rule`, instead of `Rule` + name:
  https://github.com/graphql/graphql-js/blob/ba4b411385507929b6c4c7905eb04b3e6bd1e93c/src/index.ts#L344-L383
  The same goes for AST nodes:
  https://github.com/graphql/graphql-js/blob/ba4b411385507929b6c4c7905eb04b3e6bd1e93c/src/index.ts#L254-L312
  It's not all bad though, because a lot of names begin with `GraphQL`, `assert`, `is`, etc. and will sort nicely.
- Because this package has a build step producing the published artefacts, it would be fairly easy to have a `src` directory containing subdirectories such as `validators`, etc. that would build to Git-ignored paths such as `graphql/validators/`. If we don't do this, we would need a directory named something like `dist` to contain and Git-ignore all the build artefacts to avoid them being dumped in the project root. To avoid having to have `graphql/dist/` in the import paths, it would then be necessary to use the package field `exports` to re-map imports from the root to the `/dist/` directory. Re-mapping imports via the package field `exports` is an anti-pattern, because some CDNs serving the `graphql` package for HTTP imports just statically serve the package files and don't have clever routing that reads the export rules from the `package.json` to re-route requests.
Normally my preference is to avoid sub-directories and publish everything from the root (which is easy when the project doesn't have a build step), but this aesthetic choice doesn't greatly affect the performance of the optimal module design other than the small file size increase from longer import paths.
Actioning optimal module design is somewhat related to supporting native ESM:
- https://github.com/graphql/graphql-js/issues/4062
- https://github.com/graphql/graphql-js/issues/2721
There are many nuances about anti-patterns and best practices for publishing pure ESM optimal modules (relating to the package exports field, etc.) that I haven't captured yet in this particular issue but can articulate over time where relevant.
There are a few ways to approach public and private modules regarding directory structure and the package field exports. Personally, I like to simply manually list the public modules in the package exports, and anything not listed is private and automatically blocked from being imported by runtimes and bundlers. This has upsides:
- It gives a complete list of public things that exist in the package for certain dev tools, so they don't need to read the files and attempt to apply export rules to figure out what's available.
- If we decide to change something from private to public, you just add an `exports` field entry for it and don't have to physically move around files (e.g. out of a `private` directory), which can cause merge conflicts for other PRs touching these files.
But it also has the downside of increased risk of human error as you have to manually curate a list of what is public, and remember to expand it when adding new public modules in the future.
An alternative approach is to have package exports rules that automatically make all modules in certain directories (e.g. validators/) public, except when nested under a directory called private.
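Roughly, the two approaches could be expressed in the package.json `exports` field like this (illustrative sketches only; the module names are placeholders). Manually listing the public modules:

```json
{
  "exports": {
    "./package.json": "./package.json",
    "./GraphQLInt.mjs": "./GraphQLInt.mjs",
    "./GraphQLError.mjs": "./GraphQLError.mjs"
  }
}
```

Or, using exports subpath patterns to make whole directories public while blocking anything under a `private` directory (this relies on Node.js-style subpath patterns with `null` targets):

```json
{
  "exports": {
    "./package.json": "./package.json",
    "./validators/*.mjs": "./validators/*.mjs",
    "./validators/private/*": null
  }
}
```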
Reference projects
graphql-upload features pure ESM and optimal JavaScript module design, with no main index module; the only way to consume its exports is via deep imports, e.g.:
import GraphQLUpload from "graphql-upload/GraphQLUpload.mjs";
You can see how the package exports field is implemented:
https://github.com/jaydenseric/graphql-upload/blob/421707f3b4e2b0c18ed9beec8eeaf3a0c942841d/package.json#L42-L49
The ESLint plugin eslint-plugin-optimal-modules isn't configured because manually enforcing correctness is viable for such a stable project with a small module count.
Note that it would be great to be able to deep import from graphql once that's possible here:
https://github.com/jaydenseric/graphql-upload/blob/421707f3b4e2b0c18ed9beec8eeaf3a0c942841d/GraphQLUpload.mjs#L3
Other packages with a lot more public exports are graphql-react, device-agnostic-ui, and the buildless React web app framework Ruck, which are actually used together to serve the article without any build steps, with pure ESM via HTTP imports in Deno and the browser:
https://jaydenseric.com/blog/optimal-javascript-module-design
It would be really cool if the graphql package had optimal module design, to allow building GraphiQL-like front-end experiences in web apps without any build steps or bundling.
Proposal
- [ ] Decide if modules will be organised in directories. If so, what are the directories called.
- [ ] Decide if we will also remove index modules from the package, to force the community to adopt the new deep imports API. I recommend removal!
- [ ] Audit the 70+ open PRs to decide which will be merged prior to locking any further merges until the optimal module design PR is merged. This will avoid terrible merge conflicts as many project files will be moved around.
- [ ] Create the optimal module design PR.
- [ ] Setup the ESLint plugin `eslint-plugin-optimal-modules` that bans named exports with an explanatory lint error message.
- [ ] Setup ESLint rules to enforce that module file names match exactly the name of the thing being exported, and that default imports match the name of the thing being imported.
- [ ] Move all named exports within the project into separate module files with a default export, named after the exported entity. VS Code TypeScript refactoring features can be used to automatically update all affected import paths as the exports are being moved.
- [ ] Delete index modules from the project.
- [ ] Remove the package fields `main` and `module`.
- [ ] Add the package field `exports`, defining the public API and automatically preventing imports of private modules.
- [ ] Update all imports in documentation to the new deep imports.
- [ ] Merge the PR as a semver major change for a new `graphql` release.
Reading this, I feel like this is heavily skewed towards a very specific consumer: unbundled ESM over HTTP. And it feels like it doesn't take other use cases into account as much as it probably should.
I want to get one opinion out of the way first so that you can take everything else I say with a grain of salt: I don't think unbundled ESM over HTTP will become mainstream in the browser anytime soon, if at all.
If every entry point here were completely isolated, yes, this approach could work, and it would have massive benefits. But every entrypoint here imports from other, internal, entrypoints. There is an unavoidable waterfall here.
Let's see how much additional data could be transferred only while the loading application waits for the transport waterfall:
- Assumed import depth of 5 with an EDGE connection (latency: 0.6s/request, 1Mbit/s) - could transfer 0.75MB in the time the waterfall takes.
- Assumed import depth of 5 with a good 3G connection (latency: 0.1s/request, 2Mbit/s) - could transfer 0.25MB in the time the waterfall takes.
- Assumed import depth of 5 with a bad 4G connection (latency: 0.05s/request, 5Mbit/s) - could transfer 0.3125MB in the time the waterfall takes.
- Assumed import depth of 5 with a good 4G connection (latency: 0.03s/request, 100Mbit/s) - could transfer 3.75MB in the time the waterfall takes.
- Assumed import depth of 5 with a bad 5G connection (latency: 0.01s/request, 40Mbit/s) - could transfer 0.5MB in the time the waterfall takes.
Here I'm assuming a very generous import depth of 5, while in reality, it's probably more around 8. And I'm not even taking the time the server needs to create a response into account. Pretty much any network connection could transfer the entire library in the time it would take to just wait for the latency introduced by the unbundled ESM over HTTP approach. This waiting time will never go away without bundling.
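For reference, the figures above work out roughly as follows, assuming each of the 5 waterfall levels costs one full round trip of 2 × the quoted latency, and ignoring server response time:

```js
// Rough calculation of how many megabytes a connection could transfer in the
// time a sequential import waterfall spends purely on latency.
function transferableMB(depth, latencySeconds, mbitPerSecond) {
  const waterfallSeconds = depth * latencySeconds * 2; // one round trip per level
  return (waterfallSeconds * mbitPerSecond) / 8; // megabits -> megabytes
}

console.log(transferableMB(5, 0.6, 1)); // EDGE: 0.75 MB
console.log(transferableMB(5, 0.03, 100)); // good 4G: 3.75 MB
```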
Then there's the second part - time spent on parsing and execution. This seems like something that could be solved on the language level (and is currently stage 2.7 with the Deferring Module Evaluation proposal) instead of changing every single dependency of the ecosystem to a new paradigm.
That leaves the other "ESM over HTTP" use case, the server side - and here, we're either talking long-running processes, probably with a local cache that persists between restarts, or serverless functions which AFAIK have to be bundled for most cloud providers.
So... while I want to acknowledge that the "ESM over HTTP" case exists, I think it is a very specific use case that is not going to be the mainstream use case for a long time, if ever - and there is a lot of potential for optimizations that doesn't need all of this.
Now, to the part where I am concerned about this approach: I believe this will really hurt developer experience.
- In my experience, default exports are generally a pain to work with as IDEs have only subpar support for them.
  - Some of these will make working on the `graphql-js` package itself annoying; stuff like renaming internal exports doesn't work well (if at all) with default exports, so internal refactors might end up with multiple different names for the same thing.
  - Others will also affect users of the library - IDEs prioritize imports that are already used in their suggestions, so general discovery of the API will be harder, as new imports will drop into a jungle of other installed libraries, if they get included in the autocomplete at all.
  - Types will still need to be imported as named imports (unless we ship hundreds of empty files with an accompanying `.d.ts` file), so users will end up with an unintuitive mix of default and named imports.
- Users will end up with a lot of imports at the top of their files. I already see this mentioned as "disliked" regularly by users with named exports, and the "wall of text" will only get worse with one file per export.
- This kind of breaking change will make it impossible for libraries to support both GraphQL 16 and 17, which will probably slow down adoption to the new version for a long time.
All in all, I think this approach certainly has its merits - but mostly for a very specific and strongly opinionated use case. I don't think that the userbase using bundlers would benefit from this as much as suggested, but I think they would find this to be a very disruptive change that would make their lives harder.
Personally, I don't have that use case - and as the maintainer of a library that has to support GraphQL 16 and 17, I don't see a good way forward.
Maybe there is also some middle ground, acknowledging the different requirements for different ecosystems, like shipping a different package shape for Deno and npm?
Depth of 5 is definitely very generous; for the GraphQLInt case that was suggested as a motivational example, I traced (without trying) this path: GraphQLInt.mjs ---> GraphQLScalar.mjs ---> assertName.mjs ---> GraphQLError.mjs ---> printSourceLocation.mjs ---> getLocation.mjs ---> invariant.mjs (depth=7) - and that’s just one trivial example that I found manually with minimal effort.
To add some perspective, all exports of the main entry point bundled together add up to about 70kB gzipped, not minified - which would be transferred in a fraction of this waterfall. (Sorry, couldn't figure out how to disable compression on that website, but realistically a HTTP connection will likely be compressed anyways.)
So I think even in the ESM over HTTP case, bundling exactly what you need (which will be another fraction of the above, GraphQL.js already tree-shakes very well in its current state) or even just blindly bundling everything makes a lot more sense for actual speed gains than splitting it up into even more files.
(Note that I'm not suggesting to ship a bundled file as part of the graphql-js package, as that would also require setting a fixed transpilation target. This kind of bundling should probably be done by the consuming application or a CDN.)
@phryneas
Reading this, I feel like this is heavily skewed towards a very specific consumer: unbundled ESM over HTTP. And it feels like it doesn't take other use cases into account as much as it probably should.
Firstly, anything done to optimise performance of unbundled ESM over HTTP also improves performance when using ESM over local filesystem (i.e. Node.js and node_modules), and also improves bundling performance. Fundamentally optimal module design is about making faster, less indirect routes to get to the specific code the application needs, without pulling in unneeded code. There is no downside to optimal module design; it's better for everyone.
I have led teams to optimise the module design in enterprise codebases and had massive build time and bundle size wins. I have seen other people adopt the patterns in other workplaces and have big wins too. It's not just an ESM over HTTP thing.
I want to get one opinion out of the way first so that you can take everything else I say with a grain of salt: I don't think unbundled ESM over HTTP will become mainstream in the browser anytime soon, if at all.
Disclaimer noted and appreciated ❤️ If maintainers of critical packages resist or are hostile (not accusing anyone here!) to optimising for unbundled ESM over HTTP, then yes, it will be harder for the wider community to discard the mountain of build tooling typically used to make suboptimal dependencies that are not aligned to web standards (e.g. CJS vs ESM) work in browsers.
But if we listen to the lived experience of people building apps using web standard technologies in the way they were originally intended and optimise the ecosystem dependencies, adoption of web standard approaches is perhaps inevitable. If people can deliver the same or better experiences to users without complicated build tooling dependencies and config, without dealing with separate source and build artefacts, and without waiting for builds all the time, they would.
"Throw everything into arbitrary modules and let webpack sort it out" isn't engineering, and it's not leading to good outcomes for the end user. Devs and users alike know apps are more bloated than they should be. To be cheeky, let's look at apollographql.com:
2.4 MB of 5.8 MB (only 41%) of the CSS and JS that loads on the homepage is actually used; 3.4 MB is unused.
And the thing is, because of the way people bundle per route, as you navigate around and load more route bundles, you can end up loading the same dependencies again and again as not all are added to commons bundles. If you add everything to commons bundles, then every visitor is forced to download and parse that code regardless if it's used for the current route.
"But even though there is a lot of waste, fast internet connections and powerful hardware can deal with it" is the reason why the bloat is allowed to accumulate year on year, even exceeding the pace of network and hardware improvements.
Problems like the crazy size of project node_modules directories are a thing of the past once you move to Deno, which caches only the specific modules your application imports. You don't have to download the entire package to disk if you are only using some of its modules.
Now about concerns of ESM over HTTP loading waterfalls, having actually built perhaps 6 or so apps with native ESM over HTTP, it's a non-issue. Typically you can expect only a couple of waterfall loading steps when working with dependencies that have optimal module design. If you try to trace individual paths in a route's module graph, you might find some long ones like @benjie pointed out, but in a realistic app modules higher up the waterfall that have already pulled in a dependency module mean those same modules don't have to be loaded again for all deeper branches in the module graph. The loading tends to flatten out, and over time as you navigate more routes, you get a large number of cache hits and you barely have to load any additional dependencies.
Once you take note of overly deep internal module graphs, things can be done to flatten them out and think critically about what needs to exist in standalone modules. Ideally, we can just make simpler code in the first place and avoid complexity. That level of effort is justified and valuable for a package like graphql that "glues" an ecosystem together.
In apps that bundle, if you push an update to just one part of your page, you invalidate the cache for the entire bundle so return visitors have to download everything all over again. In regularly maintained apps, that might happen every visit. When you use native ESM via HTTP imports from CDNs, you can do all sorts of neat performance things like have far future caching for third party dependencies at specific versions.
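For example, an immutable, versioned dependency URL can be served with a far-future cache header along these lines (illustrative):

```
Cache-Control: public, max-age=31536000, immutable
```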
There are other exciting things you can do to reduce waterfall loading in CDNs and in your application server for your ESM. You can server-side analyse the module graph for the requested module, and respond with module preload hint headers for all the nested imports, to flatten out the loading waterfall to just the initial level and one other. @dburles has done work in this space.
We are not doing anything weird; we are just using the web as it is designed to function and are realising it works really well and is a breath of fresh air. It's strange that those of us in the community who are using fewer tools and more web standards feel the burden to justify ourselves, and not the other way around.
Now, to the part where I am concerned about this approach: I believe this will really hurt developer experience.
Having worked extensively in both old and new school codebases, I can subjectively testify that the developer experience is just as good, or better, when working with optimal modules. It's very easy to find things in the project, and it's a bit like how with Prettier you don't have to think about formatting; with optimal module design enforced, you don't have to think about what exports to arbitrarily group in modules, and what names to give files. You can easily scroll a nice hierarchy of module files in your editor's file explorer panel, and open just that code above the fold in your editor without having to scroll to find things.
Regarding default exports vs named exports, editors are actually really good now at renaming symbols and their default imports throughout the project, and while I faintly recall issues with that several years ago, it hasn't been a bother in recent memory. In enterprise codebases where you can't be sure people are using the right tools and being disciplined about the default imports, exports, and file names matching, I set up ESLint rules enforcing all of those things so there are guarantees of correctness when people contribute PRs. We can add setting up those ESLint rules to the todo list in this issue.
@benjie
Depth of 5 is definitely very generous; for the GraphQLInt case that was suggested as a motivational example
Please don't read into the GraphQLInt case too far; I picked it as the most basic and common thing you import a lot from GraphQL.js (in a code first GraphQL server codebase). There are far more compelling use cases for importing optimal GraphQL.js modules over HTTP. For example a web app might have a GraphQL client (like Apollo Client) that uses graphql-tag for queries:
https://github.com/apollographql/graphql-tag/blob/f463d8765709ec5764066024e9b94519d9563bd9/src/index.ts#L1-L7
import { parse } from 'graphql';
import {
DocumentNode,
DefinitionNode,
Location,
} from 'graphql/language/ast';
It would be obviously beneficial to be able to deep import and only download code to do with parsing, and not executing and all sorts of other things.
Another example might be having a GitHub Action that ensures a schema.graphql is updated in a client repo when the schema in the GraphQL API repo changes. It could be a small Deno script using GraphQL.js things like buildClientSchema, getIntrospectionQuery, lexicographicSortSchema, and printSchema. You only want the Action to have to cache and parse at runtime the code relevant to that task, without pulling in unrelated GraphQL.js code to do with execution, etc.
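A minimal sketch of such a script (the endpoint URL is a placeholder, and the deep import paths are hypothetical since they don't exist yet):

```js
// checkSchema.mjs (hypothetical Deno script). With optimal modules, only the
// introspection/printing related GraphQL.js code would be cached and parsed.
import buildClientSchema from "graphql/buildClientSchema.mjs";
import getIntrospectionQuery from "graphql/getIntrospectionQuery.mjs";
import lexicographicSortSchema from "graphql/lexicographicSortSchema.mjs";
import printSchema from "graphql/printSchema.mjs";

// Fetch the GraphQL API schema via introspection.
const response = await fetch("https://graphql-api.example.com", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ query: getIntrospectionQuery() }),
});
const { data } = await response.json();

// Print a stable SDL and compare it to the committed schema.graphql.
const freshSDL = printSchema(lexicographicSortSchema(buildClientSchema(data))).trim();
const committedSDL = (await Deno.readTextFile("schema.graphql")).trim();

if (freshSDL !== committedSDL) {
  console.error("schema.graphql is out of date.");
  Deno.exit(1);
}
```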
Anyway, whatever way you use GraphQL.js could be better with optimal modules.
Thanks for the mention @jaydenseric.
Just thought I’d share another perspective and walk through a line of thought that leads us to this pattern.
Starting with the concern with request waterfalls, I’d like to give an example of a different way of solving this in the browser without bundling. I have been exploring a proof of concept around serving modulepreload link relations for a Link entity-header generated based on the module import graph of each requested module (see: https://github.com/dburles/modulepreload-link-relations). It’s really exciting to see it working in action and I believe it’s a viable means of addressing the waterfalls that occur with deeply nested imports. The other advantage (amongst others) as Jayden highlighted is that there is no unused code being downloaded or executed. This combination far exceeds the optimisation capabilities of any bundler.
Hopefully we can already start to see why a main export, at least, is a massive hindrance here. If we can all acknowledge that, we can at least move away from a main export and to deep imports. What then remains is the decision whether we should retain certain groups of multi-export modules. Once we are at that point, we should validate the reasoning behind it.
Continuing with the train of thought – are the arguments for grouping multiple exports that it's easier for maintainers? Or is it a loose attempt at some kind of opinionated HTTP optimisation? Or is it because we feel users find multiple import statements too frightening, or they don't like how it looks? They're not exactly the strongest of arguments. The more we think about it (and I have myself spent a great deal of time considering it), paradigm shift though it may be, the logical conclusion for publishing a package with such a wide array of consuming environments is always one-export-per-module files.
In addition, publishing as one export per module doesn't cause any real hindrance to those that bundle their source, but provides a massive enhancement (across all environments) for those that do not, as already illustrated by Jayden.
IDE support for deep imports and default exports is not a concern from my experience either, as I've also been publishing modules like this for some time now and work alongside a team with them. The DX is just as good (at least in VSCode).
Even if we all agree on the technical merit of this approach, we should then address any concerns outside of it as the technical arguments have already been put forward in depth.
@jaydenseric Thanks so much for attending the meeting and sharing your passion on this issue with us. Especially considering the time zone difference.
I vibe-coded a script to inspect in greater detail our import hierarchy more systematically.
See https://github.com/yaacovCR/graphql-js/tree/demonstrate with the latest commit there.
We currently have a maximum import depth of 5 (or 6 if you count the first requested file); I am using depth here to mean how many times the "import" keyword is used.
The top offending 11 files that have a depth of 5 (or 6):
[
{
"from": "utilities\\replaceVariables.mjs",
"to": "jsutils\\invariant.mjs",
"depth": 5,
"path": [
"utilities\\replaceVariables.mjs",
"utilities\\valueToLiteral.mjs",
"type\\definition.mjs",
"jsutils\\didYouMean.mjs",
"jsutils\\formatList.mjs",
"jsutils\\invariant.mjs"
]
},
{
"from": "utilities\\getDefaultValueAST.mjs",
"to": "language\\ast.mjs",
"depth": 5,
"path": [
"utilities\\getDefaultValueAST.mjs",
"utilities\\astFromValue.mjs",
"type\\definition.mjs",
"language\\printer.mjs",
"language\\visitor.mjs",
"language\\ast.mjs"
]
},
{
"from": "validation\\validate.mjs",
"to": "utilities\\coerceInputValue.mjs",
"depth": 5,
"path": [
"validation\\validate.mjs",
"validation\\specifiedRules.mjs",
"validation\\rules\\SingleFieldSubscriptionsRule.mjs",
"execution\\collectFields.mjs",
"execution\\values.mjs",
"utilities\\coerceInputValue.mjs"
]
},
{
"from": "validation\\rules\\DeferStreamDirectiveLabelRule.mjs",
"to": "language\\ast.mjs",
"depth": 5,
"path": [
"validation\\rules\\DeferStreamDirectiveLabelRule.mjs",
"type\\directives.mjs",
"type\\definition.mjs",
"language\\printer.mjs",
"language\\visitor.mjs",
"language\\ast.mjs"
]
},
{
"from": "validation\\rules\\DeferStreamDirectiveOnRootFieldRule.mjs",
"to": "language\\ast.mjs",
"depth": 5,
"path": [
"validation\\rules\\DeferStreamDirectiveOnRootFieldRule.mjs",
"type\\directives.mjs",
"type\\definition.mjs",
"language\\printer.mjs",
"language\\visitor.mjs",
"language\\ast.mjs"
]
},
{
"from": "validation\\rules\\KnownArgumentNamesRule.mjs",
"to": "language\\ast.mjs",
"depth": 5,
"path": [
"validation\\rules\\KnownArgumentNamesRule.mjs",
"type\\directives.mjs",
"type\\definition.mjs",
"language\\printer.mjs",
"language\\visitor.mjs",
"language\\ast.mjs"
]
},
{
"from": "validation\\rules\\SingleFieldSubscriptionsRule.mjs",
"to": "language\\ast.mjs",
"depth": 5,
"path": [
"validation\\rules\\SingleFieldSubscriptionsRule.mjs",
"execution\\collectFields.mjs",
"type\\definition.mjs",
"language\\printer.mjs",
"language\\visitor.mjs",
"language\\ast.mjs"
]
},
{
"from": "validation\\rules\\UniqueDirectivesPerLocationRule.mjs",
"to": "language\\ast.mjs",
"depth": 5,
"path": [
"validation\\rules\\UniqueDirectivesPerLocationRule.mjs",
"type\\directives.mjs",
"type\\definition.mjs",
"language\\printer.mjs",
"language\\visitor.mjs",
"language\\ast.mjs"
]
},
{
"from": "validation\\ValidationContext.mjs",
"to": "jsutils\\invariant.mjs",
"depth": 5,
"path": [
"validation\\ValidationContext.mjs",
"utilities\\TypeInfo.mjs",
"type\\definition.mjs",
"jsutils\\didYouMean.mjs",
"jsutils\\formatList.mjs",
"jsutils\\invariant.mjs"
]
},
{
"from": "utilities\\introspectionFromSchema.mjs",
"to": "utilities\\astFromValue.mjs",
"depth": 5,
"path": [
"utilities\\introspectionFromSchema.mjs",
"execution\\execute.mjs",
"type\\validate.mjs",
"type\\introspection.mjs",
"utilities\\getDefaultValueAST.mjs",
"utilities\\astFromValue.mjs"
]
},
{
"from": "utilities\\stripIgnoredCharacters.mjs",
"to": "jsutils\\invariant.mjs",
"depth": 5,
"path": [
"utilities\\stripIgnoredCharacters.mjs",
"language\\lexer.mjs",
"error\\syntaxError.mjs",
"error\\GraphQLError.mjs",
"language\\location.mjs",
"jsutils\\invariant.mjs"
]
}
]
There are 35 files that have a depth of 4 (or 5).
@jaydenseric
- I don't see a lot of rigor here in terms of the argument as to why the waterfall is not a problem. As you mix in more files with a flatter tree, the import depth would certainly decrease, but I sure would like to see some numbers. Something like: this is how it worked with a bundler, this is how it works with default-exports-only, the savings were because x-y-z. The waterfall delay is overwhelmed by a-b-c.
- I was curious what you meant in terms of:
Once you take note of overly deep internal module graphs, things can be done to flatten them out and think critically about what needs to exist in standalone modules. Ideally, we can just make simpler code in the first place and avoid complexity.
What in particular would you suggest in terms of the import hierarchy for perhaps one of the files above?
@dburles
Using preloading to fast-track downloading the bundle seems like a great idea, but it also seems to be solving the waterfall by potentially recreating the original problem of downloading unused code. If this is the form of the "fully-worked out solution," in what way does the end result end up better than bundling? I presume it might be just that a change in the way the bundle is anchored to less code rather than the reverse?
Thanks to you both for pushing this issue forward. Please read the above as an attempt just to draw out some specifics; I certainly don't have the experience that both of you seem to in moving this forward on actual code bases, and am just trying to get a feel for how this would work.
I think an interesting intermediate step to think about would be limiting our files to a single named export, seeing how that goes, and then seeing if we can move all the way to a single default export.
Hey @yaacovCR thanks for the visibility on import depth.
Using preloading to fast-track downloading the bundle seems like a great idea, but it also seems to be solving the waterfall by potentially recreating the original problem of downloading unused code. If this is the form of the "fully-worked out solution," in what way does the end result end up better than bundling? I presume it might be just that a change in the way the bundle is anchored to less code rather than the reverse?
There is no 'bundle', just plain JavaScript modules in the browser. When a module is requested, a Link header is served alongside it that contains a de-duplicated list of all of its imports (and their imports), which the browser will then preload in parallel, avoiding a waterfall. There's no dead code caused by bundling of course but there's also no tree-shaking, so there would only be dead code if anything exported by the imported module was not used. Which is why we're advocating for single export modules.
I don’t wish to derail the issue too much on this topic. I’m happy to answer any additional questions over here https://github.com/dburles/modulepreload-link-relations/discussions.
and also improves bundling performance
A bundler only parses, it doesn't execute, so it won't follow unrelated imports/exports in a side-effect-free package. I can't imagine big gains here tbh, especially given how much file system activity happens for every import just to discover the right file - adding a bunch of extra files to import from likely has a much higher impact as each import will cause a dozen or so file system lookups during resolution.
And the thing is, because of the way people bundle per route, as you navigate around and load more route bundles, you can end up loading the same dependencies again and again as not all are added to commons bundles. If you add everything to commons bundles, then every visitor is forced to download and parse that code regardless if it's used for the current route.
Both of these extremes sound like bad chunking - I will acknowledge that these extremes exist and that probably very few applications out there arrive at "perfect" chunking, but this overplays it - we're probably off by a few percent in most cases.
"But even though there is a lot of waste, fast internet connections and powerful hardware can deal with it" is the reason why the bloat is allowed to accumulate year on year, even exceeding the pace of network and hardware improvements.
This statement seems very over the top. If this were the case, in the last 10 years we would have seen webpages arrive in the gigabytes where in practice we're still having the same discussion as back then over, in bad cases, a few megabytes.
Problems like the crazy size of project node_modules directories is a thing of the past once you move to Deno and caching only the specific modules your application imports. You don't have to download to disk the entire package if you are only using some of its modules.
Same thing, this seems like hyperbole to me. Yes, it's big, no, it's not crazy. I currently have about 150 projects checked out, without using a package manager that does any kind of deduplication (because frankly, on a modern dev machine, there's no need for that), and I'm at about 70GB. And these npm modules oftentimes ship with full source code, source maps, tests, documentation, and other stuff in their bundle, so the amount of code that actually needs parsing is a tiny fraction of that. If I were to use a deduplicating package manager, that would probably be less than 15 GB over all of these. Keep in mind that for other languages you have to install tools like XCode which are 40GB on their own, without even a single project. My docker images and Nix installation are a multiple of the size of my node_modules folders, yet I don't see similarly intense discussions in these ecosystems.
Just to add this here:
I would really prefer it if we could have this whole discussion calmly without going into extreme statements at every corner. Hyperbole on your side will just provoke hyperbole on my side, and I don't think we will get anywhere with that in the end.
Very few things are "always bad" and almost nothing ever is "optimal" for all use cases - I have to admit that the name "optimal module design" is really making me nervous, because it implies that you are either not aware of possible drawbacks (which I don't really believe after all the discussions you probably already had about this) or discarding them as irrelevant (which feels far from ideal).
I see definite tradeoffs and I don't see them discussed openly enough to feel comfortable with this whole proposal.
If you try to trace individual paths in a route's module graph, you might find some long ones like @benjie pointed out, but in a realistic app modules higher up the waterfall that have already pulled in a dependency module mean those same modules don't have to be loaded again for all deeper branches in the module graph.
So the waterfall does happen, you're just not concerned about it because it happens only once?
When you use native ESM via HTTP imports from CDNs, you can do all sorts of neat performance things like have far future caching for third party dependencies at specific versions.
I do remember reading a study about cache life expectancy by Meta recently (unfortunately I can't seem to find the link) that found that pretty much everything drops out of the cache after a day or two, so I don't think this is as relevant as you make it out to be.
There are other exciting things you can do to reduce waterfall loading in CDN's and in your application server for your ESM. You can server-side analyse the module graph for the requested module, and respond with module preload hint headers for all the nested imports, to flatten out the loading waterfall to just the initial level and one other. @dburles has done work in this space.
It's great that this is being explored for ESM over HTTP, but again, stuff like preload hints are already well supported in modern bundled frameworks. You're not reinventing the wheel here, you're just reapplying existing solutions to a different ecosystem (I don't want to talk this small, this is great!).
We are not doing anything weird; we are just using the web as it is designed to function and are realising it works really well and it a breath of fresh air. It's strange that those of us in the community who are using less tools and more web standards feel the burden to justify ourselves, and not the other way around.
You are asking to change a package published to the node package registry, which is mostly used by people who will not get the same core benefits, but might be forced to deal with the drawbacks. Of course we will have to be careful here.
We can't have people fall into a pit of success if they reach the bottom of that pit with broken legs, can we?
That said, graphql-js has two different distribution channels: npm and deno.
As I already said, there might very well be a case to handle these two differently.
It's very easy to find things in the project and it's a bit like how with Prettier you don't have to think about formatting; with optimal module design enforced you don't have to think about what exports to arbitrarily group in modules, and what names to give files.
These groups and names help with discoverability and lower mental overhead for consumption, though. Someone looking for schema-related functionality can easily ignore parsing-related features etc.
You can easily scroll a nice hierarchy of module files in your editors file explorer panel, and open just that code above the fold in your editor without having to scroll to find things.
VSCode has an outline panel directly under that explorer panel, although I almost exclusively navigate code bases by Cmd-Clicking symbols.
These arguments are mostly about individual habits, both ways of working seem valid to me - but again, I don't want to force people into a specific workflow if they already have a good one that works for them. Especially since this will not change their full projects, but just one dependency - and leave them with a mix of workflows that they have to adapt to.
Regarding default exports vs a named export, editors are actually really good now at renaming symbols and their default imports throughout the project and while I faintly recall issues with that several years ago, it hasn't been a bother in recent memory.
I just reproduced this in a project I already have open (VSCode 1.101.1, TypeScript 5.8.3):
Create a file foo.ts with the following content:
function foo() {
console.log("This is a placeholder function in foo.ts");
}
export default foo;
Import from it in another file:
import foo from "./foo.js";
console.log(foo);
Rename the function foo to bar. It is not renamed in the other file.
It is renamed when I rename the export directly, but the fact that this is not happening automatically is far less visible than it would be with a named export.
Even if we ignore this and the other DX issues I brought up (and please, let DX be the last thing we ignore!), there is still one very fundamental issue with this proposal:
It's inherently breaking and will split the ecosystem into a "before" and "after", with the risk of hindering adoption for years.
Let me give you an example:
We are just about to release Apollo Client 4.0, which is the first major version in 5 years. (We want to get faster with that, but don't expect us to release a new major every few months now.)
Right now, the stable version of graphql-js is 16, so we have to support that.
Now, if in v17 any imports we rely on are changed, we cannot adopt v17 during the lifetime of Apollo Client 4.0, as there is no way of supporting both in parallel.
This will put us in a stalemate: Packages that are targeting Apollo Client 4.0 will not update graphql-js either, which means that part of the ecosystem cannot start adoption, and we cannot argue to switch in good conscience, because we would make it impossible for all of our consumers that are still using graphql-js 16 to update to our newest major at the time.
We know that even today, 5 years after 3.0 was released, teams are actively working with Apollo Client 2.x, so this is not a hypothetical situation - people update slowly and many teams (especially in big companies) cannot afford big-bang updates.
Realistically, we would be forced to wait for the ecosystem to catch up, and in the meantime we would be stuck with an outdated version of graphql-js and actively hindering that update wave.
Not exactly the place I want to be in to be honest.
Also note that the graphql-js ecosystem is already a slow-moving one - currently only 63.8% of downloads are even at v16.
To clear up confusion about avoiding waterfalls with ESM over HTTP, let's go over a hypothetical scenario…
Module graph
graph LR
A[a.mjs] --> B[b.mjs]
A --> C[c.mjs]
C --> D[d.mjs]
D --> E[e.mjs]
gantt
title Loading waterfall (without preload headers)
dateFormat SSS
axisFormat %L ms
section a.mjs
a.mjs :a1, 0, 50ms
section b.mjs
b.mjs :a2, after a1, 60ms
section c.mjs
c.mjs :a3, after a1, 70ms
section d.mjs
d.mjs :a4, after a3, 40ms
section e.mjs
e.mjs :a5, after a4, 35ms
If a.mjs is served with this header:
Link: </b.mjs>; rel=modulepreload, </c.mjs>; rel=modulepreload, </d.mjs>; rel=modulepreload, </e.mjs>; rel=modulepreload
gantt
title Loading waterfall (with preload headers)
dateFormat SSS
axisFormat %L ms
section a.mjs
a.mjs :a1, 0, 50ms
section b.mjs
b.mjs :a2, after a1, 60ms
section c.mjs
c.mjs :a3, after a1, 70ms
section d.mjs
d.mjs :a4, after a1, 40ms
section e.mjs
e.mjs :a5, after a1, 35ms
The waterfall is flattened from 4 levels to a maximum of 2, reducing the loading time in this case by 75ms.
It's relatively easy to implement this; I plan to use @deno/graph in Ruck to analyse the module graph for requested .mjs files (that exist and are not 404s) using the client import map, and add the Link header with modulepreload hints in the response:
https://github.com/jaydenseric/ruck/blob/b8f91197178c6a53c31de08cab1d35e74c6acb7f/publicFileResponse.mjs#L69-L71
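In principle the header generation itself can be as small as this sketch (assuming the nested import specifiers for the requested module have already been collected, e.g. via @deno/graph, and are expressed as root-relative URLs):

```js
// Build a `Link` response header containing a modulepreload hint for every
// module nested in the requested module's import graph (deduplicated).
function modulepreloadLinkHeader(nestedModuleUrls) {
  return [...new Set(nestedModuleUrls)]
    .map((url) => `<${url}>; rel=modulepreload`)
    .join(", ");
}

// For the earlier hypothetical graph, serving a.mjs would include:
// Link: </b.mjs>; rel=modulepreload, </c.mjs>; rel=modulepreload, …
const headers = new Headers({
  "Content-Type": "text/javascript",
  Link: modulepreloadLinkHeader(["/b.mjs", "/c.mjs", "/d.mjs", "/e.mjs"]),
});

console.log(headers.get("Link"));
```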
Note that CDNs such as esm.sh achieve the same thing, but a different way that involves modifying the requested module by injecting side effect import statements into it for each module that occurs deeper in the module graph:
https://github.com/esm-dev/esm.sh/releases/tag/v127
This basically hoists the loading and avoids waterfall steps, but I much prefer the less invasive Link header with rel=modulepreload approach because it doesn't mutate the requested files. Import statements are not just preload hints the browser can choose to ignore though, so maybe that's why the CDNs like that approach. In our testing, modern browsers respect the Link header with rel=modulepreload on dynamically imported module responses.
@yaacovCR let's look at how realistic application code can result in flatter waterfall loading than you might expect, given the depths of some of the module graph branches involved. Let's use @benjie 's example:
GraphQLInt.mjs ---> GraphQLScalar.mjs ---> assertName.mjs ---> GraphQLError.mjs ---> printSourceLocation.mjs ---> getLocation.mjs ---> invariant.mjs
In a mutations/Foo.mjs resolver module for a mutation, you might have:
import { GraphQLInt } from "graphql";
import GraphQLUpload from "graphql-upload/GraphQLUpload.mjs";
In GraphQLUpload.mjs, GraphQLError is imported:
https://github.com/jaydenseric/graphql-upload/blob/421707f3b4e2b0c18ed9beec8eeaf3a0c942841d/GraphQLUpload.mjs#L3
So, in the module graph for mutations/Foo.mjs, the entire branch GraphQLError.mjs ---> printSourceLocation.mjs ---> getLocation.mjs ---> invariant.mjs gets hoisted up from loading at a position 4 levels deep, to just 2 deep.
If invariant.mjs is imported by any other part of the GraphQL.js APIs being used at a higher module graph depth, then you won't experience the worst case scenario of 7 deep.
What in particular would you suggest in terms of the import hierarchy for perhaps one of the files above?
Say for example, the first item in your list could be shortened:
{
"from": "utilities\\replaceVariables.mjs",
"to": "jsutils\\invariant.mjs",
"depth": 5,
"path": [
"utilities\\replaceVariables.mjs",
"utilities\\valueToLiteral.mjs",
"type\\definition.mjs",
"jsutils\\didYouMean.mjs",
"jsutils\\formatList.mjs",
- "jsutils\\invariant.mjs"
]
}
By not importing and using a trivial invariant utility function:
https://github.com/graphql/graphql-js/blob/5bdc53c406cfd6fdcbbef446a4ea4aed325d489c/src/jsutils/formatList.ts#L18
function formatList(conjunction: string, items: ReadonlyArray<string>): string {
- invariant(items.length !== 0);
+ if (!items.length) throw new TypeError("Can’t format an empty list.");
This code change reduces the module graph depth, has a more useful error message, and allows throwing a more correct class of error (TypeError or RangeError).
Once reducing module graph complexity is something maintainers keep in mind as a goal, you notice little things like this that add up and are pretty easy to avoid.
@phryneas
Both of these extremes sound like bad chunking - I will acknowledge that these extremes exist and that probably very few applications out there arrive at "perfect" chunking, but this overplays it - we're probably off by a few percent in most cases.
"Perfect" chunking with bundling is conceptually impossible, as I already explained. If you chunk everything perfectly, you arrive at optimal modules. One module per thing, downloaded only when it's used. As for "we're probably off by a few percent in most cases", I already showed you a screenshot of the coverage report of unused code when landing on Apollo's homepage. Nearly 60%; 3.4 MB, is unused. That's not "off by a few percent". Take a look at any other website/app that bundles, and you will see typically half the code is unused. Look at the official Next.js website lol:
It's unhelpful to characterise me bringing up the widely acknowledged problem in our industry of gigabytes of node_modules bloating our disks as being "hyperbolic". Hundreds of gigabytes of text files bloating millions of developers' local hard drives, most unused by our applications and tooling. It wastes battery life as operating systems index it and backups try to sync them all. Think of the billions of text files installed redundantly in CI environments every single day. Engineers should care about waste like this; it can be easily avoided.
This statement seems very over the top. If this were the case, in the last 10 years we would have seen webpages arrive in the gigabytes where in practice we're still having the same discussion as back then over, in bad cases, a few megabytes.
It is a well studied and discussed phenomenon in our industry that we bloat our code just to the brink of things being unusable by end users, no matter how much hardware improves.
- https://stackoverflow.blog/2023/12/25/is-software-getting-worse/
- https://spectrum.ieee.org/lean-software-development
A lot of your other queries should be cleared up by the deeper explanations above.
Regarding:
That said, graphql-js has two different distribution channels: npm and deno. As I already said, there might very well be a case to handle these two differently.
Fragmenting the GraphQL ecosystem is not desirable or necessary. We need universal, web standard modules that work efficiently in Node.js, Deno, and browsers alike. As has been explained multiple times, optimal module design benefits all kinds of projects.
We are just about to release Apollo Client 4.0, which is the first major version in 5 years.
I'm well aware that Apollo has an extreme aversion to publishing SemVer major releases; half a decade of no major release is not reasonable for such an evolving project. There have actually been breaking changes published, just not correctly with SemVer major version bumps. My apps broke multiple times when I used to update Apollo Server and Client dependencies.
Corporate preferences to maintain a status quo shouldn't dominate engineering decisions in an open source community project like GraphQL. Yes, publishing web standard pure ESM modules is a breaking change, but it's inevitable and every package in the npm ecosystem will tackle it at some point. Even Apollo Server/Client. I migrated all my packages to pure ESM years ago and haven't looked back. This is a once in a generation effort; once we are working with web standard ESM that's it. GraphQL shouldn't wait to be last in this regard, because the GraphQL ecosystem needs it to go first before they can build on top of it.
I have to admit that the name "optimal module design" is really making me nervous
I have been optimizing JavaScript codebases (startups through to enterprises) professionally for many years now, and have taken build times from minutes to seconds, personally eliminated hundreds of thousands of lines of code in single projects, and removed hundreds of megabytes of dependencies at a time. Optimal module design is a key ingredient; in particular deep importing from dependencies and avoiding barrel files. Your personal suspicion of the choice of the word "optimal" is just a feeling – an instinct. I'm sure if you spent years figuring out how to build apps without build steps, using native ESM over HTTP you would come around to describing it as optimal too.
We should dispel the myth that "unbundled ESM over HTTP will not become mainstream in the browser anytime soon". Rails has supported import maps and bundle-less environments in production for near 4 years now, and quite large production apps have been stable and performant, some deep dives below:
- https://world.hey.com/dhh/modern-web-apps-without-javascript-bundling-or-transpiling-a20f2755
- https://world.hey.com/dhh/hey-is-running-its-javascript-off-import-maps-2abcf203
- https://world.hey.com/dhh/once-1-is-entirely-nobuild-for-the-front-end-ce56f6d7
Tree-shaking, bundling, etc. aren't a silver bullet; they're more a coping strategy born of years of disparate module systems, poor code quality standards, and large frameworks pushing the need for all of this tooling more and more. The layers of cope have become so thick that frameworks have had to start targeting poorly structured libraries manually. (Worth noting that even Vercel uses the term "optimize" here.)
There's a great opportunity to push JS forward here and lead the ecosystem by example.
Back on the rollout plan. I like @benjie’s suggestion made here: https://github.com/graphql/graphql-js-wg/issues/161.
On the "no index file" front, I think we should reserve that for the next semver major. In the release where we introduce the subpaths, I'd re-export everything from the root barrel file but mark it all as deprecated (e.g. via tsdoc) to encourage users to move to the new way. Sometimes it takes a while to fully update a codebase to use the new patterns, so giving users a period when both the root and the subpaths work is going to give them a window to adopt this practice gracefully before we force it upon them in the next semver major. It also means that we can give users an entire release cycle (e.g. v18 -> v19) to prepare for the breaking change coming down the line.
To ease adoption, deep imports could be introduced as part of v17, while retaining (and marking as deprecated) the current export structure for backwards compatibility. Following up in v18 the old exports are removed.
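A sketch of what a deprecated barrel re-export could look like (module path and wording are illustrative only):

```ts
// index.ts (sketch): existing named exports keep working for one more major,
// but are marked deprecated so editors strike them through and nudge users
// toward the new deep imports.

/** @deprecated Use the deep import `graphql/GraphQLInt.mjs` instead. */
export { default as GraphQLInt } from "./GraphQLInt.mjs";
```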
Hundreds of gigabytes of text files bloating millions of developers' local hard drives, most unused by our applications and tooling. It wastes battery life as operating systems index it and backups try to sync them all.
That is literally the type of argument I am calling out as "hyperbolic", though.
Yes, there are bubbles of developers who care about this, but there are also bubbles who don't, and calling this a "widely acknowledged problem in our industry" will only prompt me to point out that there is an equal amount of people who "widely acknowledge" that for them this is not a problem at all (for me, this is less than 10% of my disk space and a work computer, what else am I going to use the space for?) who just don't want to be part of this whole discussion because it doesn't matter for them and discussions like this are stressful.
The only difference here is probably visibility - people who are happy with the status quo don't write blog posts about something that is the default and works for them, while people who are unhappy with it do.
Please let's avoid these absolute statements, they lead nowhere.
Back to my argument about migration times: The time frames wouldn't change a lot, even if Apollo Client were to release new majors daily. The ecosystem moves slowly - again, only 63.8% of downloads are even at v16 and only 34% are on 16.10 or 16.11 - the only minors release within the last year.
If this is released as an "all or nothing" switch, we are still confronted with the choice of switching over to the new version and leaving behind a large number of our users that are stuck on an older version of graphql, or just not adopting the new graphql version and slowing down the ecosystem migration even more by that move.
Right now I can't tell at which point we would feel comfortable to make that switch over in the case of a breaking change, but I suspect that it would require similar adoption numbers as graphql 16 has right now (we are about to drop v15) - and it took almost 4 years to get there.
To ease adoption, deep imports could be introduced as part of v17, while retaining (and marking as deprecated) the current export structure for backwards compatibility. Following up in v18 the old exports are removed.
@dburles from what @jaydenseric was writing up there, I assumed that this would be something that he would not accept under any circumstances - to me this whole proposal seems to be very all-or-nothing without any room for compromise:
If users should never ever import from the index modules, instead of fighting a losing educational battle, just don't publish the index modules! The entire GraphQL JS ecosystem will fall effortlessly into the pit of success. GraphQL.js "glues" the ecosystem together, so it's important to get this right in the graphql package.
As for default exports, one thing that I noticed yesterday is that the Deferring Module Evaluation proposal seems to be something that should go hand-in-hand with the optimizations you're going for here - but it feels very fiddly if you want to use it with default exports (you'd be calling something.default instead of importName.something all over the code).
Are you sure that the language designers are not going into the opposite direction of what you are proposing here?