gosling.js

Improve build + packaging; make higlass external?

Open manzt opened this issue 3 years ago • 3 comments

Motivation

Currently the npm distribution for gosling.js includes:

dist/
├── 1.gosling.js
├── 1.gosling.js.map
├── 2.gosling.js
├── 2.gosling.js.map
├── 3.gosling.js
├── 3.gosling.js.map
├── 4.gosling.js
├── 4.gosling.js.map
├── gosling.js
├── gosling.js.map
└── worker.js

All of these are minified, UMD-like exports. It seems {1,2,3,4}.gosling.js are chunks for dynamic imports that are aliased by __webpack_require__ in dist/gosling.js, and dist/worker.js is a separate build entirely that ends up being referenced in the source code via raw-loader!../../dist/worker.js.

Ideally, an npm package should provide an ES module entrypoint (which is easier to statically analyze and treeshake) rather than a pre-built & minified UMD. In my experience, webpack isn't a great choice for a library because you can't target ESM as an output and the generated bundles are difficult for other bundlers to load (e.g. __webpack_require__ statements cannot be resolved, so any project using gosling.js as a dependency will likely be unable to load {1,2,3,4}.gosling.js).

I have experimented with a new build for gosling.js on my vacation and wanted to share some ideas.

Approach

  • Bundle gosling.js the library (src/index.ts) with esbuild or rollup and target esm output (as well as UMD bundle like before for use in script tags)

  • Consume the library gosling.js in the editor (rather than importing from src/). Since the editor is really an "app" and not distributed in the npm module, it would make sense to configure this separately from the library. This is similar to what we do in Viv, where src/ corresponds to the library and avivator/ is an app that consumes src/ as a module. This enforces that the editor only use the public API from gosling.js.

src/ # bundled with rollup/esbuild
editor/ # built with webpack, imports generated bundle
  • Make higlass an external dependency. Since the CSS for higlass is already required via a separate tag on the page, I think it would make sense to externalize higlass as well. I'm not sure this is something that has been discussed, but since higlass is distributed as a UMD bundle, we could similarly add it as a script tag alongside react, react-dom, and pixi.js for use with our UMD bundle.

Vite might be an option to unify the first two points (which is what we have done in Viv). One thing I've noticed is that @gmod/* modules are very hard to bundle outside of the webpack <4 ecosystem (lots of node-builtins with conditional runtime checks). It would be nice to bundle just these difficult-to-bundle dependencies for our ESM module, and then leave everything else external for bundlers.

manzt avatar Sep 02 '21 21:09 manzt

@manzt thank you so much for these insights!! I wanted to reorganize the bundling part and separate the editor from gosling.js at some point, and this looks to be great timing.

I agree with all your points. Since I don't have much experience with these, I have some follow-up questions.

  • With the proposed changes, I guess we can still test the local changes of gosling.js with the editor (e.g., bundling the updated gosling.js before running the editor)?
  • We haven't yet deeply discussed whether we want to externalize higlass or not. Having the CSS file required, I think it makes sense to externalize higlass, although I was also thinking of removing the CSS dependencies at some point. Honestly, I was not sure about the rationale of how people determine which packages to externalize and which ones to include (e.g., react, react-dom, pixi.js). Would the size of the resulting bundle be the main reason?
  • To make ESM modules for @gmod/*, doing it in a forked repo would be the most ideal way?

sehilyi avatar Sep 03 '21 14:09 sehilyi

With the proposed changes, I guess we can still test the local changes of gosling.js with the editor (e.g., bundling the updated gosling.js before running the editor)?

This is the trickiest part. Something like Vite might offer a way to unify the entire process, but I've run into some headaches trying to get that to work unfortunately (very slow startup/build time). My idea (and experiment) is as follows:

  • bundle src/index.ts with esbuild and generate ESM output dist/gosling.mjs. Rollup is also an option but esbuild is sooo much faster, so if we can get that to work I think it would be a huge bonus during development.

  • in the editor webpack config, set an alias that points 'gosling.js' at the generated bundle:

// webpack.config.js
  resolve: {
    alias: { 'gosling.js': path.resolve(__dirname, 'dist/gosling.mjs') }
  }
// editor/index.js
import { ... } from 'gosling.js'
  • during production, we then need to run:
node build.js && webpack --mode production # build js output _once_ along with site
  • during development, we can use a tool like concurrently to run both processes in unison:
concurrently 'node build.js --watch' 'webpack-dev-server --mode development'

Any changes to src/ will trigger a rebuild, which will also trigger an update via the dev server.
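As a rough illustration of what the `build.js` referenced above could look like, here is a minimal sketch. It assumes esbuild's JavaScript API (`build`, and `context`/`watch` as found in esbuild 0.17 and later; older releases used a `watch: true` build option instead). The function names `createBuildOptions` and `runBuild` and the `external` list are hypothetical, not taken from the gosling.js repository. The esbuild module is passed in as an argument so the option logic can be inspected without running a build.

```javascript
// build.js (sketch, not the project's actual script).

// Options for bundling the library entry point into a single ESM file.
function createBuildOptions() {
  return {
    entryPoints: ['src/index.ts'],
    bundle: true,
    format: 'esm',
    outfile: 'dist/gosling.mjs',
    sourcemap: true,
    // Assumption: which packages stay external is a project decision.
    external: ['react', 'react-dom', 'pixi.js', 'higlass'],
  };
}

// Runs a one-off build, or a rebuild-on-change loop when `--watch` is passed.
async function runBuild(esbuild, argv) {
  const options = createBuildOptions();
  if (argv.includes('--watch')) {
    // esbuild >= 0.17: a long-lived context that rebuilds when sources change.
    const ctx = await esbuild.context(options);
    await ctx.watch();
    return 'watching';
  }
  await esbuild.build(options);
  return 'built';
}

// In a real build.js, finish with something like:
//   runBuild(require('esbuild'), process.argv.slice(2));
```

With that last line enabled, `node build.js` would produce dist/gosling.mjs once and `node build.js --watch` would keep rebuilding, which is the behavior the two commands above rely on.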


Honestly, I was not sure about the rationale of how people determine which packages to externalize and which ones to include (e.g., react, react-dom, pixi.js). Would the size of the resulting bundle be the main reason?

Good question. I don't know if there is a "right" answer because it depends on the target use and format:

UMD

When publishing a UMD format (imported via script tag) you must create a bundle with everything except for what you have marked external. This is because the UMD scripts for other libraries (e.g. react, react-dom, pixi.js, and higlass) add global namespaces to the window on import that other UMD modules depend on (e.g. React, ReactDOM, PIXI, hglib). The rest of your bundle needs to include the package dependencies because they aren't added as script tags.

Therefore, a UMD script can be used as long as its external modules are also on the page. You could in theory externalize everything as long as you had a script tag for each dependency; however, not every npm package publishes a UMD version, and adding all those script tags would be a nightmare for an end-user.

All that said, the benefit of making a module external is that a website doesn't need to download multiple copies of the same code, and it often avoids conflicting dependencies. My rule of thumb for UMD is to make the largest, stable dependencies external. These are often peerDependencies for a project as well, which we might consider for higlass.

        <script crossorigin type="text/javascript" src="https://unpkg.com/react@16/umd/react.development.js"></script>
        <script crossorigin type="text/javascript" src="https://unpkg.com/react-dom@16/umd/react-dom.development.js"></script>
        <script crossorigin type="text/javascript" src="https://unpkg.com/pixi.js@5/dist/pixi.js"></script>
        <script crossorigin type="text/javascript" src="https://unpkg.com/react-bootstrap@<version>/dist/react-bootstrap.js"></script>
        <script crossorigin type="text/javascript" src="https://unpkg.com/higlass@<version>/dist/hglib.min.js"></script>

It's just one extra script tag for our examples.
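To make the dependency on globals concrete, here is a small sketch of the browser-global branch of a UMD-style wrapper. The global names (React, ReactDOM, PIXI, hglib) follow the discussion above; the helper `attachToGlobal`, the `gosling` global name, and the stub objects are assumptions for illustration only, and `fakeWindow` stands in for the real `window`.

```javascript
// Sketch of how a UMD bundle picks up its externals from globals.
// `root` stands in for `window`; every external must already be defined on it.
function attachToGlobal(root, name, externalGlobals, factory) {
  const deps = externalGlobals.map((globalName) => {
    if (!(globalName in root)) {
      throw new Error(`Missing global "${globalName}": add its <script> tag first`);
    }
    return root[globalName];
  });
  // The factory receives the externals in order and returns the library's exports.
  root[name] = factory(...deps);
  return root[name];
}

// Usage with a fake `window` object holding stub libraries.
const fakeWindow = { React: {}, ReactDOM: {}, PIXI: {}, hglib: { version: 'stub' } };
attachToGlobal(
  fakeWindow,
  'gosling',
  ['React', 'ReactDOM', 'PIXI', 'hglib'],
  (React, ReactDOM, PIXI, hglib) => ({ higlassVersion: hglib.version })
);
```

If the hglib script tag were missing from the page, the lookup would fail, which is why each external needs its own script tag ahead of the gosling.js bundle.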

ESM

For an ESM, my advice is to make as much as possible "external". The rationale here is that the "target" for this bundle is other applications using gosling.js as a dependency, not direct use in the browser. In my opinion, this target should optimally just be the combined gosling source. This way a bundler can resolve any shared dependencies and drastically reduce the bundle size. Compare the current situation:

import * as gosling from 'gosling.js'; // hglib included in gosling bundle
import * as hglib from 'higlass'; // hglib loaded again
import { ... } from 'd3-array';

For example, I know that higlass is already an optional dependency of vitessce, so implementing https://github.com/vitessce/vitessce/issues/955 would currently entail loading higlass twice if a higlass component and gosling component were on the same page.

My final comment (and opinion) is that the goal of the "module" bundle should be to remove as much of the build-complexity as possible. Bundling an application using gosling.js e.g.,

import * as gosling from 'gosling.js';
console.log(gosling);

should require almost no bundler configuration, but get all the benefits of treeshaking, shared dependency resolution, etc. This means we should compile TypeScript and remove any weird import statements (e.g. raw-loader!). Ideally all gosling dependencies should similarly work with nearly zero configuration, but some are difficult to bundle for the browser as mentioned. In this case, I would suggest we do others the favor of including these "difficult" modules in our bundle so that others don't need to go through the work of doing so.
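One way to express "externalize everything except the difficult modules" is to derive the external list from package.json. The sketch below is hypothetical: `computeExternals`, the `examplePkg` object, its package names, and the `'*'` version placeholders are illustrative and not copied from the gosling.js manifest.

```javascript
// Hypothetical helper: every declared dependency stays external unless it
// matches one of the patterns we choose to bundle ourselves.
function computeExternals(pkg, bundled) {
  const names = [
    ...Object.keys(pkg.dependencies || {}),
    ...Object.keys(pkg.peerDependencies || {}),
  ];
  // A pattern ending in "/*" matches a whole npm scope, e.g. "@gmod/*".
  const isBundled = (name) =>
    bundled.some((pattern) =>
      pattern.endsWith('/*') ? name.startsWith(pattern.slice(0, -1)) : name === pattern
    );
  return [...new Set(names)].filter((name) => !isBundled(name)).sort();
}

// Illustrative manifest; versions are placeholders.
const examplePkg = {
  dependencies: { 'd3-array': '*', '@gmod/bbi': '*', '@gmod/tabix': '*' },
  peerDependencies: { react: '*', 'react-dom': '*', 'pixi.js': '*', higlass: '*' },
};
const external = computeExternals(examplePkg, ['@gmod/*']);
```

The resulting array could be passed as the `external` option of esbuild or rollup, so that only the @gmod/* packages end up inlined in the published module.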


To make ESM modules for @gmod/*, doing it in a forked repo would be the most ideal way?

I would avoid living on a fork. This has been a pain for us with geotiff in Viv. My suggestion above is that we bundle these "difficult" dependencies ourselves. By including those modules in our "module", we effectively include a fork in our published package for others. There are several options here:

  • Create custom resolution in our bundler configuration to deal with these modules.
  • Use something like patch-package to modify the node_modules during development.
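For the first option, custom resolution could take the form of a small bundler plugin. The sketch below assumes esbuild's plugin API (`setup`, `onResolve`, `onLoad`); the plugin name `stub-node-builtins` and the idea of replacing selected Node built-ins with an empty module are assumptions for illustration, and whether an empty stub is sufficient depends on how each @gmod/* module uses the built-in at runtime.

```javascript
// Hypothetical esbuild plugin: resolve selected Node built-ins to an empty
// module so that browser bundles of "difficult" dependencies can be produced.
function stubNodeBuiltins(names) {
  // Escape regex metacharacters, then match only the exact module names.
  const escaped = names.map((n) => n.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
  const filter = new RegExp(`^(${escaped.join('|')})$`);
  const namespace = 'stub-node-builtins';
  return {
    name: 'stub-node-builtins',
    setup(build) {
      // Claim matching import paths and move them into our own namespace.
      build.onResolve({ filter }, (args) => ({ path: args.path, namespace }));
      // Serve an empty CommonJS module for everything in that namespace.
      build.onLoad({ filter: /.*/, namespace }, () => ({
        contents: 'module.exports = {};',
        loader: 'js',
      }));
    },
  };
}

// Example: const plugin = stubNodeBuiltins(['fs', 'path']);
// It would be passed to esbuild as `plugins: [plugin]`.
```

patch-package, the second option, instead keeps a diff against node_modules under version control and reapplies it after install, so nothing changes in the bundler configuration.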

manzt avatar Sep 03 '21 16:09 manzt

These make a lot of sense to me, and thank you for these thoughtful comments again!

I think, after the deadline for the VIS presentation recording which is due Sep 12, I can start working on this.

sehilyi avatar Sep 03 '21 16:09 sehilyi