node icon indicating copy to clipboard operation
node copied to clipboard

src: add support for externally shared js builtins

Open mhdawson opened this issue 1 year ago • 28 comments

Refs: https://github.com/nodejs/node/issues/44000

  • add infra to support externally shared js builtins in support of distos that want to externalize deps that include JS/WASM instead of native code
  • add support for externalizing
    • cjs_module_lexer/lexer
    • cjs_module_lexer/dist/lexer
    • undici/undici

Signed-off-by: Michael Dawson [email protected]

mhdawson avatar Aug 24 '22 15:08 mhdawson

Review requested:

  • [ ] @nodejs/gyp
  • [ ] @nodejs/startup

nodejs-github-bot avatar Aug 24 '22 15:08 nodejs-github-bot

WIP in still needs docs, tests, runing linters as well as addressing some existing test failures.

mhdawson avatar Aug 24 '22 15:08 mhdawson

In the default configure case (ie no builtins are externalized) all tests pass

When I externalize cjs-module-lexer/lexer.js with:

./configure --shared-builtin-cjs_module_lexer/lexer-path=/home/midawson/newpull/io.js/stuff/deps/cjs-module-lexer/lexer.js

All tests pass except for 1 as follows:

=== release test-code-cache ===                                               
Path: parallel/test-code-cache
node:internal/modules/cjs/loader:797
      throw new ERR_UNKNOWN_BUILTIN_MODULE(request);
      ^

Error [ERR_UNKNOWN_BUILTIN_MODULE]: No such built-in module: node:internal/deps/cjs-module-lexer/lexer
    at new NodeError (node:internal/errors:393:5)
    at Module._load (node:internal/modules/cjs/loader:797:13)
    at Module.require (node:internal/modules/cjs/loader:1021:19)
    at require (node:internal/modules/cjs/helpers:103:18)
    at Object.<anonymous> (/home/midawson/newpull/io.js/test/parallel/test-code-cache.js:19:3)
    at Module._compile (node:internal/modules/cjs/loader:1119:14)
    at Module._extensions..js (node:internal/modules/cjs/loader:1173:10)
    at Module.load (node:internal/modules/cjs/loader:997:32)
    at Module._load (node:internal/modules/cjs/loader:838:12)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:86:12) {
  code: 'ERR_UNKNOWN_BUILTIN_MODULE'

That is what I'm currently looking at.

mhdawson avatar Aug 24 '22 15:08 mhdawson

Looking at the test failure seem like I need to do more than just read the file from disk every time it is loaded. Working on that now.

mhdawson avatar Aug 24 '22 16:08 mhdawson

@mhdawson for us not intimately familiar with all Node.js details, would you mind to improve the commit message to include also what it is really good for and how to use it? Thx a lot :pray:

voxik avatar Aug 24 '22 19:08 voxik

@voxik will be adding docs, just have not gotten to that yet.

Pushed an update to move addition of externalized builtins to the right place to get them fully initialized as part of the builtin loader, but still have some error handling and memory management to figure out. Once I do that will work on the docs/tests/cleanup etc.

mhdawson avatar Aug 24 '22 23:08 mhdawson

@voxik, @kapouer, @khardix, @kasicka I've add the doc in BUILDING.md in terms of how to use the new options. Let me know if you have questions/it's not clear. The doc in maintining_dependencies.md also has some text that you 4 reviewing (for example why distros used externalized dependencies etc).

The new options as seen from the help output (./configure --help) are:

Shared builtins:
  Flags that allows you to control whether you want to build against
  internal builtins or shared files.

  --shared-builtin-cjs_module_lexer/lexer-path NODE_SHARED_BUILTIN_CJS_MODULE_LEXER_LEXER_PATH
                        Path to shared file for cjs_module_lexer/lexer
                        builtin. Will be used instead of bundled version at
                        runtime
  --shared-builtin-cjs_module_lexer/dist/lexer-path NODE_SHARED_BUILTIN_CJS_MODULE_LEXER_DIST_LEXER_PATH
                        Path to shared file for cjs_module_lexer/dist/lexer
                        builtin. Will be used instead of bundled version at
                        runtime
  --shared-builtin-undici/undici-path NODE_SHARED_BUILTIN_UNDICI_UNDICI_PATH
                        Path to shared file for undici/undici builtin. Will be
                        used instead of bundled version at runtime

The will let you build with the lexer files and the undici files externalized so that they are loaded at runtime. The issue in https://github.com/nodejs/node/issues/44000 of trying to use one copy of llhttp is a bit different and not covered in this PR, but this PR does cover some of the discussion we had in 44000.

mhdawson avatar Aug 26 '22 16:08 mhdawson

Squashed to 1 commit and marking as non-draft as I think it's complete and want to get testing on more platforms.

In terms of new tests I don't see us having tests for configure options, other than additional CI run for some specific subsets. Not sure that we should add one for this subset, but if so it would be after the PR lands anyway.

I've confirmed all tests run/pass with no configure options, the --without-intl option, with all three of the sharable builtins externalized, and with just one of the sharable builtins externalized.

mhdawson avatar Aug 26 '22 16:08 mhdawson

CI: https://ci.nodejs.org/job/node-test-pull-request/46261/

nodejs-github-bot avatar Aug 26 '22 20:08 nodejs-github-bot

Is there a reason for excluding acorn/acorn-walk from this ?

kapouer avatar Aug 28 '22 14:08 kapouer

@mdawson I like the write-up in the maintaining_dependencies.md file. It does a nice job of explaining the generic problems we (distro maintainers) have. It may benefit from some rewording, as it is a bit dense; but I'm not a technical writer either, so it looks good enough for me.

khardix avatar Aug 29 '22 14:08 khardix

Is there a reason for excluding acorn/acorn-walk from this ?

From a very quick look, it looks to me like the contents of the files in deps/acron/acorn-walk are in the preferred form. Most of the conern I heard was related to those that included WASM blogs so was starting there.

mhdawson avatar Aug 30 '22 22:08 mhdawson

CI: https://ci.nodejs.org/job/node-test-pull-request/46391/

nodejs-github-bot avatar Sep 02 '22 17:09 nodejs-github-bot

CI: https://ci.nodejs.org/job/node-test-pull-request/46467/

nodejs-github-bot avatar Sep 07 '22 19:09 nodejs-github-bot

@kapouer can you confirm this is along the lines of what was needed?

mhdawson avatar Sep 07 '22 19:09 mhdawson

@mhdawson yes, seems very good, and I've added it to debian package of nodejs. Test results soon.

EDIT: ~~NB: Loading external dependencies at runtime also makes it easier for bootstrapping Node (by distributors) on new architectures.~~ scratch that, because AddExternalizedBuiltin is called at init. It would have given more leeway to call it in LoadBuiltinSource - missing files wouldn't crash node asap, and node can run many things without lexer or undici.

Anyway the intended usage works all right.

kapouer avatar Sep 07 '22 20:09 kapouer

I tried to lazy-load externalized deps with this approach: more_shareable_builtins.patch.txt (on top of this PR).

kapouer avatar Sep 08 '22 01:09 kapouer

missing files wouldn't crash node asap, and node can run many things without lexer or undici.

@kapouer my initial thought is that is not a good thing. It would better to know up fron that you don't have a fully working Node.js setup/application versus running up until the customer triggers a transaction that requires one of the missing pieces in the middle of the night.

mhdawson avatar Sep 09 '22 18:09 mhdawson

@mhdawson yes, late discovery of a missing file is really bad, but that won't happen because:

  • test suites: even distributions without CI will at least run the tests during build and many fail if some external module is missing
  • package dependencies: the nodejs package in the distribution declares its dependencies on the needed external modules, that is unlikely to break. Unless files are moved, in which case
  • CI of the distribution can spot regressions easily

kapouer avatar Sep 09 '22 19:09 kapouer

Also, sorry to insist, but a "moving file" is really unlikely to happen in the setup of a distribution. If that was a concern, everything would break everyday.

kapouer avatar Sep 09 '22 20:09 kapouer

I had started by adding the loading of files where you patch moves it to. At least one of the problems I saw is that it means that the files will be loaded multiple times (once for each of a few different loaders) versus being loaded into a cache once just like is done for the way they are loaded today.

I'm pretty comfortable that what I have in this PR is a reasonable way to externalize the files without changing anything else in terms of behaviour, runtime overhead etc. and think it would be good to stick with it for the first step.

mhdawson avatar Sep 12 '22 21:09 mhdawson

Yes, sorry for that, I was greedy :) This PR addresses the most problematic issue w.r.t. Free Software Guidelines: being able to recompile from source, with the provided source in a form that can be modified. Precompiled wasm blobs aren't, and the way to compile to wasm varies between platforms, so externalizing that part is the easiest path for now.

kapouer avatar Sep 13 '22 19:09 kapouer

I think we have agreement from the distros (@kapour from Debian and @khardix from RHEL) that this PR addresses the proposal we agreed in https://github.com/nodejs/node/issues/44000, all I need to do is find some core collaborators to review :)

mhdawson avatar Sep 13 '22 22:09 mhdawson

@targos would you be able to take another look?

mhdawson avatar Sep 13 '22 22:09 mhdawson

I'm going to review the implementation this week, but I'd like to mention as well that the Fedora Project would also like to have this, as we have official policies requiring that all code packaged in Fedora must be included in a manner that allows it to be patched if needed. The pre-compiled WASM prevents us from doing this. The primary reason for this is if we need to patch a major issue or CVE faster than upstream can get a release out.

sgallagher avatar Sep 14 '22 13:09 sgallagher

@addaleax any chance you could review?

mhdawson avatar Sep 19 '22 22:09 mhdawson

I guess i am a bit late to the party here... Why are these loaded at runtime rather than compile time?

devsnek avatar Sep 19 '22 22:09 devsnek

@devsnek along the same lines as with shared libraries it allows updates do be done independently from the Node.js binary. The best example is openssl which is often shared by a number of binaries. If there is a CVE, just the shared library for opensll needs an updated instead of all the runtimes that use openssl. This cuts down maintenance work for the maintainers.

More completed explanation is in doc/contributing/maintaining-dependencies.md within this PR.

mhdawson avatar Sep 20 '22 19:09 mhdawson

along the same lines as with shared libraries it allows updates do be done independently from the Node.js binary. The best example is openssl which is often shared by a number of binaries. If there is a CVE, just the shared library for opensll needs an updated instead of all the runtimes that use openssl. This cuts down maintenance work for the maintainers.

I think we can have more discussions on this. openssl is special in that it is a library that is pervasive (used by many other downstream software) and by virtue of its content, it is more subjected to out-of-line CVE activities.

Do the JS builtins fall in that category?

Providing flexibility in composing Node binary is good, but for this specific case, compile time flexibility is more desirable than runtime, to reduce runtime issues, and to align better with the single binary philosophy?

gireeshpunathil avatar Sep 26 '22 02:09 gireeshpunathil

@gireeshpunathil openssl is just one example. There is support for a number of shared libraries to be linked separately. I think most if not all of the dependencies:

You can see here all of the options to enabled using an external library for the shared libraries:

https://github.com/nodejs/node/blob/eb708c8ae51fe9b67c8ceabc0f1861ca6b7dfe51/configure.py#L233

mhdawson avatar Sep 26 '22 13:09 mhdawson