node icon indicating copy to clipboard operation
node copied to clipboard

Use only one copy of llhttp ?

Open kapouer opened this issue 1 year ago • 16 comments

What is the problem this feature will solve?

Versions of Node using Undici (16, 18) are actually using two different copies of llhttp

What is the feature you are proposing to solve the problem?

Would it make sense to use llhttp wasm that is bundled in Undici, instead of recompiling it ?

What alternatives have you considered?

n/a it's a refactoring.

kapouer avatar Jul 26 '22 15:07 kapouer

Or could the bundled Undici use the native one? @nodejs/undici

aduh95 avatar Jul 26 '22 17:07 aduh95

The architecture and integration of the two is radically different.

When switching to the wasm one we actually got an improvement in performance.

I think it might be possible to reach a stage where this is possible, but it's probably far from now given we are focusing on fetch. However, things change if people would like to work on this.

mcollina avatar Jul 26 '22 21:07 mcollina

@mcollina This is a problem for us as distro package maintainers as well. Packaging precompiled code is a no-no for us. Is there a way this could be compiled from source during build? How complicated would it be to use non-wasm version given we are okay with worse performance? Or would it be plausible to have an option to have node without undici? IIRC it's used only for fetch which is experimental for now.

@khardix @mhdawson @sgallagher

kasicka avatar Aug 02 '22 14:08 kasicka

@kasicka you should have a look at how debian is dealing with it. node-undici package is building the wasm files from source, there, and nodejs package embeds the result.

kapouer avatar Aug 02 '22 14:08 kapouer

@nodejs/tsc we have a decision to make here. One of the reasons we stopped using the native http_parser was due to the use of process.bindings() for retrieving it (https://github.com/nodejs/undici/issues/22#issuecomment-650584032) Assuming we don't want to bring process.binding() usage back, we'd need to create some custom wrapping code for undici that replaces that part. Are we all onboard?

The second question is who is willing to do the work. Are there any volunteers?

mcollina avatar Aug 02 '22 16:08 mcollina

I have been thinking about how we might move towards a standardized approach for integrating WASM which would address some of the concerns on the distro side. Some aspects of that would include:

  1. Making it easy to identify/find all WASM blobs
  2. Having good process/es/documentation for how the WASM blobs are generated in a repeatable way. I think this is also good for the project ignoring the distros in terms of supply chain security.
  3. Making it easy for a downstream project to follow 2) and then replace the blogs from 1) with versions that they regenerate.

@kapouer when you say the nodejs package embeds the result, is it replacing what is in the Node.js upstream by default or some other process?

mhdawson avatar Aug 02 '22 19:08 mhdawson

FWIW building the wasm needed by undici is completely automated (otherwise is very hard to do it by hand).

mcollina avatar Aug 02 '22 20:08 mcollina

@mhdawson currently the nodejs debian package replaces deps/undici/undici.js during build by the one built by the node-undici debian package using a slightly patched version of undici's build/wasm.js. However, that's suboptimal for us, since rebuilding nodejs each time undici is updated in debian is not a good idea (especially for security fixes). It is not a big problem though, as it's rather easy to patch nodejs so that it requires the external undici-fetch.js file instead of bundling it.

kapouer avatar Aug 02 '22 21:08 kapouer

@kapouer thanks for the details.

mhdawson avatar Aug 03 '22 19:08 mhdawson

as it's rather easy to patch nodejs so that it requires the external undici-fetch.js file instead of bundling it.

I'm thinking that as there are more instances like this, it may become more of an overhead. I assume having a configure time option would make it a bit easier.

Thinking out loud a standard approach might be something like:

  1. Any WASM/generated JavaScript that is not created directly as part of the Node.js build should be retrieved through a common accessor method.
  2. For community builds the WASM/generated JavaScript that is required will be bundled into Node.js. We could probably come up with some infra where adding another instance would be as easy as, point to file in deps, give it a name, its bundled in and accessible through the name given using the accessor method.
  3. In each case for 1) there should be a configure time option which allows the bundled require to be replaced with a require for an external file and for the path(s) for where to look for those files. The configuration time options might even be able to be autogenerated based on the name used in 2)

If the same approach was used in undichi (or at least the same configuration option name) for example, then the option could be simply passed on and the same external file could be used by Node.js core and undichi (although we would still have 2 copies)

I'm thinking of this kind of the along the same lines as how we have configure time options to allow components like openssl, cares, etc. to either be bundled in to the node.js binary or linked against as a shared library.

We might even be able to use the same approach as for shared libraries by getting the WASM/generated JavaScript from a symbol that is either present in the binary itself or within a shared library but that does seem a bit like putting a square peg in a round hole.

mhdawson avatar Aug 03 '22 20:08 mhdawson

Why is it necessary to use external versions? Could Debian’s patch be integrated into Uncici’s version somehow?

We include a wasm binary for the https://github.com/nodejs/cjs-module-lexer dependency that’s also bundled in Node. I thought that this binary was good to run anywhere without needing recompilation. It feels like this is the model we should be shooting for, that Node’s binaries/scripts can run anywhere, rather than allowing more ways to run custom versions.

GeoffreyBooth avatar Aug 03 '22 21:08 GeoffreyBooth

@GeoffreyBooth there two use cases (that ideally are selected by ./configure flags, or worse, by local patches):

  1. Node authors distribute a binary bundling many other libraries as much as possible,
  2. Distributions redistribute Node linked with shared (system-installed) libraries as much as possible.

Whatever the module is (cjs-module-lexer, undici, acorn, openssl, c-ares, uv, etc...) if it needs to be compiled to be used, then it can be treated like a library with respect to those two cases.

In any case, open-source distributions need to recompile each part at some point, during Node build, or during the externalized dependency build. Also case 2 hates bundling an external dependency that is built and system-installed - security fixes then require a rebuild that could have been avoided by loading dynamically (like shared libraries do).

kapouer avatar Aug 03 '22 22:08 kapouer

It feels like this is the model we should be shooting for, that Node’s binaries/scripts can run anywhere, rather than allowing more ways to run custom versions.

That is commendable, as long as you trust the person making the binaries. In Fedora (and derivatives), there is a strict policy of not shipping pre-built anything. One of the reasons why we do this is that there is no guarantee that the compiled binary corresponds to the source in git, and does not include i.e. added crypto-miner or other kind of supply chain attack. Rather that take that risk, we want to rebuild everything from source that we can review, using tools we also built ourselves (so no externally prepared SDKs or similar shortcuts either).

That's at least the theory; in practice, corners are sometimes cut and exceptions granted, but we would very much have as little of these as possible.

khardix avatar Aug 04 '22 09:08 khardix

I second @khardix : it's "open-source distribution" as in: everything is built from open source code.

kapouer avatar Aug 04 '22 09:08 kapouer

@kapouer, @khardix, @kasicka as consumers of what I've proposed in https://github.com/nodejs/node/issues/44000#issuecomment-1204434268 what are your thoughts?

mhdawson avatar Aug 04 '22 20:08 mhdawson

Perfect !

kapouer avatar Aug 05 '22 06:08 kapouer

Confirmed with @kasicka and @khardix that the proposal above is reasonable from their perspective as well. Will look at experiementing to see how it might be implemented.

mhdawson avatar Aug 17 '22 14:08 mhdawson

There has been no activity on this feature request for 5 months and it is unlikely to be implemented. It will be closed 6 months after the last non-automated comment.

For more information on how the project manages feature requests, please consult the feature request management document.

github-actions[bot] avatar Feb 14 '23 01:02 github-actions[bot]

There has been no activity on this feature request for 5 months and it is unlikely to be implemented. It will be closed 6 months after the last non-automated comment.

For more information on how the project manages feature requests, please consult the feature request management document.

github-actions[bot] avatar Aug 14 '23 01:08 github-actions[bot]

There has been no activity on this feature request and it is being closed. If you feel closing this issue is not the right thing to do, please leave a comment.

For more information on how the project manages feature requests, please consult the feature request management document.

github-actions[bot] avatar Sep 13 '23 01:09 github-actions[bot]