unenv icon indicating copy to clipboard operation
unenv copied to clipboard

Use `process.getBuiltinModule` for hybrid node modules

Open pi0 opened this issue 1 year ago • 6 comments

Following up on an idea during the discussion with @IgorMinar and @petebacondarwin.

Currently we have $cloudflare specific entries that allow hybrid support for runtime compatible features using process.getBuiltinModule(id). (example: node:buffer)

Since process.getBuiltinModule is a Node.js standard, we might instead just merge entries. We can still treeshake (each export prefers built-in module if available and fallback to polyfill).

Cons:

  • Unenv fallbacks are always bundled (although can be lazy evaluated and are small)
  • Less control over runtime (runtime can enable natives in the future that might be incompatible with polyfill, however unlikely)

Pros:

  • We don't need to maintain a list of supported Node.js APIs (workerd#2097)
  • Universal support for other runtimes to also support
  • Future compatible. If runtime supports a new built-in natively, unenv code adopts it. Being adaptable, means also unenv can smartly change behavior based on compat date and flags without need to know info on build time

pi0 avatar Aug 20 '24 18:08 pi0

I really like the desire not to rely on https://github.com/cloudflare/workerd/issues/2097 and instead create the hybrid polyfills on the fly, but what I'm concerned about and I don't quite understand is the claim that disabling tree-shaking doesn't matter much, or at least that's how I read "Unenv fallbacks are always bundled (although can be lazy evaluated and are small)"

@pi0 can you please help me understand how this would not matter? some of the unenv polyfills are not small (e.g. node streams, url, etc). So how can we ignore the significant size of these?

IgorMinar avatar Sep 13 '24 05:09 IgorMinar

You are right @IgorMinar we will bundle polyfills that's the inevitable cost if we do this for other benefits (still bundlers can tree-shake per export - not per module that was my side-point).

IIRC the initiatives behind the idea were mentioned when we were talking about the possibility of shipping unenv dist via workers module registry directly instead of bundler.

Out of curiosity, I did a POC to see how much is the cost of entire unenv node polyfills bundled. It is ~191.6kb (!). Even without module registry and even without making an optimized bundle that takes into account workerd native node support which makes that number much less (we can switch the entry) I think the ratio of this number compared to the whole next.js server (or any deployment that needs node compat) bundle is fairly good. As a side-note, it will also solve process.getBuiltinModule to actually work in runtime (instead of returning workerd native subsets, it actually will work).

pi0 avatar Sep 13 '24 07:09 pi0

some in progress notes... still wip

Pros:

  • no config or static list of what workerd supports in unenv == less maintenance
  • no metadata file in workerd needed https://github.com/cloudflare/workerd/issues/2097

Cons:

  • some exports might be only partially implemented, and we'd need to iterate not only over module exports, but also over the symbol properties (e.g. globalThis.process) which can become error prone or expensive
  • if workerd has a problematic native implementation, we won't be able to override it in unenv (shouldn't happen, so maybe not a problem?)

IgorMinar avatar Sep 16 '24 15:09 IgorMinar

If workerd has a problematic native implementation, we won't be able to override it in unenv (shouldn't happen, so maybe not a problem?)

If workerd have a known issue, unenv polyfills can explicitly avoid using it (per export) and only use polyfill. (unenv has control per build/deployment)

If workerd has a problematic native impl for any reason or drops a feature in favor of polyfill (like https://github.com/unjs/unenv/pull/302 i guess?) or when it enables it back, this approach actually self-adopts without need to rebuild/deploy. (workerd has control at runtime)

some exports might be only partially implemented, and we'd need to iterate not only over module exports, but also over the symbol properties (e.g. globalThis.process) which can become error prone or expensive

You are right about the cost of fallback check. But it can be as low as a one time nullish check per export or symbol|prop i guess right? (globalThis.process.*, is polyfilled once)

pi0 avatar Sep 16 '24 15:09 pi0

@pi0 the notes above were from my brainstorming session with @petebacondarwin which we had to wrap up quickly so I just posted the WIP notes we captured before we finished.

I see a lot of value in this but I'm also a bit uneasy because of the potential cost of this approach, and hard to predict future interactions between workerd and unenv, which evolve and are updated differently:

  • workerd keeps on evolving and we update it's version in prod daily/weekly
    • it's a semi unpinned dependency in prod
    • a semi because compat_date does pin the behavior but not the implementation somewhat
  • the unenv layer remains pinned and is versioned along with the deployed applications
    • is baked into the deployed application bundle
    • unenv version changes only if the developer uses a different wrangler version (which brings in different unenv version), and redeploys the application using this different version of wrangler

In an ideal world, the version skew issue should not be a problem, it's just very hard to predict all of the possible interactions and corner-cases given the magnitude of the surface area, and the fact that workerd, node.js, and unenv are continuously changing and evolving.

So maybe it makes sense to assume the ideal world conditions and consider version skew not to be a problem, and explore if performance cost would rule out this approach even before we need to worry about version skew.

And performance comes to two things:

  1. how expensive (in terms of storage) would it be to include the entirety of unenv runtime in each worker bundle
    • we should keep in mind that unenv continue to grow, and especially if we start porting larger chunks of node.js runtime to unenv, it could grow significantly)
  2. how expensive (in terms of CPU time) would it be to perform a granular workerd API surface check and assemble hybrid polyfill layer on the fly
    • unlike with size, it's unlikely that we need to worry too much about the CPU time increasing over time, as it's unlikely that the node.js API surface area would double or triple any time soon (even when considering new additions like sqlite api, etc)

Both of these could be explored via experimentation in real world conditions, and we could help explore these some time over the next few weeks (just not in September). Thoughts?

IgorMinar avatar Sep 17 '24 17:09 IgorMinar

Thanks for sharing your thoughts @IgorMinar.

Let's stay with the current approach (unenv-bundled-by-usage) for wrangler tooling, considering the current system works for you and the announced date is close, we can keep manual sync between workerd<>unenv<>wrangler versions 👍🏼


Considering past experiences of build-time-only polyfills we had with Nitro/Nuxt , i am really positive that increased [runtime] coverage and stability of this alternative approach worth experimenting.

I will put a pending label on this issue until we setup a workerd testing system in unenv repo so we can properly measure any possible overhead and test functionality in actual runtime directly.

pi0 avatar Sep 18 '24 09:09 pi0