
Question: Plans for Node.js ESM Module Support?

Open · astorm opened this issue 4 years ago • 86 comments

  • [X] This only affects the JavaScript OpenTelemetry library

Opening this issue based on a conversation with @dyladan and @obecny in order to start a wider discussion.

Has there been any talk about how the Node side of the project is going to address the coming ESM/ES6 module migration that (parts of) the node community is planning once Node 10 reaches end of life?

Specifically -- Node.js has added native support (no bundler needed) for the import statement and ESM/ES6 modules. These modules can't be monkey-patched in the same way that CommonJS can via require-in-the-middle. This means a lot of auto instrumentation is going to stop working as packages begin to adopt ESM modules and end users begin to use them.
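For context, the CommonJS-side patching that works today looks roughly like this (a minimal sketch using require-in-the-middle's callback-style API; the module name and the patching logic are placeholders, not real instrumentation):

const hook = require('require-in-the-middle')

// Called whenever the target module is require()d; whatever this callback
// returns is what user code receives, so an instrumentation can wrap the
// exports here before handing them back.
hook(['express'], function (exports, name, basedir) {
  // e.g. wrap methods on `exports` to create spans
  return exports
})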

Node's ESM modules have a loader API which might offer a way to monkey-patch modules in the same way, but I don't know if anyone's done any public research on that yet. (waves in the direction of @michaelgoin for reasons)
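For anyone who wants to experiment, a loader is just an ES module passed via node --experimental-loader that exports hook functions. A minimal sketch follows; the hooks are experimental and their names/signatures have already changed between Node versions, so this is illustrative only:

// loader.mjs -- registered with: node --experimental-loader ./loader.mjs app.mjs
export async function getSource (url, context, defaultGetSource) {
  const { source } = await defaultGetSource(url, context)
  // An instrumentation could rewrite the source here so that the module's
  // exports are wrapped before anything imports them.
  return { source }
}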

In addition to the question of instrumentation, there's the question of what sort of modules this project will publish to NPM (CommonJS, ESM, both) and whether the project's individual modules will ship with the type attribute set or not.
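For reference, the packaging question usually comes down to the package.json type field and/or conditional exports. A sketch with made-up package name and file paths (not this project's actual layout) could look like:

{
  "name": "@example/some-package",
  "type": "module",
  "main": "./build/index.cjs",
  "exports": {
    ".": {
      "import": "./build/esm/index.js",
      "require": "./build/index.cjs"
    }
  }
}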

Finally, FWIW, these questions come out of a discussion with @delvedor and @mcollina about fastify's plans for ESM support, and @mcollina pulled together a list of the relevant Node.js core working groups and issues for folks interested in getting involved on that side of things.

astorm avatar Feb 19 '21 21:02 astorm

The main problem with the loader API is that it is still experimental and it doesn't seem that this will change soon.

The current limitations are severe, e.g. only one loader per process is allowed. Monitoring tools like OTel should not eat up this unique resource.

I think the OTel project can't do much about this as it has to rely on the infrastructure available. Unfortunately this is only monkey patching for now - which seems to be "broken" by ESM. I remember some Node PRs to improve the loader APIs, but some of them were abandoned because of endless discussions. But I think this is more a topic for the node repo than this one.

In short I fear that APM tools (incl Otel) and ESM will not fall in love soon.

Flarna avatar Feb 19 '21 23:02 Flarna

Another option to avoid the need for monkey patching / loader hooks would be to avoid creating separate instrumentation packages. Instead, integrate the OpenTelemetry API directly into the frameworks in question. If they reject such a dependency, at least hooks/events could be added to allow an instrumentation to get the needed data via a public, documented interface.
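For illustration, direct integration inside a library would look roughly like this (a sketch against the public @opentelemetry/api surface; the library and span names are made up):

const { trace } = require('@opentelemetry/api')

// The library creates its own spans; whichever SDK the application installs
// (or a no-op implementation, if none) receives them through the API.
const tracer = trace.getTracer('my-framework')

function handleRequest (req, next) {
  const span = tracer.startSpan('my-framework.handleRequest')
  try {
    return next(req)
  } finally {
    span.end()
  }
}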

Flarna avatar Feb 22 '21 06:02 Flarna

Another option to avoid the need for monkey patching / loader hooks would be to avoid creating separate instrumentation packages. Instead, integrate the OpenTelemetry API directly into the frameworks in question. If they reject such a dependency, at least hooks/events could be added to allow an instrumentation to get the needed data via a public, documented interface.

This is the best case for us, but it will most likely be a slow process. It is already beginning in some places, though: https://github.com/typeorm/typeorm/issues/7406

dyladan avatar Feb 22 '21 13:02 dyladan

FYI, Fastify offers integration points to gather all sorts of information. There are a few modules that provide OpenTelemetry integration:

https://www.npmjs.com/package/fastify-opentelemetry

https://www.npmjs.com/package/@autotelic/fastify-opentelemetry

mcollina avatar Feb 22 '21 13:02 mcollina

Another option to avoid the need for monkey patching / loader hooks would be to avoid creating separate instrumentation packages. Instead, integrate the OpenTelemetry API directly into the frameworks in question. If they reject such a dependency, at least hooks/events could be added to allow an instrumentation to get the needed data via a public, documented interface.

This is the best case for us, but it will most likely be a slow process. It is already beginning in some places, though: typeorm/typeorm#7406

Yeah, I created that ticket. We are actively using TypeORM at work and trying to find out if TypeORM would be interested in me making a pull request to build in OpenTelemetry support.

weyert avatar Feb 22 '21 14:02 weyert

Think I got pinged cause I'd done some early experimental module loader investigation and had proven out some concepts.

I was also going to mention I thought an early goal of these efforts (even before the merge into OpenTelemetry) was to encourage framework authors to integrate OpenTelemetry, given it's vendor-agnostic and hopefully a future go-to standard. So perhaps focusing there is a better use of time, as others have already pointed out. (Along w/ the integration points approach, although that then requires further instrumentation writing.)

If auto-instrumentation is going to be a long-term goal, then I do think a project like this is perfect for helping Node evolve its story for providing good support for instrumenting ESM. Really, that goes for all APM/observability vendors and their various needs, but certainly for this project as an open standard.

michaelgoin avatar Feb 22 '21 19:02 michaelgoin

Thanks for the background and the thoughts @Flarna -- they're appreciated and useful. I agree that an optimal end goal for the project is to have enough influence and sway that library and framework authors welcome adding instrumentation directly to their libraries and frameworks. I also agree that ESM loaders aren't quite there yet and aren't something oTel can do by itself with the current ESM implementation in Node.js.

That said -- right now there aren't many libraries with oTel instrumentation built in, and monkey patching allows anyone to write instrumentation for a framework. It seems like this patching of require is something that, at least in the medium term, is going to be needed if this project wants to see wider adoption. My assumption is there's a need for a similar monkey patching of modules loaded via the native import statement in Node.js, and that the loader mechanism is what the Node.js core team has proposed as a replacement for directly patching require.

It's fair to say that it's ultimately up to Node's core team to provide a replacement for this lost functionality -- but it's equally fair to say that Node's core can't do this alone. They need both feedback and direct help from APM/Observability vendors. It seems like OpenTelemetry/CNCF are ideally situated to be a conduit for this.

This is just me thinking out loud, but I wonder if the next best step would be for someone on the OpenTelemetry side of the fence to develop an ESM loader prototype that could be used in place of require-in-the-middle? With a prototype written we'd have a piece of code to bring to Node's core team that would highlight the issues with the current loader implementation, and provide focus for the future work on ESM loaders.

Also -- speaking of future work on ESM loaders, based on these comments it sounds like Node.js core contributor @GeoffreyBooth is up for reviving a working group and mentoring folks who might be interested in working on these features and bringing some of the debates @Flarna mentioned to a close. This sounds like a great opportunity for anyone that's wanted to get involved and influence Node.js core development.

astorm avatar Feb 22 '21 21:02 astorm

Think I got pinged cause I'd done some early experimental module loader investigation and had proven out some concepts.

No passive voice needed @michaelgoin, that's exactly why I pinged you. :)

Do you think that work is something that New Relic would be willing/able to share? It seems like it might be a good basis for the prototype I mentioned above (if OpenTelemetry decides an ESM loader is needed/appropriate).

astorm avatar Feb 22 '21 21:02 astorm

@astorm A while ago I ran across this which seems to be what you are looking for. But I'm not sure if this fits the current loader hooks or is based on an older variant.

And I agree that something like monkey patching will be needed. My main point was that once the OTel API is GA, we will have something in place that is not experimental and is open source to approach libraries with. We should at least try to get monitoring support into the libraries instead of blindly continuing the error-prone monkey patching approach.

Flarna avatar Feb 22 '21 21:02 Flarna

@astorm I don't think I'd have issues getting NR to share but it looks like that article @Flarna shared is pretty similar to what we'd experimented with (leveraging params, etc.). If we do find a need, I can dig back through the old notes and see if there's anything additionally useful to surface up. We also experimented pretty early on so unsure how out of date things are. Or I could likely carve out time to help prototype (and document), as desired.

One thing that was the case back when I first researched, which may or may not still be the case, is that CJS modules imported into ESM had full resolved paths, so our module/instrumentation name matching was off for auto-instrumentation even with the common loader. We have not gotten to bridging that gap yet and I'm unsure if that will be a problem for this project or not (I can't remember how the auto-instrumentation is hooked up anymore). But that may be a small stop-gap-ish thing that may want to get solved regardless, if it's an issue.

On a further note, I'm confident NR would be willing to help instrument willing frameworks out there to help with Open Telemetry's success.

michaelgoin avatar Feb 22 '21 22:02 michaelgoin

Might be relevant: I was using a downstream library built with opentelemetry and it required excessive configuration to make the rollup bundler happy with opentelemetry. But I was still stuck because I was using vite, which is built on top of rollup and conflicts with the vanilla rollup config. Then I cracked open the Azure SDK and realized it has written a monkey patch fix for opentelemetry's export issue.

So I wonder if opentelemetry can do something upstream so more bundlers can consume it directly. I also wonder if the folks from the Azure SDK would be able to provide some help, as they have already spent so much effort hacking around it. Here is their PR

chuanqisun avatar Apr 03 '21 11:04 chuanqisun

https://github.com/DataDog/dd-trace-js/pull/1582/files is the equivalent change that was done to dd-trace-js.

I was bitten by this when I moved to ESM: it broke some (but not all) of my instrumentation. I imported knex (which is CJS), which in turn loads postgres, so that instrumentation kept working. But the code loaded directly from my ESM main module (express, http) didn't get automatically patched because the hook was never called.

Discussion here: https://cloud-native.slack.com/archives/C01NL1GRPQR/p1632260506086900

lizthegrey avatar Sep 21 '21 22:09 lizthegrey

Seems like the changes are relatively simple to make

They create a small shim that returns iitm (import-in-the-middle) only on supported Node versions, like this:

'use strict'

const semver = require('semver')
const logger = require('./log')

// Only hand back the real import-in-the-middle hook on Node versions where
// ESM support is usable; on older versions, export a no-op instead.
if (semver.satisfies(process.versions.node, '^12.20.0 || >=14.13.1')) {
  module.exports = require('import-in-the-middle')
} else {
  logger.warn('ESM is not fully supported by this version of Node.js, ' +
    'so dd-trace will not intercept ESM loading.')
  module.exports = () => {}
}

Then when they call hook they also call esmHook.

    hook(instrumentedModules, this._hookModule.bind(this))
    esmHook(instrumentedModules, this._hookModule.bind(this))

Simple in theory, though I'm not sure if this would have other impacts we're not aware of at the moment. Particularly concerning is that the documentation for the hooks API specifically states it should not be used: https://nodejs.org/api/esm.html#esm_hooks

dyladan avatar Sep 22 '21 21:09 dyladan

"experimental" heh

so in order to enable import-in-the-middle, it requires running Node with --loader, so users will need to specify that flag when starting the process to opt in; otherwise I believe registering an override with import-in-the-middle will just be a no-op, given it won't be called at all.

but if we don't implement this, it becomes an area of missing features/lack of parity I fear :)

lizthegrey avatar Sep 22 '21 21:09 lizthegrey

Supporting ESM is going to be a moving target for a year or so. I recommend to follow up on this and inform the loaders team of any blockers: https://github.com/nodejs/loaders.

(The API for loaders is going to change soon)

mcollina avatar Sep 22 '21 21:09 mcollina

If we implement this it can be done in the instrumentation package where we already wrap ritm exactly for this type of scenario. Implementation in one place would make it available to all instrumentations.

It seems we have at least a few options:

  1. update our hook method so all instrumentations automatically get esm support
  2. expose a second hook method with experimental esm support that instrumentations can opt in to.
  3. update our hook method to support esm only when experimental esm support is enabled somehow, such as via a config or env variable (a rough sketch of this option follows the list).
  4. others?
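A rough sketch of option 3, with hypothetical names (nothing here is the actual instrumentation package API; the env variable is made up, and esmHook stands in for the dd-trace-style shim shown above):

const ritmHook = require('require-in-the-middle')
const esmHook = require('./iitm') // shim that no-ops on unsupported Node versions

function hookModules (modules, onPatch) {
  ritmHook(modules, onPatch)
  // Hypothetical opt-in: only touch ESM loading when explicitly enabled.
  if (process.env.OTEL_EXPERIMENTAL_ESM_HOOK === 'true') {
    esmHook(modules, onPatch)
  }
}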

Supporting ESM is going to be a moving target for a year or so. I recommend to follow up on this and inform the loaders team of any blockers: nodejs/loaders.

That's roughly what @Flarna said earlier. There are still some quite severe restrictions on ESM loaders. In a quick scan of the documentation I can't tell if the restrictions @Flarna mentioned, like only allowing a single loader, have been lifted.

(The API for loaders is going to change soon)

When the API changes I assume IITM will be broken. The question becomes, in what way is it broken? If it simply fails to load ESM modules then it's not that big of a risk. If it causes a process crash, that is a huge risk

edit: it looks like chaining is "in progress" according to the loaders team https://github.com/nodejs/loaders/blob/main/doc/design.md#chaining-resolve-hooks

dyladan avatar Sep 22 '21 21:09 dyladan

My understanding is that diagnostics_channel would be the right way to approach this, but then you still have the chicken-and-egg problem, as the package needs to instrument itself

weyert avatar Sep 22 '21 21:09 weyert

My understanding is that diagnostics_channel would be the right way to approach this, but then you still have the chicken-and-egg problem, as the package needs to instrument itself

I have recently added this to both fastify and undici.

mcollina avatar Sep 23 '21 09:09 mcollina

My understanding is that diagnostics_channel would be the right way to approach this, but then you still have the chicken-and-egg problem, as the package needs to instrument itself

Don't think so. diagnostics_channel is useful for some instrumentations (ones that only create spans, like for databases), but as soon as we need to propagate context (across a process boundary) it's not sufficient.

vmarchaud avatar Sep 23 '21 12:09 vmarchaud

this is not strictly true. In Fastify we only publish when we create a new server - mainly for folks to instrument in their own way.

mcollina avatar Sep 23 '21 16:09 mcollina

I am wondering which is easier: convincing package maintainers to instrument with diagnostics_channel or with @opentelemetry/api. Personally, I would prefer the latter :D

weyert avatar Sep 23 '21 18:09 weyert

There's no real reason a package maintainer couldn't support both

dyladan avatar Sep 23 '21 18:09 dyladan

Thanks all for reviving this old issue, the original poster (me) appreciates it. It seems like there's a great opportunity here for collaboration between APM vendor engineers (via OpenTelemetry) and the node core engineers who've noticed that APM vendors aren't always as involved as they'd like :)

@mcollina it sounds like @dyladan's main concern is about the stability of the current loader hooks implementation and what happens when it's next revved.

When the API changes I assume IITM will be broken. The question becomes, in what way is it broken? If it simply fails to load ESM modules then it's not that big of a risk. If it causes a process crash, that is a huge risk

@mcollina as the (possibly reluctant) thread participant with node core project ties, would the node core team be able to suggest something that would alleviate this concern? Basically: "how can we use the current loader(s) API (either through import-in-the-middle or some other way) and be sure that an upgraded Node wouldn't crash the application?" I don't think the experimental label is a problem (makes long eye contact with the experimental async_hooks module that most APM vendors including oTel rely on), but being able to know loader hooks have the sort of API stability that institutional APM customers are looking for is important. @mcollina if you're not the person who can help with this, could you suggest someone who is?

@bengl hello! Looping you into this discussion re: import-in-the-middle and loader hooks. If you have the time/attention (no worries if not), do you have any thoughts on the above, and what sort of contract you're providing with import-in-the-middle? Put another way, will import-in-the-middle blow up when the loader hooks API is next revved, or have you taken steps to make sure it won't?

astorm avatar Sep 24 '21 01:09 astorm

Regarding loaders: There is a complete rework ongoing and it will take a while. The first step already landed on master and I assume it will break import-in-the-middle. There is a dedicated loaders team in node core working on this (see https://github.com/nodejs/loaders/). I would not expect loaders to get out of experimental that soon.

Please note also that these loaders are a pure Node.js thing and not applicable/present in e.g. browsers.

Regarding diagnostics channel vs OTel: Diagnostics channel is by far more generic as it allows more consumers and it doesn't specify what consumers should do with the data. OTel on the other hand has a well-specified export chain. Note also that diagnostics channel is still marked as experimental and only available in recent Node.js versions (14.17.0 and 16) - it's not available in e.g. browsers.

Flarna avatar Sep 24 '21 08:09 Flarna

@dyladan

When the API changes I assume IITM will be broken. The question becomes, in what way is it broken?

Modules will fail to load and apps will likely crash. This will be fixed very soon. See below.

@astorm

Put another way, will import-in-the-middle blow up when the loader hooks API is next revved, or have you taken steps to make sure it won't?

The change currently on Node.js master replaces a few hooks with a single one. We'll make that change (very soon) in import-in-the-middle, supporting both the old hooks and the new one. Our plan is to remain up-to-date with whatever's happening in Node.js core, supporting as far back as we can. We'll take any new changes into account, including upcoming loader composability. If needed, we'll switch based on Node.js version, but for now I think we'll be fine just exporting all the old and new hooks.
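For anyone following along, "exporting all the old and new hooks" from one loader file could look roughly like this (an illustrative sketch, not IITM's actual hook file):

// hook.mjs -- older Node versions pick up getSource, newer ones (with the
// reworked API) pick up load; exporting both lets a single file serve both.
export async function getSource (url, context, defaultGetSource) {
  const { source } = await defaultGetSource(url, context)
  return { source } // rewrite here for the old hook API
}

export async function load (url, context, defaultLoad) {
  const result = await defaultLoad(url, context)
  return result // rewrite result.source here for the new single hook
}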

If folks want to help out in future-proofing IITM, PRs are most certainly welcome. 😄

@weyert

I am wondering which is easier: convincing package maintainers to instrument with diagnostics_channel or with @opentelemetry/api. Personally, I would prefer the latter :D

I would say the former. There are already examples of it in fastify and undici, and hopefully many more in the future. Like others have said, this enables OT and APM vendors, and other kinds of observability tools too, eg. security. The good news for opentelemetry is that it should be straightforward to use library-provided diagnostics_channel events where available without monkey-patching and heavy-handed tools like RITM/IITM.
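To make that concrete, the pattern looks roughly like this (a sketch only; the channel name and payload are made up, not fastify's or undici's actual channels):

const diagnostics_channel = require('diagnostics_channel')

// In the library: publish an event when something interesting happens.
const channel = diagnostics_channel.channel('mylib:request')
function handleRequest (req) {
  if (channel.hasSubscribers) {
    channel.publish({ req })
  }
  // ... normal request handling ...
}

// In an instrumentation: subscribe and create/annotate spans from the data,
// with no monkey-patching involved.
diagnostics_channel.channel('mylib:request').subscribe((message) => {
  // e.g. start a span using message.req
})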

We're in the early stages of using diagnostics_channel to underlie all of our instrumentation in dd-trace-js, hoping to push more of the pattern to the libraries we're instrumenting wherever/whenever possible.

bengl avatar Sep 24 '21 16:09 bengl

Thanks all for reviving this old issue, the original poster (me) appreciates it. It seems like there's a great opportunity here for collaboration between APM vendor engineers (via OpenTelemetry) and the node core engineers who've noticed that APM vendors aren't always as involved as they'd like :)

It would be amazing to have more APM vendors involved in implementing and maintaining Node.js instrumentation.

@mcollina it sounds like @dyladan's main concern is about the stability of the current loader hooks implementation and what happens when it's next revved.

Unfortunately it's going to be a moving target and there is nothing we can do about it. Part of the instability is also due to certain corner cases that need to be clarified/specified by TC39.

Given the Node.js release cadence, changes will not be weekly but rather monthly/quarterly. I recommend that somebody from OpenTelemetry join the loaders team.

@mcollina as the (possibly reluctant) thread participant with node core project ties, would the node core team be able to suggest something that would alleviate this concern?

I think the best approach would be for a company to allocate somebody's time to keep everything up to date and even do part of the work.

Basically: "how can we use the current loader(s) API (either through import-in-the-middle or some other way) and be sure that an upgraded Node wouldn't crash the application?" I don't think the experimental label is a problem (makes long eye contact with the experimental async_hooks module that most APM vendors including oTel rely on), but being able to know loader hooks have the sort of API stability that institutional APM customers are looking for is important. @mcollina if you're not the person who can help with this, could you suggest someone who is?

I'm not the most informed person on the loaders work as it is a spec- and detail-heavy project with a lot of complexity (I'm more of a distributed systems engineer).

Anyhow, I would expect loaders to get to a place similar to async_hooks: experimental, but mostly stable and reliable in LTS versions.

mcollina avatar Sep 25 '21 07:09 mcollina

@dyladan it sounds like @bengl, the maintainer of import-in-the-middle, is committed to staying on top of stopping and fixing crashers if/when the loaders API is revved. This seems (from my naive point of view) enough to make import-in-the-middle no more risky than the other third-party dependencies used by the OpenTelemetry project. Is this enough for you to be comfortable using it? If not, what would you need to hear in order to be comfortable moving forward with import-in-the-middle?

astorm avatar Sep 25 '21 15:09 astorm

I'm not the most informed person on the loaders work as it is a spec- and detail-heavy project with a lot of complexity (I'm more of a distributed systems engineer).

Ah, I see, my apologies @mcollina for pulling you into an area you don't work in.

astorm avatar Sep 25 '21 15:09 astorm

@mhdawson @GeoffreyBooth :wave: pulling you in because you seem (per: https://github.com/nodejs/loaders/issues/34) to be the Node.js loaders team/group. If that's not the case A. My apologies and B. if you know better people for these sorts of questions please let us know.

Can either of you recommend a safe way to adopt the current loaders APIs? We're basically looking for a way that we can adopt them now but not have a process crash when they're next revved.

astorm avatar Sep 25 '21 15:09 astorm

@astorm I am not currently working on the loaders API itself, but I am familiar with the Node.js API release cycles and processes. Experimental has nothing to do with crashing; it is a message and flag that indicates that the API should exist while we do not fully know how it will look.

So experimental means, in general, that you can adopt it safely, but you need to commit to the fact that you may need to make additional changes fast, without prior deprecation or notification.

As long as you track the loader API page with every Node.js version release, and as long as you only want to maintain and use it in the latest and current Node.js versions, it will feel like a stable API, with the only difference being that it MAYBE changes more frequently.

If you plan to support it from now on and want to keep it backward compatible with today's current Node.js version (14.17.6), you will need to maintain your own compatibility matrix and functions to keep 14.17.6 supported when Node.js 16 is current.

frank-dspeed avatar Sep 26 '21 04:09 frank-dspeed