apm-agent-nodejs icon indicating copy to clipboard operation
apm-agent-nodejs copied to clipboard

investigate: loader hooks for ESM Modules (was: import-in-the-middle for ESM Module Projects)

Open astorm opened this issue 4 years ago • 4 comments

Is your feature request related to a problem? Please describe.

New versions of Node.js have native support for the import statement. The import statements imports a module. The import statement bypasses our require-in-the-middle hook and will not have instrumentation applied.

Describe the solution you'd like Import in the middle is a new module that uses Node's loader hooks for intercepting modules that use import statements. We should investigate how much of a drop in replacement this is for require in the middle, and whether this can provide some level of ESM module support.

In addition to the basic question "can we make it work", a few other things to consider

You cannot add new exports to a module. You can only modify existing ones.

Does this impact us? Do we add additional properties to a module in any of our instrumentation?

While bindings to module exports end up being "re-bound" when modified in a hook, dynamically imported modules cannot be altered after they're loaded.

What do we tell users/customers if there's two modules attempting to wrap the same module?

astorm avatar Sep 22 '21 15:09 astorm

Steps to Complete

  • [x] 1. Develop/repurpose a "moden" test application that uses ESM modules to create a small web service that includes additional spans created via one of our DB instrumentations (mysql, elasticsearch, etc)

  • [x] 2. Implement loading of our instrumentation modules via the import-in-the-middle NPM module -- will be considered "done" when we can load the agent into the test application above via the --require option and have transactions and spans for the web requests and db calls

  • [x] 3. Identify areas where import in the middle might fail (i.e. it can only replace methods, it doesn't allow new properties to be added) and triage deal breakers

  • [ ] 4. Stretch Goal: what steps are needed to use the agent as an ESM module. The ability to say import * from elastic-apm-node

astorm avatar Jun 21 '22 22:06 astorm

Putting this one down for a bit -- after a bit of research and some experimenting we've determined that the approach import-in-the-middle takes isn't compatible with our current instrumentation modules. Specifically -- require-in-the-middle allows us to hook, at runtime, the actual object returned by a module. The import-in-the-middle module does not provide the actual object returned by a module -- instead it offers a proxy object that allows you to set new values on the exported modules.

While useful, this isn't compatible with our current instrumentation approach, which presumes it's receiving the actual module object and often includes doing more than just setting some values. (ex. calling an exported function/method preemptively: https://github.com/elastic/apm-agent-nodejs/blob/f5307cd5ba28485f135b20d16ef7202081b52ecc/lib/instrumentation/modules/fastify.js#L32)

This research did provide some possible next steps to explore for ESM support

  1. Using import-in-the-middle as a guide write our own loader hook that will (for modules we instrument) generate a module that will load the original modules, pass it to our instrumentation instrumentation/module/[name], and then export each individual property of that module back out

  2. Re-attempt the fastify+elasticeach application with this new scheme

  3. Plan testing and any necessary changes to other our other modules (dependent on how 2 goes)

  4. Stretch Goal: what steps are needed to use the agent as an ESM module. The ability to say import * from elastic-apm-node

astorm avatar Jul 11 '22 22:07 astorm

Putting this one down for this release cycle. Larger write-up is incoming, but the big takeaways from this round of research

  1. Import in the middle is not a drop in replacement for require in the middle. I've opened an issue with the import in the middle maintainers to test the temperature of whether it couple be

  2. It's not possible to use loader hooks without all the proxy object gymnastics. Per the ECMA262 spec ESM modules cannot be changed outside of their definition file. In spec talk, a module namespace exotic object is not extensible. Import in the middle does their whole "generate a new version of each export and wrap in a Proxy object with magic setter" thing to work around this.

  3. Re: the above -- the "modules cannot be changed outside their source" code restrictions does not appear to apply to CommonJS modules loaded via import when a project's in type=module mode.

All of this points to us not being able to just reuse our existing require-in-the-middle based instrumentation modules with import-in-the-middle. Either

  1. Each instrumentation module will require some subtle (or not so subtle) modifications to work with a the proxying solution that import-in-the-middle provides while maintaining require-in-the-middle comparability

  2. We'll need to come up with some sort of code generation solution that can create a copy of the instrumentation code inside the generated module code so exported objects can be modified before they're exported, similar to how import-in-the-middle generates a new module to wrap with a proxy object.

  3. We'll need to accept that require based code and import based code require different approaches to instrumentation and maintain two sets of each instrumentation (a "not great" solution -- mainly mentioning it so it doesn't linger in anyone's mind)

astorm avatar Aug 15 '22 20:08 astorm

Thanks.

  1. Each instrumentation module will require some subtle (or not so subtle) modifications to work with a the proxying solution that import-in-the-middle provides while maintaining require-in-the-middle comparability

This strikes me as the most feasible. Perhaps next steps could be: (a) modify one of our instrumentation modules to work with ESM to get a concrete example of the kind of rewrite we are looking at; and (b) enumerate all of our instrumentations and identify which need work and whether (to a first approximation) they'll need significant re-writes.

For (a), is there an example of a module we currently instrument that is either ESM-only in a newer version or distributes separate CJS and ESM modules? got is an example of the former. AWS SDK v3 is an example of the latter. However, we don't currently instrument either of the two.

2. We'll need to come up with some sort of code generation solution that can...

If I understand the idea correctly, this seems fraught. We'd be maintaining something close to patches to source code, which I think would be way harder than the current monkeypatching of module APIs. For example if the 'redis' module did an internal implementation-detail-only restructuring of source files to move some function we patch to a different source file, then our source-level patching would fail.

Interesting idea, however. If the Proxy-based IITM solution has significant performance impact (I have no feel for it), then source-level patching might provide a faster answer. If we got that far, however, we'd perhaps be better off implementing diagnostic_channel patches for the particular module. :)

trentm avatar Aug 15 '22 23:08 trentm

Continuing this in https://github.com/elastic/apm-agent-nodejs/pull/3381

trentm avatar May 25 '23 22:05 trentm