modules icon indicating copy to clipboard operation
modules copied to clipboard

Patch/Instrument a module

Open Flarna opened this issue 5 years ago • 25 comments

Maybe the information I'm looking for is already somewhere but I'm not able to find it. I'm happy if I get redirected. I'm also quite new in the ECMAModules area so I may simply don't know significant details so please excuse if my question is inaccurate or improper.

There are quite some tools (Transformers, APMs,....) out which monkey patch module._load() and/or module.prototype.require() to hook into loading of CJS module loading to modify their content. For APMs the modification is usually just to wrap a handful of the exported functions but keep the rest as it is.

I tried to find the corresponding solution for ECMAModuls but I failed. I found #98 which also describes this requirement and it links to https://github.com/bmeck/node-apm-loader-example but this sample seems to no longer work (useing 12.4.0). I get following error (after fixing a few nits):

const exports = Object.create(fs, {
                              ^

ReferenceError: Cannot access 'fs' before initialization
    at file:///C:/workspaces/GitHubForks/node-apm-loader-example/overloads/fs.mjs:4:31
    at ModuleJob.run (internal/modules/esm/module_job.js:111:37)
    at async Loader.import (internal/modules/esm/loader.js:128:24)

Even if it would work I have the impression that this solution actually changes the URL in module map for a patched module, e.g. fs would be then URL to ./fs.mjs. For an user importing fs this is don't care but if there are more hooks listening on fs only the first one would be applied. At least in CJS I have seen several times that more then one APM/Transformer is in use within one application.

Besides that this sample is in my opinion unneeded complicated for the common usecase to just wrap a few exported functions but keep the remaining stuff as it is. Would it be possible to create a simpler hook for this?

Flarna avatar Jun 17 '19 08:06 Flarna

First of all - thanks for reaching out! One quick pitch before I get into this: we're currently missing APM representation in the modules working group (afaik). So, if you can join yourself or know somebody who could, that could help us make sure our solutions keep that perspective in mind. We are aware of this use case but there's always a lot of gray space between "can be done" and "is nice to work with".

The important piece is "loaders aren't done". There is a loader implementation, there are examples of using it, but there's still varying opinions on how close it is to a final loader hooks API.

The way multiple loaders would (may?) work is the following, assuming loaders A, B were registered in this order:

  1. B resolves fs, yields to previous (A) for originalURL. Then it returns file:///path/to/apm/fs.mjs?original=${originalURL}.
  2. A resolves fs, yields to previous (<native>) for originalURL. Then it returns file:///path/to/dev-tracer/fs.mjs?original=${originalURL}.
  3. The <native > loader returns nodejs:fs (placeholder).

The important part is that A would need a way to prevent its own eventual import of the original nodejs:fs. It's easy to prevent that inside of A but a lot harder from inside of B (since it doesn't know that A's code is privileged and should be granted access to the original fs module).

Another issue for random 3rd party modules is also the "intercept and re-exports is hard" problem. export * from 'original' works but won't include the default export. And it's impossible to mutate the exports directly from the outside without wrapping.

The short version: This hasn't actually been figured out yet and that's true for loader hooks in general afaik.

jkrems avatar Jun 17 '19 14:06 jkrems

This is my primary concern as well. I work in the APM area and have been trying to keep an eye on loaders but I haven't seen much activity.

Can you point me at any discussions other than #98 above?

The way our patching works is to replace the require function with our own. Each module that is patched effectively replaces the original module. The way you describe looks like we will need to manifest each patched module with a file that somehow requests/requires the original/native module and implements our patching. Am I understanding it correctly?

Let me know if/how I can help here. I cannot devote full time to this but will do what I can.

bmacnaughton avatar Jun 17 '19 17:06 bmacnaughton

Thanks for the info. I will take a closer look into the current codebase to get deeper into this. Would be nice to get hints if there are any docs/minutes/... somewhere describing the proposed solutions and ideas tried till now.

I'm fine with being include in this group. Can't promise any specific number of hours/days/... to work on this but for sure I can help with usecases we have and what we have seen in the wild in this area.

Usually the tools hooking into the loaders are not aware of each other. A typical example is that some developer uses a transpiler like babel and the operations team in his company decides to add an APM tool in production. The developer is maybe not even aware of this as it happens via some command line option (e.g preloading via -r,...) during deployment. The same app may have several additionally indirect dependencies to CLS like modules which often works also via monkey patching in a loader hook.

So in short it's usually not possible to assume that one loader hook knows about the other and new hooks may jump in/out just because of some change in an indirect dependency. Even now it happens that one breaks the other (see https://www.npmjs.com/package/pirates for a solution for one of these conflicts). Most of the time it works fine as all of them just wrap APIs without modifying any directly visible characteristic like name/caching/...

If ECMAModule hooks require a dependency between each other it should be reflected in the hook API to ensure that the complete info arrives at all loaders in the same way and sequence is don't care (as long as the hooks just wrap APIs and don't patch away exports).

Flarna avatar Jun 17 '19 19:06 Flarna

@Flarna - pirates was very different than I was thinking - we patch the compiled module, not the source code itself. In doing so we always keep the API unmodified so that we don't break the code that uses it. I had never seen patching work at the source level before. Is this how you have patched code?

bmacnaughton avatar Jun 17 '19 19:06 bmacnaughton

We patch/wrap just the exports and share this approach with other APMs and at least CLS like modules.

But I think the loader hooks should be not limited to this use case therefore I referenced also the others. In general both variants tend to keep the structure/API of the exports to don't break anything, modifying code is more intrusive but clearly offers more possibilities.

Not sure if both usecase should be supported by the same loader hook.

Flarna avatar Jun 17 '19 19:06 Flarna

@jkrems I’d like to help bring representation to instrumentation API or hooks. I am an observer but have missed the last few. I’m more than happy to explain my use cases and what I’d like to see from ESM that we didn’t get get with CJS. How can I bring this to the table?

bizob2828 avatar Nov 22 '19 01:11 bizob2828

@bizob2828 and the others on this thread, can you make the next meeting? I’ve added the modules-agenda label so that we discuss loaders at the next meeting (2019-12-04 at noon Pacific). If that doesn’t work for you please create a Doodle for an out-of-band loaders meeting and we can find a time that works for the most interested parties. Loaders is probably our biggest outstanding feature right now, so it’s something we’d like to get some attention and dev effort focused on.

GeoffreyBooth avatar Nov 22 '19 06:11 GeoffreyBooth

@GeoffreyBooth I can. It's very intimidating to talk in these meetings so are there use cases or code samples I can create to seed this discussion? Also would this just be scoped to some instrumentation/hooks of ESM or would it also apply to CJS? I get confused when this is referred as loaders. If you can share any design docs or examples of proposed APIs it would help.

bizob2828 avatar Nov 22 '19 16:11 bizob2828

@GeoffreyBooth I can. It's very intimidating to talk in these meetings so are there use cases or code samples I can create to seed this discussion? Also would this just be scoped to some instrumentation/hooks of ESM or would it also apply to CJS? I get confused when this is referred as loaders. If you can share any design docs or examples of proposed APIs it would help.

@bizob2828 I’m sorry to hear that you feel that the meetings are intimidating. It’s been a concern of mine for awhile that this group can come across as hostile to outsiders, as there are many members of the group with very strong opinions and itchy GitHub trigger fingers. It’s something we need to discuss and improve.

At least for now I think this would be scoped to ESM, as the story for instrumentation of CommonJS is already pretty well worked out; and ESM is a very different case as unlike CommonJS, ESM has defined stages before code evaluation, and the instrumentation would need to go in one of those earlier stages. That’s what the loaders are for, to let users inject custom code during the “pre-evaluation” phases. There is an experimental loader API already, that you can read about at https://nodejs.org/api/esm.html#esm_experimental_loader_hooks, but it’s far from complete and definitely needs much more development. The idea is that we want to expand this API to cover all these use cases like instrumentation, transpilation and so on.

So I guess if you don’t mind reading that, and maybe listing some of your use cases and also whether you think the current API can support them? If you have time to try to write any loaders for the current API to demonstrate whether or not it handles your use case or not, that would be illuminating. And then your thoughts on what the loader API needs to add to satisfy your needs.

GeoffreyBooth avatar Nov 22 '19 18:11 GeoffreyBooth

@GeoffreyBooth the intimidation is mostly on me and my lack of confidence speaking with folks who may know more on the subject. I will read this stuff and provide some use cases. I hope to get to it before next meeting but with work schedules and travel for holiday I'm not sure

bizob2828 avatar Nov 26 '19 16:11 bizob2828

@GeoffreyBooth I took a look at the loader hooks and unless I'm missing something I don't see a way to get access to the compiled code. My use cases may be very similar to APM tools or something like istanbul:

  • keep track of which modules have been imported(by name and url, which I see I can already do with the example in the docs)
  • access to compiled source code to rewrite(both user and core nodejs code)
  • monkey patch a module at import time(both user and core nodejs code)

The way we're currently doing 2nd bullet is just patching _compile like this

module.exports = function hookModCompile() {
  var mod = Object.getPrototypeOf(module);
  mod.__compile = mod._compile;
  mod._compile = function(content, filename) {
    var newContent = rewrite(content, filename);
    var compiled;
    try {
      compiled = this.__compile.apply(this, [newContent, filename]);
    } catch (err) {
      if (err instanceof SyntaxError) {
       console.log(`failed to compile rewritten code ${err.message}`);
       // re-compile original
       compiled = this.__compile.apply(this, [content, filename]);
      } else {
        throw err;
      }
    }

    return compiled;
  };
};

The way we're currently doing the 3nd bullet is just patching require like this

  const origModRequire = Module.prototype.require;
    Module.prototype.require = function(name) {
      let nodule = origModRequire.call(this, name);
      // do some stuff with required module
    }

The reason why I asked if these loader hooks apply to both cjs and mjs is because our current application is a loader. We need to rewrite the main file but if we just did what APM tools did by requiring in the main, it'd be too late. It appears my ask is like #6

bizob2828 avatar Dec 05 '19 21:12 bizob2828

@bizob2828 Yes as far as I can tell there’s no way to transform source code at the moment. I tried to write a CoffeeScript loader as a proof of concept and I couldn’t do it, as the current hooks just don’t provide a way to override the content of the loaded file. That’s high on the to-do list for loaders, I think.

cc @A-lxe @bmeck

GeoffreyBooth avatar Dec 05 '19 22:12 GeoffreyBooth

I have achieved ES module patching/instrumentation. My rudimentary implementation demonstrates the exact functionality OP describes — made possible with the new v13.7.0 loader hook API, which uses the transformSource hook. Other interesting ideas about how to achieve this were also shared in this issue, which do not seem beyond reach with the current APIs available. What other functionality mentioned in this thread should I prioritize for demonstration?

An observation I've made after reading this thread some more — the term "patching" is being used as a synonym for prototype tampering. A bit confusing, but tampering with import() is not a viable path forward since it's technically not even a function. Prototype tampering may need to take place in these exported IIFEs seen in my demonstration.

https://github.com/DerekNonGeneric/loader339

DerekNonGeneric avatar Jan 31 '20 02:01 DerekNonGeneric

I just made a pretty significant update that exposes a module's local cache to its parent module. ~I have not been able to get tap working with Node configured this way. There are some errors being reported from the core modules.~ If someone could contribute a relevant unit test in a .mjs file, that would be amazing! I think this prototype is reaching almost all of the goals it is currently able to. Not all information is exposed to this transform hook, so things like ~module specifiers and~ builtin core module transpilation don't seem possible at this time.

DerekNonGeneric avatar Feb 01 '20 22:02 DerekNonGeneric

@DerekNonGeneric Thanks for the update and progress on this! On behalf of @Flarna I will have a look into this and see how far I can get. I will keep you posted as soon as I have some results.

Cyclonick avatar Feb 03 '20 14:02 Cyclonick

Removing label for now, we can bring it back when there is more to discuss

MylesBorins avatar Mar 11 '20 19:03 MylesBorins

@DerekNonGeneric - the link https://github.com/DerekNonGeneric/loader339 is no longer valid - is your loader still available somewhere?

also, while this seems so straightforward that there must be some obvious problems, is there a reason that a transformExports hook, similar to the transformSource hook, isn't a viable solution for APM monkey-patching and similar needs?

bmacnaughton avatar Jul 07 '20 21:07 bmacnaughton

Yes, I have it as a private repo since no one wants to give feedback on it. I assume everyone who saw it is hard at work making a commercial product out of it. (joke)

DerekNonGeneric avatar Jul 07 '20 21:07 DerekNonGeneric

you ok with providing me access? i'll provide feedback; i'm not very sophisticated in terms of node's broad needs in this area but i do develop an APM solution for work. my main interest is in making that work as seamlessly as possible.

bmacnaughton avatar Jul 07 '20 22:07 bmacnaughton

Okay, invitation sent. Your feedback would be appreciated in the form of an open issue. Also, feel free to add me on Google Hangouts if you don't understand it or just want to chat about it (email is in my bio).

DerekNonGeneric avatar Jul 08 '20 00:07 DerekNonGeneric

@DerekNonGeneric - spent a little time getting bmeck's original module patching code working with the node api changes for the resolve() hook and i've sent a hangouts invite. thanks for the work you've done; i haven't looked at this since last november and a fair amount has happened.

bmacnaughton avatar Jul 08 '20 23:07 bmacnaughton

@DerekNonGeneric - when you accept my hangouts invite i have a few questions i'd like to explore.

bmacnaughton avatar Jul 09 '20 15:07 bmacnaughton

@bmacnaughton, sounds good! For some reason, I did not receive your invitation. Perhaps you can try emailing me with the email address you are using for Google Hangouts and I can try adding you instead? Looking forward to the discussion. :)

DerekNonGeneric avatar Jul 12 '20 12:07 DerekNonGeneric

@MylesBorins - it seems that the exports section will make it impossible for APM to patch live code if it is not listed in the exports section. that is a common need for APM. is this a good place to discuss it or should i open a new issue?

guy bedford informed me that the resolve hook will be called on files that aren't exported so this won't be an issue. you can ignore this. thanks.

bmacnaughton avatar Jul 16 '20 14:07 bmacnaughton

@bmacnaughton I think this is a fine place to comment. I think that APMs should still be able to patch any file via loaders

MylesBorins avatar Jul 16 '20 14:07 MylesBorins