node-custom-lambda icon indicating copy to clipboard operation
node-custom-lambda copied to clipboard

Support for Node v14.x

Open sodiray opened this issue 5 years ago • 44 comments

The changes from v12.x to v14.x are very small. All I really did was

  • copy paste v12 dir
  • update requires to imports
  • update dynamic handler import in boostrap.js
  • update test package.json to use module type
  • add package.json in root of v12.x with module type

On my machine: ✅ build.sh completes and produces expected layer.zip ✅ test.sh completes and outputs expected log lines 🟨 publish.sh -- obviously I can't do this 🟨 check.sh -- obviously depends on ^^^

sodiray avatar Dec 12 '20 01:12 sodiray

Awesome – thanks for giving this a shot – I was just gonna wait until official support TBH 😆

Might just be worth adding a top-level await statement to test/index.js to make sure that works – would be a nice-to-have

mhart avatar Dec 12 '20 02:12 mhart

Hey @mhart ! Loving the quick response! I would totally wait if I could but its one of those situations 😞 it was this or transpiling 🤮 (IMHO)

I added a top-level await ✅ works like a charm 🍀

sodiray avatar Dec 12 '20 02:12 sodiray

Ok, so I've had a play – the main problem is this doesn't work out of the box with CommonJS modules – so it's not straightforward for people to just upgrade their existing Lambda functions.

I have no idea if something like this would work or not – I haven't played around enough with mixing ESM and CommonJS:

import { createRequire } from 'module'
global.require = createRequire(import.meta.url)

The other option would be to have an env var that toggles this behavior

mhart avatar Dec 12 '20 02:12 mhart

Based on my experience it will have to be one way or another during the build because of the type: "module" flag in the package.json. I'll do some r&d and see if there is a way to get node to handle it dynamically at runtime. You can probably guess I'm partial to ES6 modules as thats what I setup and of course use.

sodiray avatar Dec 12 '20 02:12 sodiray

Basically, if we can get it to work with test/index.js from v12.x as well – that would be ideal.

I'm just not sure whether we should try for both out-of-the-box, or just ESM out-of-the-box with CommonJS support behind an env var, or CommonJS out-of-the-box and ESM behind an env var.

mhart avatar Dec 12 '20 02:12 mhart

based on Node docs

Node.js treats JavaScript code as CommonJS modules by default. Authors can tell Node.js to treat JavaScript code as ECMAScript modules via the .mjs file extension, the package.json "type" field, or the --input-type flag.

I'd say I set this up backwards. By default we should support CommonJS modules and allow for a flag to opt into ES6 modules. We might be able to handle an env var in the bootstrap and use the --input-type flag.

sodiray avatar Dec 12 '20 02:12 sodiray

Yeah, I think that sounds right – I think it's better by default to assume traditional CommonJS modules

mhart avatar Dec 12 '20 02:12 mhart

I haven't dived in yet, but for some reason this is noticeably slower to init than v12.x – which I suspect would also reflect in Lambda itself.

Any idea why that might be? Just the overhead of importing instead?

mhart avatar Dec 12 '20 02:12 mhart

(that's why you see all the pings btw – because it takes so long to init – they ping every 100ms so usually you don't see them)

mhart avatar Dec 12 '20 02:12 mhart

I did add that 1 second sleep to the test file as the top-level await test. Not sure if thats where your noticing the performance?

sodiray avatar Dec 12 '20 03:12 sodiray

Haha, lol – yeah, that'd be it – didn't even think of it

mhart avatar Dec 12 '20 03:12 mhart

Ok, r&d complete. Dynamic es6 modules | commonjs modules switching is not available in any workable way for our use case. I can think of some hacks but they involve eval or adding more scripts -- all bad things for performance and security. You may want to dig in a little yourself to validate this but my suggestion is we break the v14.x dir into two subdirectories:

-- v14.x
    |
    | -- modules
    |     | -- bootstrap.c
    |     | -- bootstrap.js
    |     | -- Dockerfile
    |     | -- package.json
    |     \ ...
    |
    \ -- commonjs
          | -- bootstrap.c
          | -- bootstrap.js
          | -- Dockerfile
          \ -- ...

then provide commonjs|modules specific layers.

sodiray avatar Dec 12 '20 03:12 sodiray

Oh, we can't just do it behind a flag? Like, either import() or require() depending on an env var?

mhart avatar Dec 12 '20 03:12 mhart

After some more work I am happy to eat my words and say.... it works 🥳 A few updates in the last commit

  • made two test directories
    • module
    • commonjs
  • updated test.sh script to work with them as
    • ./test.sh module
    • ./test.sh commonjs
    • ./test.sh --> defaults to commonjs
  • added a handler7 to the both test cases that explores some module|commonjs features just to be sure ✅ our implementation fully supports the differences -- not just the loader (require|import). See docs.
  • updated bootstrap.js to handle NODE_MODULE_LOADER_TYPE environment variable
    • defaults to commonjs
    • validated to only allow 'commonjs' or 'module' to match node's defined usage
    • just as you said @mhart use either import() or require() based on the env var 🎉

sodiray avatar Dec 12 '20 04:12 sodiray

I published this in my aws account, based a lambda on it, and did a little testing ✅ NODE_MODULE_LOADER_TYPE env var is being handled well and importing/running code for both module types properly 👍

sodiray avatar Dec 12 '20 21:12 sodiray

I know this issue is a bit outside the scope here, but I ran into it using your image and wanted to get another opinion. I am using a lambda layer for all of my node_modules, because I have a lot of lambda functions that share a set of code.

Commonjs relies on the NODE_PATH env variable to add /opt/nodejs/node_modules to the require lookup. (relevant docs) With es modules, that has been disabled in node. The suggested workaround from the docs is to use a symlink, but im not really sure how you would go about that in the lambda-runtime layer.

Thanks for the hard work putting together this runtime.

brandonryan avatar Dec 13 '20 02:12 brandonryan

Hey @brandonryan 👋 thanks for the appreciations! Question, did you build and publish the v14.x layer from this pull request? i.e. are you running the latest work here? I ask mainly for clarity sake, I checked out the docs and I see what you're saying and ya there isn't anything in this PR to handle that.

If we were to support this I imagine it being being behind a flag (some ADD_NODE_SYMLINK env var). If it were true we would add the link before importing the client module. It strikes me though that if we did do this, and it did work, it might make more sense to move that responsibility down to the client 🤔 so if you happen to be in this spot, first thing in your handler you might add the link your self.

It also strikes me that this could possibly be something that should alway be patched. I don't see any harm in making the symlink (aside from a few more ms) so image users don't have to consider this -- it will just work.

It also also strikes me that I just showed up Friday and started committing code and now I'm pontificating like I own the repo, asking more questions than answers 😆 🤷 I'm sure Michael will have some thoughts.

sodiray avatar Dec 13 '20 07:12 sodiray

I was using the latest version of your fork on the node14 branch, yes.

I think the flag idea is ok. I don't think it's reasonable to push the logic down to the client though, because every single handler would have to contain that code. I prefer the symlink always being patched in. If we are worried about broken symlinks then we can add the /opt/nodejs/node_modules dir as well. I might be missing some issue with this method but I (think) it would be fine? I'll give it a try locally and get back to you.

brandonryan avatar Dec 13 '20 17:12 brandonryan

Ok i'm stumped. If you try to create a symlink you get the following error:

{
  "errorType": "Error",
  "errorMessage": "EROFS: read-only file system, symlink '/opt/nodejs/node_modules' -> '/var/node_modules'",
  "stackTrace": []
}

Which makes sense.. I guess aws doesnt want you modifying anything on the filesystem. I also tried /node_modules and /var/task/node_modules with no luck. Here is the code change if you want to try:

import { symlink } from 'fs/promises'
//...
if(NODE_MODULE_LOADER_TYPE === 'module') {
      await symlink('/opt/nodejs/node_modules', '/var/node_modules')
}

edit: trying to modify the dockerfile now.

brandonryan avatar Dec 13 '20 18:12 brandonryan

Getting closer.

I modified the dockerfile to include RUN ln -s /opt/nodejs/node_modules /var/node_modules at the end and inspected the docker image to verify it was there. However, when i try to import something in my handler, it doesnt work.

Figured out this is because /var gets wiped sometime after the image is built. If i do a fs.opendir('/var') the contents are not the same, and my symlink is gone. (which makes sense because /var is used for transient data)

I tried putting a node_module link at / instead, but in the node esm resolution algorithm, it states that it stops looking when it gets to root (step 10)

My last ditch attempt is to use a experimental loader hook which is an unstable api unfortunately. Definitely going to want to put an env flag guarding against this one. Currently working on that, will post follow up.

(this is starting to get really out of scope of this pull request but i dont really know where else to put this information)

brandonryan avatar Dec 13 '20 20:12 brandonryan

Woo! 🎉 Got it working with the module loader! I'll create a pull request on your branch. Experimental loaders are only used for esm so no worries with commonjs compatibility.

brandonryan avatar Dec 13 '20 21:12 brandonryan

Awesome work and updates! Makes sense aws won't let you add the symlink on the fly, I forgot they only allow you file access to /tmp. A solution might be to modify your layer that has all the packages to install them into the /tmp dir (if that's possible). Then in handlers add the symlink. I agree now pushing onto the client isn't a good solution but if something like that works.

Heres an idea, make your own require func that wraps the builtin require. You could do some logic to pull certain packages from /opt/nodejs/node_modules. Maybe have a package whitelist or a prefix.

const requirex = (require) => (path) => {
    if path starts with '~layer/':
        the real path = path - '~layer/'
        return require('/opt/nodejs/node_modules/' + the real path)
    return require(path)
}

global.require = requirex(require)

obviously, pseudo code and unverified

sodiray avatar Dec 13 '20 23:12 sodiray

Oh my page was open and didn't see you got it working! Yay!

sodiray avatar Dec 13 '20 23:12 sodiray

I don't love it... I don't like it... haha the warning on that --experimental-loader flag makes me so nervous. There is some sense of security against them changing the loader api because this layer is immutable in a sense -- once the layer.zip is created the encompassed node version can't change. Publishing layer for new 14.x versions would need to be done with caution because even a minor/patch update to node could cause a breaking change the loader api were using.

All that is probably already understood by everyone. I'm on the fence.

Another concern is performance. If I understand the way its written, it will always try to pull every required import from /opt/nodejs first and if that fails it will then try the normal import. I imagine this will have a noticeable impact on performance.

https://github.com/lambci/node-custom-lambda/pull/25/files#diff-fdb3033acc0a5a9767245a57a9cb21b76d794a6df53c9de560c0d4c6c51644caR1-R10

sodiray avatar Dec 13 '20 23:12 sodiray

I agree with the node on performance. I think I did the priority backwards and it should try to load from the current path first, then /opt. That should mitigate any performance hit, but import errors might look a little weird. (unless i were to save the original error and throw it if any others aren't found)

I completely understand the hesitancy to use the --experimental-loader flag. It was just the only solution that i could find without some custom js require. My goal was to support es modules in the same way that commonjs modules were supported without any special code. This seems like all the more reason to add an ENV flag to turn that option on.

Will update with pull request in a bit.

brandonryan avatar Dec 13 '20 23:12 brandonryan

For sure. I'm actually a big fan of this solution aside from experimental state. If the api were stable I'd be thrilled. A few line custom loader is pretty slick.

I was thinking maybe we put this behind an opt in flag as well but I'm not sure it would really matter (that is If you switch the try order so it tries the default first). If you try to import a module and its not available your lambda is erroring out anyway -- whats another few ms to check /opt/nodejs in that situation?

Someone could also have some try { await import(...) } catch (err) { default case } like logic that would see a performance hit if we don't put the custom resolve behind a flag.

sodiray avatar Dec 14 '20 00:12 sodiray

@rayepps made another pr. Changed the order of precedence and added the missing paths. Any modules resolved with the default resolution should take no performance hit now. I'm happy calling it complete now.

Note: I wanted to try to do some new async resolution but the defaultResolve function is synchronous so i couldn't

brandonryan avatar Dec 14 '20 03:12 brandonryan

@brandonryan is the loader stuff only an issue if ESM is used? If so, we could just leave the experimental stuff up to the NODE_OPTIONS env var if ppl wanted to use it. ie, they can set NODE_OPTIONS env var on their Lambda to something like:

NODE_OPTIONS=--experimental-loader=/opt/esm-loader-hook.mjs

mhart avatar Dec 14 '20 18:12 mhart

I feel like the best way to provide backwards compat is to default everything here to be in CommonJS – ie, bootstrap.js uses requires, etc. Then NODE_PATH will work as normal, as should any older Node.js code.

We can just use import() if the env var is set (would be good to see if there's any prior art for an env var name here? NODE_MODULE_LOADER_TYPE seems quite generic, so might have issues by clashing with something else – NODE_LAMBDA_MODULE_TYPE might be better?)

We could also explore whether it's worth auto-guessing based on file extension .mjs, or if there's a package.json at the top-level with "type": "module". Those methods might be cleaner in the end rather than relying on an env var. It's (usually) pretty easy for devs to just control the entry point of their handler or a package.json file.

mhart avatar Dec 14 '20 18:12 mhart

@mhart the --experimental-loader=/opt/esm-loader-hook.mjs option is only used by node when es modules are being loaded, so no risk of commonjs issues there. Relevant doc

When hooks are used they only apply to ES module loading and not to any CommonJS modules loaded.

Using NODE_OPTIONS for the loader flag would work, but I have two annoyances with it:

  1. It would require the user of the lambda runtime knows where that file is
  2. It would not adhere to the /opt dependency convention as described in the aws docs by default. The user of the runtime has to do an extra step to make work what works in every other runtime by default.

That being said, im not the maintainer of the repo, so use your own discretion.

As a side note, I just now thought it might be a good idea to read NODE_PATH then use the paths from there instead of using a hard coded list in the loader. I can make that change if you want me to.

Currently everything does default to commonjs.
I really like your idea about auto detecting whether the source in task root is mjs or reading the package json. I think thats the way to go.

brandonryan avatar Dec 14 '20 18:12 brandonryan