node icon indicating copy to clipboard operation
node copied to clipboard

WIP: use symbol in weakmap to maintain lifetime of modules' host defined options

Open joyeecheung opened this issue 1 year ago • 11 comments

module: use symbol in WeakMap to manage host defined options

Previously when managing the importModuleDynamically callback of vm.compileFunction(), we use an ID number as the host defined option and maintain a per-Environment ID -> CompiledFnEntry map to retain the top-level referrer function returned by vm.compileFunction() in order to pass it back to the callback, but it would leak because with how we used v8::Persistent to maintain this reference, V8 would not be able to understand the cycle and would just think that the CompiledFnEntry was supposed to live forever. We made an attempt to make that reference known to V8 by making the CompiledFnEntry weak and using a private symbol to make CompiledFnEntry strongly references the top-level referrer function in https://github.com/nodejs/node/pull/46785, but that turned out to be unsound, because the there's no guarantee that the top-level function must be alive while import() can still be initiated from that function, since V8 could discard the top-level function and only keep inner functions alive, so relying on the top-level function to keep the CompiledFnEntry alive could result in use-after-free which caused a revert of that fix.

With this patch we use a symbol in the host defined options instead of a number, because with the stage-3 symbol-as-weakmap-keys proposal we could directly use that symbol to keep the referrer alive using a WeakMap. As a bonus this also keeps the other kinds of referrers alive as long as import() can still be initiated from that Script/Module, so this also fixes the long-standing crash caused by vm.Script being GC'ed too early when its importModuleDynamically callback still needs it.

module: fix leak of vm.SyntheticModule

Previously we maintain a strong persistent reference to the ModuleWrap to retrieve the ID-to-ModuleWrap mapping from the HostImportModuleDynamicallyCallback using the number ID stored in the host-defined options. As a result the ModuleWrap would be kept alive until the Environment is shut down, which would be a leak for user code. With the new symbol-based host-defined option we can just get the ModuleWrap from the JS-land WeakMap so there's now no need to maintain this strong reference. This would at least fix the leak for vm.SyntheticModule. vm.SourceTextModule is still leaking due to the strong persistent reference to the v8::Module.

Refs: https://github.com/nodejs/node/issues/44211 Refs: https://github.com/nodejs/node/issues/42080 Refs: https://github.com/nodejs/node/issues/47096 Refs: https://github.com/nodejs/node/issues/43205 Refs: https://github.com/nodejs/node/issues/38695

joyeecheung avatar Jun 21 '23 16:06 joyeecheung

Review requested:

  • [ ] @nodejs/loaders
  • [ ] @nodejs/modules
  • [ ] @nodejs/vm

nodejs-github-bot avatar Jun 21 '23 16:06 nodejs-github-bot

CI: https://ci.nodejs.org/job/node-test-pull-request/52320/

nodejs-github-bot avatar Jun 21 '23 16:06 nodejs-github-bot

Still need to do a bit more investigation to figure out it's really safe this time around, so marking it as WIP (I've done some walkthrough of the vm.compileFunction() part, but haven't really looked into the ModuleWrap part). I suppose we probably also want to put it behind flags until symbol-as-weakmap-key gets to stage 4 though. Also need to think of more test cases to exercise the new memory management.

joyeecheung avatar Jun 21 '23 16:06 joyeecheung

Looks like this fixes the segfault in https://github.com/nodejs/node/issues/43205 and https://github.com/nodejs/node/issues/38695 as well

joyeecheung avatar Jun 22 '23 11:06 joyeecheung

amazing!

targos avatar Jun 22 '23 12:06 targos

It looks to me like symbols-as-weakmap-keys is now Stage 4 as of 2023-01-30, but not all of the documents have been updated.

I would guess back-porting this fix to Node 16 or 18 is going to be a challenge though, right?

hildjj avatar Jun 22 '23 13:06 hildjj

It looks to me like symbols-as-weakmap-keys is now Stage 4 as of 2023-01-30, but not all of the documents have been updated.

Nice. Then I think we don't have to flag this for v20. But yeah v18 or v16 didn't implement that proposal (and we probably shouldn't backport it / cannot do a full upgrade because LTS) so it cannot be backported.

I've tweaked the registry object a bit, added tests, and some analysis to the commit messages about why it fixes the leaks/segfaults. Although I think for the vm.Modules this just fixes vm.SyntheticModule, but vm.SourceTextModule would still leak until we find a way to create ModuleWrap -> v8::Module references without using the problematic v8::Persistent approach (maybe using the namespace object to transitively hold a reference to the Module would work, but that feels hacky to me, so I'll leave it for now).

joyeecheung avatar Jun 22 '23 16:06 joyeecheung

CI: https://ci.nodejs.org/job/node-test-pull-request/52355/

nodejs-github-bot avatar Jun 22 '23 16:06 nodejs-github-bot

CI: https://ci.nodejs.org/job/node-test-pull-request/52356/

nodejs-github-bot avatar Jun 22 '23 16:06 nodejs-github-bot

CI: https://ci.nodejs.org/job/node-test-pull-request/52357/

nodejs-github-bot avatar Jun 22 '23 17:06 nodejs-github-bot

CI: https://ci.nodejs.org/job/node-test-pull-request/52367/

nodejs-github-bot avatar Jun 22 '23 21:06 nodejs-github-bot

CI: https://ci.nodejs.org/job/node-test-pull-request/52523/

nodejs-github-bot avatar Jun 27 '23 18:06 nodejs-github-bot

CI: https://ci.nodejs.org/job/node-test-pull-request/53910/

nodejs-github-bot avatar Sep 12 '23 11:09 nodejs-github-bot

Finished the round trips of the required V8 patches. CI is green modulo a known flake (https://github.com/nodejs/node/issues/49605) I started a citgm https://ci.nodejs.org/job/citgm-smoker/3233/ which will still be in the queue for hours though...

cc @nodejs/loaders @nodejs/vm PTAL thanks, I am hoping to get this merged before v21.0.0 is cut to see if this fixes the huge amount of long-standing woes referenced in the OP (and looking at other issues referencing them, there are a bunch more lurking in the repo for years) and blocking people from upgrading from v16 which is already EOL. If this fix proves to be correct after it gets released/tested by citgm(?) I'll request an exceptional ABI breakage for v20.x (required by https://github.com/nodejs/node/pull/49419 and https://github.com/nodejs/node/pull/49491) to backport it so that people can at least upgrade to v20...

joyeecheung avatar Sep 12 '23 15:09 joyeecheung

CI: https://ci.nodejs.org/job/node-test-pull-request/53915/

nodejs-github-bot avatar Sep 12 '23 16:09 nodejs-github-bot

CI: https://ci.nodejs.org/job/node-test-pull-request/53920/

nodejs-github-bot avatar Sep 12 '23 21:09 nodejs-github-bot

Checked the CITGM run results a bit:

  • bluebird, dicer, body-parser, csv-parser, bcrypt, tape, ava also fail on main https://ci.nodejs.org/job/citgm-smoker/3236
  • browserify, bufferutil, semver, tape, bcrypt, full-icu-test, ws: these pass for me locally with this PR. Also https://github.com/nodejs/citgm/pull/959 was updated a bit during the span I launched the two runs (which was why I had to abort two builds in the middle because that PR had some merge conflict markers in the JSON file for a while) so it could be related to the updates. From a look into the failures, they are either connection issues or timeouts and unlikely to be related to this PR.

So my interpretation is that the CITGM run (with the trimmed lookup file from https://github.com/nodejs/citgm/pull/959 at least) looks clean.

joyeecheung avatar Sep 12 '23 23:09 joyeecheung

Could you attempt to run Jest as well? If you ran the "bankruptcy PR", then Jest is (currently) removed. Jest heavily utilizes vm.Script (including importModuleDynamically) and also supports native ESM via SourceTextModule.

Used compileFunction before, but had to migrate back to Script due to performance issues (I don't know if this PR affects https://github.com/nodejs/node/issues/35375 at all?).

EDIT: If you have a compiled binary for arm64 Macs, I'm happy to test it myself 🙂

SimenB avatar Sep 13 '23 07:09 SimenB

Could you attempt to run Jest as well?

I ran citgm jest locally and the failures I got are identical to the ones I got from the main branch. So I interpret it as no regressions.

I don't know if this PR affects https://github.com/nodejs/node/issues/35375 at all?

I don't think it does, the underlying issues (V8 not handling code caching + host defined options) are still unfixed in the upstream and they are orthogonal to this PR (this does not affect code caching in any way).

If you have a compiled binary for arm64 Macs, I'm happy to test it myself

I uploaded a binary to my fork: https://github.com/joyeecheung/node/releases/tag/fix-compile-memory - I didn't sign this (not sure how to sign it properly for distribution without doing red tapes) but I think you could make it runnable by doing chmod +x path/to/binary, then try to run it for the first time, and when macOS denies it, go to the "Privacy & Security" system setting, there should be an area asking for permission to run this particular binary, and you allow it - the next time you run it there should be an "open" option available and that'll make it work.

joyeecheung avatar Sep 13 '23 13:09 joyeecheung

CI: https://ci.nodejs.org/job/node-test-pull-request/53931/

nodejs-github-bot avatar Sep 13 '23 14:09 nodejs-github-bot

ran make format-cpp to make the GitHub CI happy..

joyeecheung avatar Sep 13 '23 15:09 joyeecheung

The https://github.com/nodejs/node/labels/notable-change label has been added by @joyeecheung.

Please suggest a text for the release notes if you'd like to include a more detailed summary, then proceed to update the PR description with the text or a link to the notable change suggested text comment.

github-actions[bot] avatar Sep 13 '23 15:09 github-actions[bot]

CI: https://ci.nodejs.org/job/node-test-pull-request/53934/

nodejs-github-bot avatar Sep 13 '23 15:09 nodejs-github-bot

cc @jasnell @addaleax @legendecas who originally reviewed https://github.com/nodejs/node/pull/46785 in hope to get some more eyeballs to double check it's correct this time around before I land this...but I'll try to land this within this week if I don't any more reviews, as a backport of this to v20.x might be worth it even if it breaks the ABI (I think we can just increment the ABI version exceptionally at this point, but it'll get complicated when v21.x goes out next month or when v20.x goes LTS, so I do not want to hold it off for too long).

joyeecheung avatar Sep 13 '23 16:09 joyeecheung

it'll get complicated when v21.x goes out next month or when v20.x goes LTS

Is there any precedent for slightly pushing those timelines back to get this fix in? I'd hate to see this miss the cut because of an arbitrary deadline (particularly for a v20 backport).

schmod avatar Sep 13 '23 17:09 schmod

If you have a compiled binary for arm64 Macs, I'm happy to test it myself

I uploaded a binary to my fork: joyeecheung/node@fix-compile-memory (release) - I didn't sign this (not sure how to sign it properly for distribution without doing red tapes) but I think you could make it runnable by doing chmod +x path/to/binary, then try to run it for the first time, and when macOS denies it, go to the "Privacy & Security" system setting, there should be an area asking for permission to run this particular binary, and you allow it - the next time you run it there should be an "open" option available and that'll make it work.

Can confirm the test suite passes (with the expected punycode errors) 👍

SimenB avatar Sep 13 '23 17:09 SimenB

@SimenB did you happen to notice any performance improvements in Jest at all?

brunocabral88 avatar Sep 13 '23 19:09 brunocabral88

Didn't look for it (or notice anything), but from my understanding this PR is about memory leaks (and avoiding segfaults), not performance.

$ hyperfine '~/Downloads/node packages/jest-cli/bin/jest.js packages/' 'node packages/jest/bin/jest.js packages/' --warmup 2 -i
Benchmark 1: ~/Downloads/node packages/jest-cli/bin/jest.js packages/
  Time (mean ± σ):     15.894 s ±  0.068 s    [User: 82.691 s, System: 11.940 s]
  Range (min … max):   15.807 s … 16.003 s    10 runs

  Warning: Ignoring non-zero exit code.

Benchmark 2: node packages/jest/bin/jest.js packages/
  Time (mean ± σ):     15.987 s ±  0.086 s    [User: 84.379 s, System: 11.295 s]
  Range (min … max):   15.863 s … 16.153 s    10 runs

  Warning: Ignoring non-zero exit code.

Summary
  ~/Downloads/node packages/jest-cli/bin/jest.js packages/ ran
    1.01 ± 0.01 times faster than node packages/jest/bin/jest.js packages/

No real difference (and the global node is Node v20)

SimenB avatar Sep 14 '23 06:09 SimenB

@SimenB sorry, I made the wrong question. I was actually interested in knowing if this could eventually help with the issues reported in the Jest's issue #1967 :)

brunocabral88 avatar Sep 14 '23 07:09 brunocabral88

@joyeecheung this PR fixes the segfault from the reproduction in https://github.com/nodejs/node/issues/35889#issuecomment-751125298 (the first one using Jest - I cannot get the plain Node one to segfault)

EDIT: Also the reproduction in the OP, so #35889 can be marked as fixed by this PR 🙂

🎉

Might be more from #37648 that's fixed by this - I just tested the one I linked first.

SimenB avatar Sep 14 '23 07:09 SimenB