WIP: use symbol in weakmap to maintain lifetime of modules' host defined options
module: use symbol in WeakMap to manage host defined options
Previously, when managing the importModuleDynamically callback of vm.compileFunction(), we used an ID number as the host defined option and maintained a per-Environment ID -> CompiledFnEntry map to retain the top-level referrer function returned by vm.compileFunction(), so that it could be passed back to the callback. This leaked, however, because with how we used v8::Persistent to maintain the reference, V8 could not see the cycle and simply assumed that the CompiledFnEntry was supposed to live forever. We attempted to make that reference known to V8 by making the CompiledFnEntry weak and using a private symbol to make the CompiledFnEntry strongly reference the top-level referrer function in https://github.com/nodejs/node/pull/46785, but that turned out to be unsound: there is no guarantee that the top-level function stays alive for as long as import() can still be initiated from code compiled from it, since V8 may discard the top-level function and keep only inner functions alive. Relying on the top-level function to keep the CompiledFnEntry alive could therefore result in a use-after-free, which forced a revert of that fix.
With this patch we use a symbol in the host defined options instead of a number: with the stage-3 symbol-as-weakmap-keys proposal, we can use that symbol directly to keep the referrer alive through a WeakMap. As a bonus this also keeps other kinds of referrers alive for as long as import() can still be initiated from that Script/Module, so it also fixes the long-standing crash caused by a vm.Script being GC'ed too early while its importModuleDynamically callback still needs it.
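For illustration only (this is a hypothetical JS sketch of the idea, not Node's internal C++ code, and the names are made up): the host defined option becomes an unregistered symbol, and a WeakMap keyed on that symbol maps it back to the referrer, so the referrer stays retrievable for exactly as long as the compiled code carrying the symbol is alive.

```js
// Hypothetical sketch of the registry idea, not Node's actual implementation.
// An unregistered symbol can be used as a WeakMap key (symbols-as-weakmap-keys),
// so the entry lives exactly as long as something still holds the symbol.
const moduleRegistries = new WeakMap();

function registerReferrer(referrer) {
  const hostDefinedOption = Symbol('host_defined_option');
  moduleRegistries.set(hostDefinedOption, { referrer });
  return hostDefinedOption; // carried in the compiled code's host defined options
}

function getReferrer(hostDefinedOption) {
  // Looked up from the dynamic import() callback.
  const entry = moduleRegistries.get(hostDefinedOption);
  return entry === undefined ? undefined : entry.referrer;
}
```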
module: fix leak of vm.SyntheticModule
Previously we maintained a strong persistent reference to the ModuleWrap so that the ID-to-ModuleWrap mapping could be retrieved in the HostImportModuleDynamicallyCallback using the number ID stored in the host-defined options. As a result the ModuleWrap would be kept alive until the Environment was shut down, which was a leak for user code. With the new symbol-based host-defined option we can simply get the ModuleWrap from the JS-land WeakMap, so there is no longer any need to maintain this strong reference. This at least fixes the leak for vm.SyntheticModule; vm.SourceTextModule still leaks due to the strong persistent reference to the v8::Module.
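As a rough way to observe the difference (a hypothetical check, not a test from this PR; it assumes running with `--experimental-vm-modules --expose-gc`), one can register a `vm.SyntheticModule` with a `FinalizationRegistry` and see whether it can be collected once nothing else references it:

```js
'use strict';
const vm = require('node:vm');

let collected = false;
const registry = new FinalizationRegistry(() => { collected = true; });

(function create() {
  // Nothing retains this module after the IIFE returns.
  const m = new vm.SyntheticModule(['default'], function() {
    this.setExport('default', 42);
  });
  registry.register(m, 'synthetic');
})();

// GC and finalization timing is not guaranteed; this is only indicative.
setImmediate(() => {
  globalThis.gc();
  setTimeout(() => {
    console.log('SyntheticModule collected:', collected);
  }, 100);
});
```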
Refs: https://github.com/nodejs/node/issues/44211
Refs: https://github.com/nodejs/node/issues/42080
Refs: https://github.com/nodejs/node/issues/47096
Refs: https://github.com/nodejs/node/issues/43205
Refs: https://github.com/nodejs/node/issues/38695
Review requested:
- [ ] @nodejs/loaders
- [ ] @nodejs/modules
- [ ] @nodejs/vm
CI: https://ci.nodejs.org/job/node-test-pull-request/52320/
Still need to do a bit more investigation to figure out whether it's really safe this time around, so I'm marking this as WIP (I've done some walkthrough of the vm.compileFunction() part, but haven't really looked into the ModuleWrap part). I suppose we probably also want to put it behind a flag until symbol-as-weakmap-keys gets to stage 4. Also need to think of more test cases to exercise the new memory management.
Looks like this fixes the segfault in https://github.com/nodejs/node/issues/43205 and https://github.com/nodejs/node/issues/38695 as well
amazing!
It looks to me like symbols-as-weakmap-keys is now Stage 4 as of 2023-01-30, but not all of the documents have been updated.
I would guess back-porting this fix to Node 16 or 18 is going to be a challenge though, right?
It looks to me like symbols-as-weakmap-keys is now Stage 4 as of 2023-01-30, but not all of the documents have been updated.
Nice. Then I think we don't have to flag this for v20. But yeah, v18 and v16 didn't implement that proposal (and we probably shouldn't backport it / cannot do a full upgrade because of LTS), so it cannot be backported.
I've tweaked the registry object a bit, added tests, and added some analysis to the commit messages about why this fixes the leaks/segfaults. Although for the vm.Modules I think this only fixes vm.SyntheticModule; vm.SourceTextModule would still leak until we find a way to create ModuleWrap -> v8::Module references without using the problematic v8::Persistent approach (maybe using the namespace object to transitively hold a reference to the Module would work, but that feels hacky to me, so I'll leave it for now).
CI: https://ci.nodejs.org/job/node-test-pull-request/52355/
CI: https://ci.nodejs.org/job/node-test-pull-request/52356/
CI: https://ci.nodejs.org/job/node-test-pull-request/52357/
CI: https://ci.nodejs.org/job/node-test-pull-request/52367/
CI: https://ci.nodejs.org/job/node-test-pull-request/52523/
CI: https://ci.nodejs.org/job/node-test-pull-request/53910/
Finished the round trips of the required V8 patches. CI is green modulo a known flake (https://github.com/nodejs/node/issues/49605). I started a CITGM run at https://ci.nodejs.org/job/citgm-smoker/3233/, which will still be in the queue for hours though...
cc @nodejs/loaders @nodejs/vm PTAL, thanks. I am hoping to get this merged before v21.0.0 is cut to see if it fixes the huge number of long-standing woes referenced in the OP (and, looking at other issues referencing them, there are a bunch more that have been lurking in the repo for years) that are blocking people from upgrading from v16, which is already EOL. If this fix proves to be correct after it gets released/tested by CITGM(?), I'll request an exceptional ABI breakage for v20.x (required by https://github.com/nodejs/node/pull/49419 and https://github.com/nodejs/node/pull/49491) to backport it, so that people can at least upgrade to v20...
CI: https://ci.nodejs.org/job/node-test-pull-request/53915/
CI: https://ci.nodejs.org/job/node-test-pull-request/53920/
Checked the CITGM run results a bit:
- bluebird, dicer, body-parser, csv-parser, bcrypt, tape, ava also fail on main https://ci.nodejs.org/job/citgm-smoker/3236
- browserify, bufferutil, semver, tape, bcrypt, full-icu-test, ws: these pass for me locally with this PR. Also, https://github.com/nodejs/citgm/pull/959 was updated a bit in the window between the two runs I launched (which is why I had to abort two builds in the middle; that PR had merge conflict markers in its JSON file for a while), so the failures could be related to those updates. From a look into them, they are either connection issues or timeouts and are unlikely to be related to this PR.
So my interpretation is that the CITGM run (at least with the trimmed lookup file from https://github.com/nodejs/citgm/pull/959) looks clean.
Could you attempt to run Jest as well? If you ran the "bankruptcy PR", then Jest is (currently) removed. Jest heavily utilizes vm.Script (including importModuleDynamically) and also supports native ESM via SourceTextModule.
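For context, that kind of usage looks roughly like the sketch below (illustrative only, not Jest's actual code; recent Node versions may require `--experimental-vm-modules` to pass a function callback here). The wrapped source performs a dynamic import(), and the Script object may already be unreachable from user code by the time that import() actually runs, which is exactly the lifetime this PR ties to the symbol in the host-defined options.

```js
'use strict';
const vm = require('node:vm');

const script = new vm.Script('(async () => import("node:path"))()', {
  filename: '/hypothetical/wrapped-module.js',
  importModuleDynamically: (specifier) => import(specifier),
});

// The returned promise can outlive every user-land reference to `script`;
// previously that could mean the host-defined options were freed too early.
script.runInThisContext().then((ns) => {
  console.log(typeof ns.join); // 'function'
});
```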
Used compileFunction before, but had to migrate back to Script due to performance issues (I don't know if this PR affects https://github.com/nodejs/node/issues/35375 at all?).
EDIT: If you have a compiled binary for arm64 Macs, I'm happy to test it myself 🙂
Could you attempt to run Jest as well?
I ran citgm jest locally and the failures I got are identical to the ones I got from the main branch. So I interpret it as no regressions.
I don't know if this PR affects https://github.com/nodejs/node/issues/35375 at all?
I don't think it does; the underlying issues (V8 not handling code caching together with host defined options) are still unfixed upstream and are orthogonal to this PR (it does not affect code caching in any way).
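For reference, the code-caching scenario being kept separate here looks roughly like this sketch (an assumption-laden illustration, not a statement about exact current behavior; whether the cache is accepted can be inspected via `cachedDataRejected`):

```js
'use strict';
const vm = require('node:vm');

const source = 'import("node:os")';

const first = new vm.Script(source, {
  importModuleDynamically: (specifier) => import(specifier),
});
const cachedData = first.createCachedData();

const second = new vm.Script(source, {
  cachedData,
  importModuleDynamically: (specifier) => import(specifier),
});

// Per the linked issue, the combination of cached data and host defined
// options is where things go wrong upstream; this PR leaves it untouched.
console.log('cache rejected:', second.cachedDataRejected);
```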
If you have a compiled binary for arm64 Macs, I'm happy to test it myself
I uploaded a binary to my fork: https://github.com/joyeecheung/node/releases/tag/fix-compile-memory - I didn't sign this (not sure how to sign it properly for distribution without going through the red tape), but I think you can make it runnable by doing chmod +x path/to/binary and then trying to run it for the first time. When macOS denies it, go to the "Privacy & Security" system settings, where there should be an area asking for permission to run this particular binary; allow it, and the next time you run it there should be an "open" option available that'll make it work.
CI: https://ci.nodejs.org/job/node-test-pull-request/53931/
Ran make format-cpp to make the GitHub CI happy.
The https://github.com/nodejs/node/labels/notable-change label has been added by @joyeecheung.
Please suggest a text for the release notes if you'd like to include a more detailed summary, then proceed to update the PR description with the text or a link to the notable change suggested text comment.
CI: https://ci.nodejs.org/job/node-test-pull-request/53934/
cc @jasnell @addaleax @legendecas who originally reviewed https://github.com/nodejs/node/pull/46785, in the hope of getting some more eyeballs to double-check that it's correct this time around before I land this... but I'll try to land it within this week if I don't get any more reviews, as a backport to v20.x might be worth it even if it breaks the ABI (I think we can just increment the ABI version exceptionally at this point, but it'll get complicated when v21.x goes out next month or when v20.x goes LTS, so I do not want to hold this off for too long).
it'll get complicated when v21.x goes out next month or when v20.x goes LTS
Is there any precedent for slightly pushing those timelines back to get this fix in? I'd hate to see this miss the cut because of an arbitrary deadline (particularly for a v20 backport).
If you have a compiled binary for arm64 Macs, I'm happy to test it myself
I uploaded a binary to my fork: https://github.com/joyeecheung/node/releases/tag/fix-compile-memory (release) - I didn't sign this, but I think you could make it runnable by doing chmod +x path/to/binary, running it once, letting macOS deny it, and then allowing it under the "Privacy & Security" system settings.
Can confirm the test suite passes (with the expected punycode errors) 👍
@SimenB did you happen to notice any performance improvements in Jest at all?
Didn't look for it (or notice anything), but from my understanding this PR is about memory leaks (and avoiding segfaults), not performance.
$ hyperfine '~/Downloads/node packages/jest-cli/bin/jest.js packages/' 'node packages/jest/bin/jest.js packages/' --warmup 2 -i
Benchmark 1: ~/Downloads/node packages/jest-cli/bin/jest.js packages/
Time (mean ± σ): 15.894 s ± 0.068 s [User: 82.691 s, System: 11.940 s]
Range (min … max): 15.807 s … 16.003 s 10 runs
Warning: Ignoring non-zero exit code.
Benchmark 2: node packages/jest/bin/jest.js packages/
Time (mean ± σ): 15.987 s ± 0.086 s [User: 84.379 s, System: 11.295 s]
Range (min … max): 15.863 s … 16.153 s 10 runs
Warning: Ignoring non-zero exit code.
Summary
~/Downloads/node packages/jest-cli/bin/jest.js packages/ ran
1.01 ± 0.01 times faster than node packages/jest/bin/jest.js packages/
No real difference (and the global node is Node v20).
@SimenB sorry, I asked the wrong question. I was actually interested in knowing whether this could eventually help with the issues reported in Jest's issue #1967 :)
@joyeecheung this PR fixes the segfault from the reproduction in https://github.com/nodejs/node/issues/35889#issuecomment-751125298 (the first one using Jest - I cannot get the plain Node one to segfault)
EDIT: Also the reproduction in the OP, so #35889 can be marked as fixed by this PR 🙂
🎉
There might be more from #37648 that's fixed by this; I just tested the one I linked first.