node src,lib: reducing C++ calls of esm legacy main resolve

This PR is related to https://github.com/nodejs/performance/issues/73

Legacy Main Resolve

@anonrig raised that we could benefit from the implementation of legacyMainResolve on the C++ side, so here's the result so far:

                                                                                                                                                                confidence improvement accuracy (*)
esm/esm-legacyMainResolve.js resolvedFile='node_modules/non-exist' packageConfigMain='./index.js' packageJsonUrl='node_modules/test/package.json' n=10000              ***    190.93 %       ±4.55%
esm/esm-legacyMainResolve.js resolvedFile='node_modules/non-exist' packageConfigMain='' packageJsonUrl='node_modules/test/package.json' n=10000                        ***    188.08 %       ±2.97%
esm/esm-legacyMainResolve.js resolvedFile='node_modules/test/index.js' packageConfigMain='./index.js' packageJsonUrl='node_modules/test/package.json' n=10000          ***    191.18 %       ±4.22%
esm/esm-legacyMainResolve.js resolvedFile='node_modules/test/index.js' packageConfigMain='' packageJsonUrl='node_modules/test/package.json' n=10000                    ***    190.32 %       ±2.34%
esm/esm-legacyMainResolve.js resolvedFile='node_modules/test/index.json' packageConfigMain='./index.js' packageJsonUrl='node_modules/test/package.json' n=10000        ***    189.64 %       ±3.61%
esm/esm-legacyMainResolve.js resolvedFile='node_modules/test/index.json' packageConfigMain='' packageJsonUrl='node_modules/test/package.json' n=10000                  ***    186.56 %       ±3.51%
esm/esm-legacyMainResolve.js resolvedFile='node_modules/test/index.node' packageConfigMain='./index.js' packageJsonUrl='node_modules/test/package.json' n=10000        ***    191.34 %       ±3.79%
esm/esm-legacyMainResolve.js resolvedFile='node_modules/test/index.node' packageConfigMain='' packageJsonUrl='node_modules/test/package.json' n=10000                  ***    188.11 %       ±2.44%
                                                                                                                                                                  (**)  (***)
esm/esm-legacyMainResolve.js resolvedFile='node_modules/non-exist' packageConfigMain='./index.js' packageJsonUrl='node_modules/test/package.json' n=10000       ±6.19% ±8.36%
esm/esm-legacyMainResolve.js resolvedFile='node_modules/non-exist' packageConfigMain='' packageJsonUrl='node_modules/test/package.json' n=10000                 ±4.04% ±5.44%
esm/esm-legacyMainResolve.js resolvedFile='node_modules/test/index.js' packageConfigMain='./index.js' packageJsonUrl='node_modules/test/package.json' n=10000   ±5.75% ±7.77%
esm/esm-legacyMainResolve.js resolvedFile='node_modules/test/index.js' packageConfigMain='' packageJsonUrl='node_modules/test/package.json' n=10000             ±3.16% ±4.21%
esm/esm-legacyMainResolve.js resolvedFile='node_modules/test/index.json' packageConfigMain='./index.js' packageJsonUrl='node_modules/test/package.json' n=10000 ±4.92% ±6.63%
esm/esm-legacyMainResolve.js resolvedFile='node_modules/test/index.json' packageConfigMain='' packageJsonUrl='node_modules/test/package.json' n=10000           ±4.77% ±6.41%
esm/esm-legacyMainResolve.js resolvedFile='node_modules/test/index.node' packageConfigMain='./index.js' packageJsonUrl='node_modules/test/package.json' n=10000 ±5.17% ±6.99%
esm/esm-legacyMainResolve.js resolvedFile='node_modules/test/index.node' packageConfigMain='' packageJsonUrl='node_modules/test/package.json' n=10000           ±3.31% ±4.43%

Be aware that when doing many comparisons the risk of a false-positive
result increases. In this case, there are 8 comparisons, you can thus
expect the following amount of false-positive results:
  0.40 false positives, when considering a   5% risk acceptance (*, **, ***),
  0.08 false positives, when considering a   1% risk acceptance (**, ***),
  0.01 false positives, when considering a 0.1% risk acceptance (***)

To be able to run the benchmark, first build the old node using this branch.

About the benchmark, I needed to create another benchmark just for this function to be able to see the improvements since the results of esm-defaultResolver.js was not reflecting the real impact of this change.

Potential optimizations

legacyMainResolve also calls fileURLToPath and didn't have any version on the C++ side, so I needed to rewrite this function.

I haven't benchmarked the JS version yet, but I think we can add some benchmarks and see if we can replace the JS version with the C++ version, I've tried to make the code as easy as possible to support exposing on the JS side in the future.

Tasks

[x] Rewrite legacyMainResolve
[x] Rewrite fileURLToPath
[x] Add benchmark for legacyMainResolve
[x] Add tests for legacyMainResolve

Acknowledgments

Huge thanks to @anonrig for the support and early review of this code and @RafaelGSS for the hints about the benchmark and permission model.

Jun 03 '23 19:06 H4ad

Review requested:

[ ] @nodejs/loaders
[ ] @nodejs/modules

Jun 03 '23 19:06 nodejs-github-bot

legacyMainResolve also calls fileURLToPath and didn't have any version on the C++ side, so I needed to rewrite this function.

Maybe we should first test/benchmark and land the fileURLToPath rewrite, which would simplify this PR. wdyt?

Jun 04 '23 08:06 aduh95

@aduh95 Here's the benchmark for the rewrite of fileURLToPath:

IMPORTANT: Those values are not related to this PR, those numbers are only specific to the branch linked above the benchmark.

                                                                                         confidence improvement accuracy (*)    (**)   (***)
url/file-url-to-path.js n=100000 url='(url) file:///home/user/test/index.js'                    ***    -95.88 %       ±3.25%  ±4.44%  ±6.03%
url/file-url-to-path.js n=100000 url='(url) file:///home/user/test%20index.js'                  ***    -95.19 %       ±2.98%  ±4.07%  ±5.52%
url/file-url-to-path.js n=100000 url='(url) file:///home/user/test%2Findex.js'                  ***      8.66 %       ±2.71%  ±3.67%  ±4.90%
url/file-url-to-path.js n=100000 url='(url) file://google.com/home/user/test%2Findex.js'                 0.68 %       ±2.75%  ±3.72%  ±4.99%
url/file-url-to-path.js n=100000 url='(url) http://google.com/home/test.js'                     ***    -11.02 %       ±2.27%  ±3.05%  ±4.03%
url/file-url-to-path.js n=100000 url='file:///home/user/test/index.js'                          ***     97.59 %       ±7.74% ±10.45% ±13.92%
url/file-url-to-path.js n=100000 url='file:///home/user/test%20index.js'                        ***     86.08 %       ±7.51% ±10.15% ±13.52%
url/file-url-to-path.js n=100000 url='file:///home/user/test%2Findex.js'                        ***    195.47 %       ±3.95%  ±5.35%  ±7.14%
url/file-url-to-path.js n=100000 url='file://google.com/home/user/test%2Findex.js'              ***    166.80 %       ±3.19%  ±4.29%  ±5.66%
url/file-url-to-path.js n=100000 url='http://google.com/home/test.js'                           ***    145.18 %       ±4.26%  ±5.76%  ±7.68%
url/file-url-to-path.js n=100000 url='not-even-a-url'                                           ***    168.09 %       ±7.25%  ±9.87% ±13.32%

Be aware that when doing many comparisons the risk of a false-positive
result increases. In this case, there are 11 comparisons, you can thus
expect the following amount of false-positive results:
  0.55 false positives, when considering a   5% risk acceptance (*, **, ***),
  0.11 false positives, when considering a   1% risk acceptance (**, ***),
  0.01 false positives, when considering a 0.1% risk acceptance (***)

Branch: https://github.com/nodejs/node/compare/main...h4ad-forks:node:perf/file-url-to-path

In resume, if we want to expose the C++ function, we should enforce the caller to always pass a string instead of URL.

Jun 04 '23 17:06 H4ad

@H4ad I meant to have it as a separate PR.

Jun 04 '23 23:06 aduh95

@aduh95 Yeah, my idea was to show you if you really want to do that with the performance benefits from it.

To be honest, in this form, without refactoring the whole ESM to avoid new URL and manipulate the data directly as a string, we will not see a performance improvement, it will be the complete opposite.

I think is better to go in the opposite direction, merge this one and then evaluate the benefits of exposing fileURLToPath, because as I said, will probably lead to a refactoring not only in the function itself but also in the places where they call to take the fastest path, which is manipulating directly the string.

Jun 04 '23 23:06 H4ad

url/file-url-to-path.js n=100000 url=’(url) file:///home/user/test/index.js’ *** -95.88 % ±3.25% ±4.44% ±6.03%

This seems to show that in the most common case of a file:// URL, this is 96% slower?

Jun 05 '23 21:06 GeoffreyBooth

@GeoffreyBooth Exactly, the way the fileURLToPath is used today, simply don't worth exposing the C++ version.

Jun 06 '23 00:06 H4ad

@GeoffreyBooth Exactly, the way the fileURLToPath is used today, simply don’t worth exposing the C++ version.

Okay, so then should we close this PR? Is there something else of value being proposed here?

Jun 06 '23 00:06 GeoffreyBooth

This PR is not about fileURLToPath, is about LegacyMainResolve.

The improvements for LegacyMainResolve is about 60~70%.

The rewrite of fileURLToPath was only because it was needed by LegacyMainResolve. Aduh suggested that maybe we could create a PR for fileURLToPath but as we've seen, it's not worth it, that's it.

Also, the metrics of https://github.com/nodejs/node/pull/48325#issuecomment-1575649208 is not related to this PR itself, are related to the branch I link in the comment, so those numbers don't reflect this PR since we don't have any overhead of calling C++ function from C++ side.

Jun 06 '23 00:06 H4ad

The CI for Test Asan is failing due some memory issue:

==88318==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7ffd9ad085f8 at pc 0x00000431f995 bp 0x7ffd9ad07570 sp 0x7ffd9ad07568
READ of size 16 at 0x7ffd9ad085f8 thread T0
    #0 0x431f994 in ada::url_aggregator ada::parser::parse_url<ada::url_aggregator>(std::basic_string_view<char, std::char_traits<char> >, ada::url_aggregator const*) (/home/runner/work/node/node/out/Release/node+0x431f994)
    #1 0x4312e13 in tl::expected<ada::url_aggregator, ada::errors> ada::parse<ada::url_aggregator>(std::basic_string_view<char, std::char_traits<char> >, ada::url_aggregator const*) (/home/runner/work/node/node/out/Release/node+0x4312e13)
    #2 0xfc3b97 in node::fs::LegacyMainResolve(v8::FunctionCallbackInfo<v8::Value> const&) (/home/runner/work/node/node/out/Release/node+0xfc3b97)
    #3 0x1779bab in v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<false>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, unsigned long*, int) (/home/runner/work/node/node/out/Release/node+0x1779bab)
    #4 0x17779b4 in v8::internal::Builtin_HandleApiCall(int, unsigned long*, v8::internal::Isolate*) (/home/runner/work/node/node/out/Release/node+0x17779b4)
    #5 0x3402e35 in Builtins_CEntry_Return1_ArgvOnStack_BuiltinExit (/home/runner/work/node/node/out/Release/node+0x3402e35)

Address 0x7ffd9ad085f8 is located in stack of thread T0 at offset 1304 in frame
    #0 0xfc364f in node::fs::LegacyMainResolve(v8::FunctionCallbackInfo<v8::Value> const&) (/home/runner/work/node/node/out/Release/node+0xfc364f)

Ref: https://github.com/nodejs/node/actions/runs/5189308542/jobs/9354128411?pr=48325

Does anyone know what could be the cause?

I tried to reproduce it locally but I couldn't, so I have no idea how to fix it.

Jun 07 '23 19:06 H4ad

CI: https://ci.nodejs.org/job/node-test-pull-request/52176/

Jun 09 '23 11:06 nodejs-github-bot

CI: https://ci.nodejs.org/job/node-test-pull-request/52261/

Jun 15 '23 03:06 nodejs-github-bot

CI is failing https://ci.nodejs.org/job/node-test-binary-armv7l/RUN_SUBSET=js,nodes=ubuntu2004-armv7l/6181/consoleFull but with an error that I think is not related with this PR:

not ok 3740 sequential/test-perf-hooks
01:20:33   ---
01:20:33   duration_ms: 618.48400
01:20:33   severity: fail
01:20:33   exitcode: 1
01:20:33   stack: |-
01:20:33     {
01:20:33       name: 'node',
01:20:33       entryType: 'node',
01:20:33       startTime: 0,
01:20:33       duration: { around: 404.7198510169983 },
01:20:33       nodeStart: { around: 0 },
01:20:33       v8Start: { around: 0 },
01:20:33       bootstrapComplete: { around: 404.6602909564972, delay: 2500 },
01:20:33       environment: { around: 0 },
01:20:33       loopStart: -1,
01:20:33       loopExit: -1
01:20:33     }
01:20:33     node:assert:399
01:20:33         throw err;
01:20:33         ^
01:20:33     
01:20:33     AssertionError [ERR_ASSERTION]: environment: 250.5410280227661 >= 250
01:20:33         at checkNodeTiming (/home/iojs/build/workspace/node-test-binary-armv7l/test/sequential/test-perf-hooks.js:31:7)
01:20:33         at Object.<anonymous> (/home/iojs/build/workspace/node-test-binary-armv7l/test/sequential/test-perf-hooks.js:43:1)
01:20:33         at Module._compile (node:internal/modules/cjs/loader:1255:14)
01:20:33         at Module._extensions..js (node:internal/modules/cjs/loader:1309:10)
01:20:33         at Module.load (node:internal/modules/cjs/loader:1113:32)
01:20:33         at Module._load (node:internal/modules/cjs/loader:960:12)
01:20:33         at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:83:12)
01:20:33         at node:internal/main/run_main_module:23:47 {
01:20:33       generatedMessage: false,
01:20:33       code: 'ERR_ASSERTION',
01:20:33       actual: false,
01:20:33       expected: true,
01:20:33       operator: '=='
01:20:33     }
01:20:33     
01:20:33     Node.js v21.0.0-pre

Any hints about what I should do?

Jun 15 '23 10:06 H4ad

Also, I updated the benchmark, with the latest changes, the improvement is almost 200%.

But one thing that is making me a little skeptical about this number is this benchmark:

                                                                               confidence improvement accuracy (*)   (**)   (***)
esm/esm-loader-defaultResolve.js specifier='./relative-existing.js' n=10000                    4.83 %       ±6.85% ±9.20% ±12.15%
esm/esm-loader-defaultResolve.js specifier='./relative-nonexistent.js' n=10000                 1.95 %       ±4.28% ±5.74%  ±7.56%
esm/esm-loader-defaultResolve.js specifier='node:os' n=10000                                  -1.98 %       ±5.33% ±7.17%  ±9.47%
esm/esm-loader-defaultResolve.js specifier='node:prefixed-nonexistent' n=10000                -3.48 %       ±5.01% ±6.71%  ±8.83%
esm/esm-loader-defaultResolve.js specifier='unprefixed-existing' n=10000                      -0.32 %       ±4.83% ±6.47%  ±8.51%
esm/esm-loader-defaultResolve.js specifier='unprefixed-nonexistent' n=10000                    0.16 %       ±1.99% ±2.67%  ±3.52%

Be aware that when doing many comparisons the risk of a false-positive
result increases. In this case, there are 6 comparisons, you can thus
expect the following amount of false-positive results:
  0.30 false positives, when considering a   5% risk acceptance (*, **, ***),
  0.06 false positives, when considering a   1% risk acceptance (**, ***),
  0.01 false positives, when considering a 0.1% risk acceptance (***)

Only unprefixed-existing calls legacyMainResolve, and we don't see the performance reflected in this benchmark.

I wrote a little benchmark and we came from 36k op/s to 63k op/s, which is a real improvement, so does it worth introducing 400+ lines for this amount of improvement?

Jun 15 '23 22:06 H4ad

I added some tests to ensure the correct resolution of legacyMainResolve, this also helped me catch some edge cases related to the errors.

I just added some good paths and check about the errors, and I didn't add any test to ensure the order of the resolution (if has index.json and index.js, should return index.js first). If you think is necessary, I can add those tests too.

Jun 18 '23 15:06 H4ad

Failed to start CI

- Validating Jenkins credentials
✘  Jenkins credentials invalid

https://github.com/nodejs/node/actions/runs/5305353243

Jun 18 '23 19:06 github-actions[bot]

CI: https://ci.nodejs.org/job/node-test-pull-request/52478/

Jun 26 '23 04:06 nodejs-github-bot

@H4ad it’s really hard to keep track of what you’ve changed when you force push on a single commit everytime, and makes this PR hard to merge. I recommend using git commit fixup or just add a new commit for every fix.

Jun 27 '23 00:06 anonrig

@anonrig Sorry, I thought it was better to just keep a single commit, that's why I always push force, but I'll be using fixup for future fixes.

Jun 27 '23 01:06 H4ad

CI: https://ci.nodejs.org/job/node-test-pull-request/52522/

Jun 27 '23 18:06 nodejs-github-bot

CI: https://ci.nodejs.org/job/node-test-pull-request/52528/

Jun 27 '23 21:06 nodejs-github-bot

Benchmark CI: https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1342/

Results

confidence improvement accuracy (*)
esm/esm-legacyMainResolve.js resolvedFile='node_modules/non-exist' packageConfigMain='./index.js' packageJsonUrl='node_modules/test/package.json' n=10000              ***     27.52 %       ±6.03%
esm/esm-legacyMainResolve.js resolvedFile='node_modules/non-exist' packageConfigMain='' packageJsonUrl='node_modules/test/package.json' n=10000                        ***     30.58 %       ±6.83%
esm/esm-legacyMainResolve.js resolvedFile='node_modules/test/index.js' packageConfigMain='./index.js' packageJsonUrl='node_modules/test/package.json' n=10000          ***     25.63 %       ±6.15%
esm/esm-legacyMainResolve.js resolvedFile='node_modules/test/index.js' packageConfigMain='' packageJsonUrl='node_modules/test/package.json' n=10000                    ***     20.21 %       ±7.75%
esm/esm-legacyMainResolve.js resolvedFile='node_modules/test/index.json' packageConfigMain='./index.js' packageJsonUrl='node_modules/test/package.json' n=10000        ***     27.66 %       ±6.96%
esm/esm-legacyMainResolve.js resolvedFile='node_modules/test/index.json' packageConfigMain='' packageJsonUrl='node_modules/test/package.json' n=10000                  ***     29.09 %       ±7.64%
esm/esm-legacyMainResolve.js resolvedFile='node_modules/test/index.node' packageConfigMain='./index.js' packageJsonUrl='node_modules/test/package.json' n=10000        ***     23.78 %       ±6.45%
esm/esm-legacyMainResolve.js resolvedFile='node_modules/test/index.node' packageConfigMain='' packageJsonUrl='node_modules/test/package.json' n=10000                  ***     21.65 %       ±6.65%

Be aware that when doing many comparisons the risk of a false-positive
result increases. In this case, there are 8 comparisons, you can thus
expect the following amount of false-positive results:
  0.40 false positives, when considering a   5% risk acceptance (*, **, ***),
  0.08 false positives, when considering a   1% risk acceptance (**, ***),
  0.01 false positives, when considering a 0.1% risk acceptance (***)

Jun 28 '23 17:06 anonrig

CI: https://ci.nodejs.org/job/node-test-pull-request/52585/

Jul 03 '23 16:07 nodejs-github-bot

Landed in a2a8e31cbc441cd2b513227be09d90568aa77b5a

Jul 03 '23 18:07 nodejs-github-bot

This commit does not land cleanly on v18.x-staging and will need manual backport in case we want it in v18.

Sep 11 '23 02:09 ruyadorno

@ruyadorno Hey, can I help with something?

Sep 11 '23 12:09 H4ad

@H4ad does https://github.com/nodejs/node/blob/main/doc/contributing/backporting-to-release-lines.md#how-to-submit-a-backport-pull-request help?

Sep 11 '23 12:09 RaisinTen

@RaisinTen Thanks, I'll try to backport by the end of this day.

Sep 11 '23 12:09 H4ad

node node copied to clipboard

src,lib: reducing C++ calls of esm legacy main resolve

Legacy Main Resolve

Potential optimizations

Tasks

Acknowledgments

node
node copied to clipboard