node icon indicating copy to clipboard operation
node copied to clipboard

fs: improve `ExistsSync` performance on Windows

Open anonrig opened this issue 1 year ago • 15 comments

Simplifies code by using std::filesystem. Since std::filesystem is C++17, this pull-request can be backported to 18 and 20.

The following benchmark shows 2% improvement, but I'm sure if we had a Windows benchmark-ci machine it would have better results. Especially for symlinks, we are now not executing 2 different operating-system calls, and only one.

FYI: std::filesystem::exists follows symlinks.

Benchmark code:

const fs = require('node:fs');
const exists = fs.existsSync(__filename);
console.log(exists);

Benchmark result:

PS C:\Users\yagiz\Desktop\coding\node> hyperfine ".\out\Release\node .\exists-bench.js" " .\node-pr .\exists-bench.js" --warmup 10
Benchmark 1: .\out\Release\node .\exists-bench.js
  Time (mean ± σ):      48.8 ms ±   2.9 ms    [User: 2.2 ms, System: 3.3 ms]
  Range (min … max):    41.6 ms …  55.3 ms    56 runs

Benchmark 2:  .\node-pr .\exists-bench.js
  Time (mean ± σ):      47.7 ms ±   2.0 ms    [User: 1.3 ms, System: 2.8 ms]
  Range (min … max):    42.2 ms …  50.5 ms    60 runs

Summary
   .\node-pr .\exists-bench.js ran
    1.02 ± 0.07 times faster than .\out\Release\node .\exists-bench.js

cc @nodejs/fs @nodejs/platform-windows

anonrig avatar Jun 21 '24 20:06 anonrig

Have you run the benchmarks under the benchmark folder? It’s not very reliable to benchmark a single fs call within a script using hyperfine since Node.js itself can take a lot more time to start up than doing that one fs call. (Perhaps around 40ms of that 48-47ms can be Node.js startup)

I agree. Unfortunately, I couldn't figure out some weird npm issue on my Windows machine, and couldn't make a progress due to the slowness of the benchmarks (even if I decreased the n).

anonrig avatar Jun 21 '24 20:06 anonrig

I've opened an issue to keep track of the Windows issue I'm facing: https://github.com/nodejs/node/issues/53538

anonrig avatar Jun 21 '24 20:06 anonrig

CI: https://ci.nodejs.org/job/node-test-pull-request/59919/

nodejs-github-bot avatar Jun 21 '24 22:06 nodejs-github-bot

I agree. Unfortunately, I couldn't figure out some weird npm issue on my Windows machine, and couldn't make a progress due to the slowness of the benchmarks (even if I decreased the n).

Why would you need npm access to run the benchmarks? I think they can simply be run as

out/Release/node benchmark/compare.js --old ./node_main --new out/Release/node --filter existsSync fs > fs.csv

?

joyeecheung avatar Jun 22 '24 18:06 joyeecheung

Oh I see in https://github.com/nodejs/node/issues/53538 you were trying to run node-benchmark-compare. You could just send that over to any machine that can run node-benchmark-compare to analyze the data, the csv is portable.

(Or you could run the R version with data.csv | Rscript benchmark/compare.R though I imagine getting Rscript to work on Windows can also be quite a task....or maybe not?)

joyeecheung avatar Jun 22 '24 18:06 joyeecheung

Microbenchmark CI https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1571/

joyeecheung avatar Jun 22 '24 18:06 joyeecheung

Microbenchmark CI https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1571/

It seems, this PR is 15% faster

anonrig avatar Jun 23 '24 00:06 anonrig

It seems, this PR is 15% faster

That only applies to the non-existing part. The question for me is what is more likely: existing or non-existing?

                                                          confidence improvement accuracy (*)   (**)  (***)
fs/bench-existsSync.js n=1000000 type='existing'                 ***     -0.92 %       ±0.29% ±0.39% ±0.51%
fs/bench-existsSync.js n=1000000 type='non-existing'             ***     15.26 %       ±1.16% ±1.55% ±2.04%
fs/bench-existsSync.js n=1000000 type='non-flat-existing'        ***      2.50 %       ±0.39% ±0.53% ±0.69%

BridgeAR avatar Jun 24 '24 07:06 BridgeAR

That only applies to the non-existing part. The question for me is what is more likely: existing or non-existing?

@BridgeAR for unix, the difference is 15% but on Windows, but on windows, it reduces the number of OS calls.

anonrig avatar Jun 24 '24 16:06 anonrig

CI: https://ci.nodejs.org/job/node-test-pull-request/59952/

nodejs-github-bot avatar Jun 24 '24 16:06 nodejs-github-bot

Did you test this hypothesis, or is it guaranteed by std::filesystem::exists?

I was referencing the cppreference (https://en.cppreference.com/w/cpp/filesystem/exists)

anonrig avatar Jun 24 '24 16:06 anonrig

Did you test this hypothesis, or is it guaranteed by std::filesystem::exists?

I was referencing the cppreference (https://en.cppreference.com/w/cpp/filesystem/exists)

I don't see anything in that reference that would guarantee that the standard library function makes a single system call (or the Windows equivalent) and not multiple system calls.

tniessen avatar Jun 24 '24 17:06 tniessen

CI: https://ci.nodejs.org/job/node-test-pull-request/59969/

nodejs-github-bot avatar Jun 25 '24 23:06 nodejs-github-bot

CI: https://ci.nodejs.org/job/node-test-pull-request/59984/

nodejs-github-bot avatar Jun 26 '24 19:06 nodejs-github-bot

CI: https://ci.nodejs.org/job/node-test-pull-request/59985/

nodejs-github-bot avatar Jun 26 '24 19:06 nodejs-github-bot

CI: https://ci.nodejs.org/job/node-test-pull-request/60177/

nodejs-github-bot avatar Jul 08 '24 18:07 nodejs-github-bot