node
node copied to clipboard
fs: move `ToNamespacedPath` to c++
This is a step forward to move all node:fs implementation to C++. Since we now have support for win32.pathResolve() in C++, we can move forward with moving ToNamespacedPath functions to C++.
In a follow up pull-request, I'll move FromNamespacedPath() from node_url.cc to path.cc
cc @nodejs/fs @RafaelGSS
Review requested:
- [ ] @nodejs/url
cc @nodejs/platform-windows Any idea where this error/warning is thrown from?
AFAICT the changes are implying that getValidatedPath() should do path.ToNamespacedPath() internally, but there's no changes to getValidatedPath() itself in this PR. Does it happen implicitly? If yes, on which step?
AFAICT the changes are implying that
getValidatedPath()should dopath.ToNamespacedPath()internally, but there's no changes togetValidatedPath()itself in this PR. Does it happen implicitly? If yes, on which step?
getValidatedPath() result is called with toNamespacedPath() to ensure Windows long paths are handled correctly. I didn't quite understand your comment, but this pull request basically replaces toNamespacedPath(getValidatedPath(input)) with getValidatedPath(input) and calls toNamespacedPath from C++
getValidatedPath() result is called with
toNamespacedPath()to ensure Windows long paths are handled correctly. I didn't quite understand your comment, but this pull request basically replacestoNamespacedPath(getValidatedPath(input))withgetValidatedPath(input)and callstoNamespacedPathfrom C++
Sorry, I read the changes wrong. It doesn't affect js-side toNamespacedPath() itself but adds cpp-side version that is used in cpp implementations for each fs method.
We might need to doublecheck that every method uses this. Of course, it's non-blocking concern: I'm just pointing out that after this change we'll have to check cpp side instead of js.
Right now I'm only seeing that Mkdtemp lacks it here: https://github.com/nodejs/node/blob/d93f610046d589e9b081935205b82a3a4ab97089/src/node_file.cc#L2820-L2823
Which mirrors current (might be bugged on Windows if called with long prefix) js implementation.
CI: https://ci.nodejs.org/job/node-test-pull-request/57821/
This should land first https://github.com/nodejs/node/pull/52148.
@RafaelGSS the function signature is not 1 to 1. We're always using BufferValue in node:fs.
I think this is different from https://github.com/nodejs/node/pull/51295 in that, the other PR only used the C++ version in a not very well-tested branch (only when permission is enabled) so if it's broken, we may not be able to know due to the low test coverage. This PR however changed most fs bindings so if it's broken, it's probably more likely that some tests would fail on Windows.
That said I think we already don't have enough test cases to check the kind of paths that are need to be handled specially by toNamespacedPath(). Perhaps it would be useful to check the cases in https://github.com/nodejs/node/pull/51097 too
CI: https://ci.nodejs.org/job/node-test-pull-request/57876/
@anonrig I did not investigate deeply. Running tests locally under Windows. I see two 'simple' test failures.
- This one:
=== release test-permission-fs-read ===
Path: parallel/test-permission-fs-read
AssertionError [ERR_ASSERTION]: Expected values to be strictly deep-equal:
+ actual - expected
Comparison {
code: 'ERR_ACCESS_DENIED',
permission: 'FileSystemRead',
+ resource: 'C:\\msys64\\home\\Daniel\\CVS\\github\\node\\test\\fixtures\\permission\\deny\\protected-file.md'
- resource: '\\\\?\\C:\\msys64\\home\\Daniel\\CVS\\github\\node\\test\\fixtures\\permission\\deny\\protected-file.md'
}
at Object.<anonymous> (C:\msys64\home\Daniel\CVS\github\node\test\common\index.js:725:12)
at Object.<anonymous> (C:\msys64\home\Daniel\CVS\github\node\test\common\index.js:473:15)
at expectedException (node:assert:733:17)
at expectsError (node:assert:856:3)
at Function.throws (node:assert:911:3)
at Object.<anonymous> (C:\msys64\home\Daniel\CVS\github\node\test\fixtures\permission\fs-read.js:17:10)
at Module._compile (node:internal/modules/cjs/loader:1421:14)
at Module._extensions..js (node:internal/modules/cjs/loader:1499:10)
at Module.load (node:internal/modules/cjs/loader:1232:32)
at Module._load (node:internal/modules/cjs/loader:1048:12) {
generatedMessage: true,
code: 'ERR_ASSERTION',
And also
=== release test-child-process-cwd ===
Path: parallel/test-child-process-cwd
node:assert:126
throw new AssertionError(obj);
^
AssertionError [ERR_ASSERTION]: Expected values to be strictly equal:
+ actual - expected
+ 'C:\\Windows'
- 'C:\\WINDOWS'
^
at ChildProcess.<anonymous> (C:\msys64\home\Daniel\CVS\github\node\test\parallel\test-child-process-cwd.js:55:26)
at ChildProcess.<anonymous> (C:\msys64\home\Daniel\CVS\github\node\test\common\index.js:473:15)
at ChildProcess.emit (node:events:520:28)
at maybeClose (node:internal/child_process:1105:16)
at Socket.<anonymous> (node:internal/child_process:457:11)
at Socket.emit (node:events:520:28)
at Pipe.<anonymous> (node:net:338:12) {
generatedMessage: true,
code: 'ERR_ASSERTION',
actual: 'C:\\Windows',
expected: 'C:\\WINDOWS',
operator: 'strictEqual'
}
It might be useful to start by those test failures as they do not seem too complicated.
Ok. I have spent some time on the issue but with little progress, unfortunately. But my instinct is telling me that this PR is essentially correct and that the failures have to do with corner cases.
For the test test\parallel\test-permission-fs-read.js, the issue appears to be in the the exception/error being reported. The test expects a namespaced error report, but the code seems to just report the path passed as a parameter to the node command. I am not sure what this test checks and why it is important that the report takes a specific form. It is somewhat inconvenient to reproduce cleanly, and I don't know where the exception is thrown. Can someone point me where it is thrown? It is possible that this PR is missing a ToNamespacedPath at that location.
There are a few cases where, in C++, you throw THROW_IF_INSUFFICIENT_PERMISSIONS without necessarily doing a ToNamespacedPath right before. E.g., InternalModuleStat.
The test\parallel\test-child-process-cwd.js test is interesting. We set the working directory to C:\WINDOWS then get back (on my machine) the correct result C:\Windows but the test expects C:\WINDOWS. The test is somewhat abstract and I am not certain that I know what it is doing and what it is trying to test.
Under Windows, both C:\WINDOWS and C:\Windows refer to the same directory...
> path.relative("C:\\Windows", "C:\\WINDOWS")
''
So requiring that the string be exact is not what I'd expect when comparing paths referencing directories under Windows.
The strict equality in the cwd test was introduced by @wzoom
We can question what the tests are doing, but path is a very sensitive module (especially security-wise) so I'm actually glad that they caught what could be regressions introduced by the C++ version.
We can question what the tests are doing, but path is a very sensitive module (especially security-wise) so I'm actually glad that they caught what could be regressions introduced by the C++ version.
Regarding C:\\WINDOWS vs C:\\Windows, I reported local failures that may or may not appear in the CI (I haven't checked).
To debug this, it would be helpful to understand the test.
Here is how it works under macOS:
> mkdir BogusRepo
> cd bogusrepo
> pwd
/bogusrepo
> node
Welcome to Node.js v21.5.0.
Type ".help" for more information.
> process.cwd()
'/BogusRepo'
Let us add a directory
> mkdir OhOh
> node
Welcome to Node.js v21.5.0.
Type ".help" for more information.
> process.chdir("ohoh")
undefined
> process.cwd()
'/Users/dlemire/CVS/tmp/BogusRepo/OhOh'
The same DO NOT works under Windows. If I go to C:\\WINDOWS and ask node (either with a release or this PR) what process.cwd() is, it gives me C:\\WINDOWS. However, 'cd' behaves differently from node.
So I am asking about test\parallel\test-child-process-cwd.js. What does this test verifies?
Let me be clearer: I do not want to dismiss the test. I am saying that it checks for something I do not expect. Why do we need exact string matching?
This is the C:\WINDOWS failing test:
https://github.com/nodejs/node/blob/fc029181df676f97b9e329639760f87722cbcece/test/parallel/test-child-process-cwd.js#L4
What the test-child-process-cwd.js test does is to call this in a spawned process...
const pwdCommand = isWindows ?
['cmd.exe', ['/d', '/c', 'cd']] :
['pwd', []];
It passes process.env.windir along with it under Windows. At least on my laptop, process.env.windir returns C:\\WINDOWS (at least with the PR) but the real path is C:\\Windows.
(Definition: By 'real path', I mean what 'cd' reports.)
@anonrig I pushed a small change to your branch which might help.
CI: https://ci.nodejs.org/job/node-test-pull-request/57947/
CI: https://ci.nodejs.org/job/node-test-pull-request/57951/
The path test is incorrect. When you cd to a directory, successive calls to cd may not return the actual path with the same case.
Screenshot proof:
The test may not fail in CI because it may succeed by luck, but it remains an incorrect/misleading test. I am quite certain.
CI: https://ci.nodejs.org/job/node-test-pull-request/57963/
CI: https://ci.nodejs.org/job/node-test-pull-request/57965/
CI: https://ci.nodejs.org/job/node-test-pull-request/57973/
So, I ran some benchmarks on this PR as I believe the real intention to move fs to C++ is to improve performance somehow. Looks like this PR doesn't improve performance (due to high variance I can't say it decreases performance though). But here's the results:
toNamespacedPath
$ node benchmark/compare.js --old ./node --new ./node2 --filter toNamespacedPath path | Rscript benchmark/compare.R
[00:00:47|% 100| 2/2 files | 60/60 runs | 8/8 configs]: Done
confidence improvement accuracy (*) (**) (***)
path/toNamespacedPath-posix.js n=100000 path='' *** -4.22 % ±1.99% ±2.66% ±3.46%
path/toNamespacedPath-posix.js n=100000 path='.' * -1.93 % ±1.52% ±2.03% ±2.66%
path/toNamespacedPath-posix.js n=100000 path='/Users/rafaelgss/repos/os/node/benchmark/path/..' ** -2.27 % ±1.66% ±2.21% ±2.87%
path/toNamespacedPath-posix.js n=100000 path='/home/node/..' *** -3.72 % ±1.63% ±2.17% ±2.83%
path/toNamespacedPath-posix.js n=100000 path='/tmp/bar' * -2.67 % ±2.06% ±2.74% ±3.57%
path/toNamespacedPath-posix.js n=100000 path='bar/baz' ** -3.20 % ±1.92% ±2.56% ±3.33%
path/toNamespacedPath-win32.js n=100000 path='' -0.75 % ±4.06% ±5.42% ±7.08%
path/toNamespacedPath-win32.js n=100000 path='/Users/rafaelgss/repos/os/node/benchmark/path/..' -0.09 % ±0.39% ±0.52% ±0.67%
path/toNamespacedPath-win32.js n=100000 path='\\\\e.exe' -0.14 % ±2.04% ±2.73% ±3.58%
path/toNamespacedPath-win32.js n=100000 path='c:../a' -0.10 % ±0.99% ±1.31% ±1.71%
path/toNamespacedPath-win32.js n=100000 path='c:/blah\\\\blah' -0.27 % ±1.25% ±1.66% ±2.16%
path/toNamespacedPath-win32.js n=100000 path='c:/ignore' 0.50 % ±0.90% ±1.20% ±1.58%
path/toNamespacedPath-win32.js n=100000 path='d:/games' 0.50 % ±0.58% ±0.78% ±1.01%
path/toNamespacedPath-win32.js n=100000 path='d:\\\\a/b\\\\c/d' -0.53 % ±0.88% ±1.18% ±1.55%
Be aware that when doing many comparisons the risk of a false-positive
result increases. In this case, there are 14 comparisons, you can thus
expect the following amount of false-positive results:
0.70 false positives, when considering a 5% risk acceptance (*, **, ***),
0.14 false positives, when considering a 1% risk acceptance (**, ***),
0.01 false positives, when considering a 0.1% risk acceptance (***)
fs --filter=bench-read
$ node benchmark/compare.js --old ./node --new ./node2 --filter bench-read fs | Rscript benchmark/compare.R
[00:01:10|% 100| 5/5 files | 60/60 runs | 2/2 configs]: Done
confidence improvement accuracy (*) (**) (***)
fs/bench-readSync.js n=10000 type='existing' -0.49 % ±2.37% ±3.15% ±4.10%
fs/bench-readSync.js n=10000 type='non-existing' 0.98 % ±2.60% ±3.49% ±4.58%
fs/bench-readdir.js withFileTypes='false' dir='lib' n=10 0.39 % ±2.43% ±3.24% ±4.22%
fs/bench-readdir.js withFileTypes='false' dir='test/parallel' n=10 -0.10 % ±0.34% ±0.45% ±0.59%
fs/bench-readdir.js withFileTypes='true' dir='lib' n=10 -0.86 % ±1.86% ±2.47% ±3.22%
fs/bench-readdir.js withFileTypes='true' dir='test/parallel' n=10 -0.03 % ±0.40% ±0.53% ±0.69%
fs/bench-readdirSync.js withFileTypes='false' dir='lib' n=10 1.68 % ±2.15% ±2.89% ±3.83%
fs/bench-readdirSync.js withFileTypes='false' dir='test/parallel' n=10 -0.17 % ±0.23% ±0.30% ±0.39%
fs/bench-readdirSync.js withFileTypes='true' dir='lib' n=10 -0.03 % ±0.97% ±1.30% ±1.69%
fs/bench-readdirSync.js withFileTypes='true' dir='test/parallel' n=10 -0.23 % ±0.47% ±0.63% ±0.81%
fs/bench-readlinkSync.js n=1000 type='invalid' -7.06 % ±8.81% ±11.75% ±15.35%
fs/bench-readlinkSync.js n=1000 type='valid' 2.62 % ±9.90% ±13.18% ±17.15%
fs/bench-readvSync.js n=100000 type='invalid' -0.30 % ±0.91% ±1.21% ±1.58%
fs/bench-readvSync.js n=100000 type='valid' 0.84 % ±1.45% ±1.93% ±2.51%
Be aware that when doing many comparisons the risk of a false-positive
result increases. In this case, there are 14 comparisons, you can thus
expect the following amount of false-positive results:
0.70 false positives, when considering a 5% risk acceptance (*, **, ***),
0.14 false positives, when considering a 1% risk acceptance (**, ***),
0.01 false positives, when considering a 0.1% risk acceptance (***)
It's also important to mention I no longer have access to a dedicated machine to perform those benchmarks, so I have used my local MacOS M2 machine. I have created a benchmark specific to path.toNamespacedPath as my initial thought was the impact of the JS -> C++ layer for users of this API (https://github.com/nodejs/node/pull/52236).
The benchmark used ff7b27faf3540f9e8d73bf66635d2cffedd9d341 (This PR) for node2 and 85830da4fd814404ba490232430dc05d14868124 for node (benchmark PR).
Maybe we should re-evaluate this movement of moving things to C++ claiming for performance (if that's the case).
This move is required so that we can move getValidatedPath function. For macOS this function is a noop, for Windows it does call some complex functions.
Unless we move getValidatedPath and toNamespacedPath functions to C++ in the same call, it's normal to not see any improvement on macOS
@RafaelGSS From first principles, I would expect this PR to have little impact on performance. As @anonrig states, we are moving a trivial function (everywhere except under Windows) to C++.
Once it is done right, there is no further downside, however. E.g., it should not impact maintainability.
It is not like any JavaScript hacker really loved ToNamedspacePath.
This move is required so that we can move getValidatedPath function. For macOS this function is a noop, for Windows it does call some complex functions.
Unless we move getValidatedPath and toNamespacedPath functions to C++ in the same call, it's normal to not see any improvement on macOS
Note that I'm using path.posix. and path.win32. on the benchmarks, so I expect my operating system to not be relevant to this particular benchmark (despite of process.cwd() of course), am I missing something?
UPDATE: Actually, it seems the current API of path.toNamespacedPath will not call the C++ function, so we are basically handling two functions that behave similarly. I think there's no problem with that then, it can just be a problem if we attempt to change the API to call this new C++ function. My benchmark for this case is irrelevant.
CI: https://ci.nodejs.org/job/node-test-pull-request/57979/
@nodejs/build Why does this pull-request doesn't trigger request-ci? Any idea?