node icon indicating copy to clipboard operation
node copied to clipboard

url: handle "unsafe" characters properly in `pathToFileURL`

Open aduh95 opened this issue 1 year ago • 21 comments

Fixes: https://github.com/nodejs/node/issues/54515

Given the number of characters to cover, I leaning towards using one regex to deal with them all rather than adding more special cases. Let's check what the benchmark says.

FWIW According to RFC1738, they should be encoded (see the bold):

Characters can be unsafe for a number of reasons. The space character is unsafe because significant spaces may disappear and insignificant spaces may be introduced when URLs are transcribed or typeset or subjected to the treatment of word-processing programs. The characters "<" and ">" are unsafe because they are used as the delimiters around URLs in free text; the quote mark (""") is used to delimit URLs in some systems. The character "#" is unsafe and should always be encoded because it is used in World Wide Web and in other systems to delimit a URL from a fragment/anchor identifier that might follow it. The character "%" is unsafe because it is used for encodings of other characters. Other characters are unsafe because gateways and other transport agents are known to sometimes modify such characters. These characters are "{", "}", "|", "", "^", "~", "[", "]", and "`".

All unsafe characters must always be encoded within a URL. For example, the character "#" must be encoded within URLs even in systems that do not normally deal with fragment or anchor identifiers, so that if the URL is copied into another system that does use them, it will not be necessary to change the URL encoding.

Originally posted by @RedYetiDev in https://github.com/nodejs/node/issues/54515#issuecomment-2307957099

I took the tests from https://github.com/nodejs/node/pull/54516, so I added @EarlyRiser42 as co-author.

aduh95 avatar Aug 24 '24 19:08 aduh95

Review requested:

  • [ ] @nodejs/loaders
  • [ ] @nodejs/url

nodejs-github-bot avatar Aug 24 '24 19:08 nodejs-github-bot

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 88.03%. Comparing base (be4babb) to head (dcb75fa). Report is 556 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main   #54545   +/-   ##
=======================================
  Coverage   88.03%   88.03%           
=======================================
  Files         652      652           
  Lines      183761   183817   +56     
  Branches    35863    35873   +10     
=======================================
+ Hits       161765   161816   +51     
- Misses      15229    15241   +12     
+ Partials     6767     6760    -7     
Files with missing lines Coverage Δ
lib/internal/process/execution.js 98.79% <100.00%> (ø)
lib/internal/url.js 97.92% <100.00%> (+0.02%) :arrow_up:

... and 24 files with indirect coverage changes

codecov[bot] avatar Aug 24 '24 21:08 codecov[bot]

Benchmark CI: https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1617/

                                                                                                                 confidence improvement accuracy (*)   (**)  (***)
url/whatwg-url-to-and-from-path.js n=5000000 method='fileURLToPath' input='file:///dev/null?key=param&bool'             ***    -11.23 %       ±0.72% ±0.97% ±1.28%
url/whatwg-url-to-and-from-path.js n=5000000 method='fileURLToPath' input='file:///dev/null?key=param&bool#hash'        ***    -12.50 %       ±0.39% ±0.52% ±0.68%
url/whatwg-url-to-and-from-path.js n=5000000 method='fileURLToPath' input='file:///dev/null'                            ***     -6.74 %       ±0.44% ±0.59% ±0.76%
url/whatwg-url-to-and-from-path.js n=5000000 method='pathToFileURL' input='file:///dev/null?key=param&bool'             ***    -11.01 %       ±0.59% ±0.79% ±1.04%
url/whatwg-url-to-and-from-path.js n=5000000 method='pathToFileURL' input='file:///dev/null?key=param&bool#hash'        ***    -12.99 %       ±0.42% ±0.56% ±0.73%
url/whatwg-url-to-and-from-path.js n=5000000 method='pathToFileURL' input='file:///dev/null'                            ***     -6.52 %       ±0.38% ±0.50% ±0.66%

aduh95 avatar Aug 24 '24 22:08 aduh95

How does it have an impact on fileURLToPath ?

targos avatar Aug 25 '24 07:08 targos

There was a typo in the current benchmark causing only pathToFileURL to run. The typo fileUrlOrPath instead of fileUrlToPath in main led to this issue, even though both functions were passed as inputs in createbenchmark. I submitted a PR to fix this (#54190).

injunchoi98 avatar Aug 25 '24 08:08 injunchoi98

How does it have an impact on fileURLToPath ?

I have no idea, it's surprising that the confidence is so high for a seemingly unrelated change

aduh95 avatar Aug 25 '24 09:08 aduh95

Benchmark CI: https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1618/

                                                                                                                 confidence improvement accuracy (*)   (**)  (***)
url/whatwg-url-to-and-from-path.js n=5000000 input='/dev/null?key=param&bool' method='pathToFileURL'                    ***    -15.54 %       ±0.23% ±0.31% ±0.40%
url/whatwg-url-to-and-from-path.js n=5000000 input='/dev/null?key=param&bool#hash' method='pathToFileURL'               ***    -14.31 %       ±0.31% ±0.42% ±0.55%
url/whatwg-url-to-and-from-path.js n=5000000 input='/dev/null' method='pathToFileURL'                                   ***     -7.84 %       ±0.37% ±0.49% ±0.63%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null?key=param&bool' method='fileURLToPath'                     -0.17 %       ±0.44% ±0.58% ±0.76%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null?key=param&bool' method='pathToFileURL'             ***    -11.61 %       ±0.61% ±0.81% ±1.06%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null?key=param&bool#hash' method='fileURLToPath'                -0.35 %       ±0.40% ±0.54% ±0.70%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null?key=param&bool#hash' method='pathToFileURL'        ***    -14.09 %       ±0.49% ±0.66% ±0.86%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null' method='fileURLToPath'                                     0.02 %       ±0.40% ±0.54% ±0.70%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null' method='pathToFileURL'                            ***     -7.37 %       ±0.45% ±0.60% ±0.78%

aduh95 avatar Aug 25 '24 18:08 aduh95

Benchmark CI: https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1619/

                                                                                                                 confidence improvement accuracy (*)   (**)  (***)
url/whatwg-url-to-and-from-path.js n=5000000 input='/dev/null?key=param&bool' method='pathToFileURL'                    ***    -11.39 %       ±0.32% ±0.43% ±0.56%
url/whatwg-url-to-and-from-path.js n=5000000 input='/dev/null?key=param&bool#hash' method='pathToFileURL'               ***     -7.01 %       ±0.40% ±0.53% ±0.69%
url/whatwg-url-to-and-from-path.js n=5000000 input='/dev/null' method='pathToFileURL'                                   ***     -7.76 %       ±0.31% ±0.42% ±0.55%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null?key=param&bool' method='fileURLToPath'                     -0.09 %       ±0.27% ±0.36% ±0.47%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null?key=param&bool' method='pathToFileURL'             ***     -8.94 %       ±0.73% ±0.98% ±1.30%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null?key=param&bool#hash' method='fileURLToPath'                -0.18 %       ±0.35% ±0.47% ±0.61%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null?key=param&bool#hash' method='pathToFileURL'        ***     -9.00 %       ±0.43% ±0.57% ±0.75%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null' method='fileURLToPath'                                    -0.33 %       ±1.10% ±1.47% ±1.94%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null' method='pathToFileURL'                            ***     -6.85 %       ±0.42% ±0.56% ±0.74%

aduh95 avatar Aug 25 '24 22:08 aduh95

V8 performance lesson of the day: multiple regexes are better than a single one apparently 🤷‍♂️

aduh95 avatar Aug 25 '24 23:08 aduh95

note: using multiple regexes shows the local benchmark:

                                                                                                                 confidence improvement accuracy (*)   (**)  (***)
url/whatwg-url-to-and-from-path.js n=5000000 input='/dev/null?key=param&bool' method='pathToFileURL'                    ***     -3.96 %       ±1.08% ±1.44% ±1.87%
url/whatwg-url-to-and-from-path.js n=5000000 input='/dev/null?key=param&bool#hash' method='pathToFileURL'               ***     -3.65 %       ±1.14% ±1.53% ±2.00%
url/whatwg-url-to-and-from-path.js n=5000000 input='/dev/null' method='pathToFileURL'                                   ***     -6.57 %       ±1.02% ±1.36% ±1.77%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null?key=param&bool' method='fileURLToPath'                      0.25 %       ±1.02% ±1.35% ±1.76%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null?key=param&bool' method='pathToFileURL'               *     -1.93 %       ±1.89% ±2.51% ±3.27%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null?key=param&bool#hash' method='fileURLToPath'                 0.32 %       ±1.03% ±1.37% ±1.78%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null?key=param&bool#hash' method='pathToFileURL'         **     -2.23 %       ±1.55% ±2.07% ±2.71%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null' method='fileURLToPath'                                     0.62 %       ±1.71% ±2.29% ±2.98%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null' method='pathToFileURL'                            ***     -5.77 %       ±2.56% ±3.44% ±4.56%

Be aware that when doing many comparisons the risk of a false-positive result increases.
In this case, there are 9 comparisons, you can thus expect the following amount of false-positive results:
  0.45 false positives, when considering a   5% risk acceptance (*, **, ***),
  0.09 false positives, when considering a   1% risk acceptance (**, ***),
  0.01 false positives, when considering a 0.1% risk acceptance (***)

injunchoi98 avatar Aug 26 '24 08:08 injunchoi98

Benchmark CI: https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1620/

                                                                                                                 confidence improvement accuracy (*)   (**)  (***)
url/whatwg-url-to-and-from-path.js n=5000000 input='/dev/null?key=param&bool' method='pathToFileURL'                    ***     -8.38 %       ±0.43% ±0.58% ±0.75%
url/whatwg-url-to-and-from-path.js n=5000000 input='/dev/null?key=param&bool#hash' method='pathToFileURL'               ***     -9.95 %       ±0.50% ±0.66% ±0.87%
url/whatwg-url-to-and-from-path.js n=5000000 input='/dev/null' method='pathToFileURL'                                   ***    -15.48 %       ±0.52% ±0.70% ±0.91%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null?key=param&bool' method='fileURLToPath'                      0.35 %       ±0.37% ±0.50% ±0.65%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null?key=param&bool' method='pathToFileURL'             ***     -7.29 %       ±0.49% ±0.66% ±0.86%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null?key=param&bool#hash' method='fileURLToPath'                 0.50 %       ±0.81% ±1.09% ±1.43%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null?key=param&bool#hash' method='pathToFileURL'        ***    -10.97 %       ±0.36% ±0.48% ±0.62%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null' method='fileURLToPath'                                     0.11 %       ±0.69% ±0.92% ±1.19%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null' method='pathToFileURL'                            ***     -9.62 %       ±0.42% ±0.55% ±0.72%

aduh95 avatar Aug 26 '24 22:08 aduh95

Benchmark CI: https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1621/

                                                                                                                 confidence improvement accuracy (*)   (**)  (***)
url/whatwg-url-to-and-from-path.js n=5000000 input='/dev/null?key=param&bool' method='pathToFileURL'                    ***     -9.09 %       ±0.39% ±0.53% ±0.69%
url/whatwg-url-to-and-from-path.js n=5000000 input='/dev/null?key=param&bool#hash' method='pathToFileURL'               ***    -10.66 %       ±0.45% ±0.60% ±0.78%
url/whatwg-url-to-and-from-path.js n=5000000 input='/dev/null' method='pathToFileURL'                                   ***    -15.32 %       ±0.29% ±0.39% ±0.51%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null?key=param&bool' method='fileURLToPath'                     -0.06 %       ±0.43% ±0.57% ±0.74%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null?key=param&bool' method='pathToFileURL'             ***     -8.11 %       ±0.67% ±0.91% ±1.20%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null?key=param&bool#hash' method='fileURLToPath'                -0.07 %       ±0.46% ±0.61% ±0.79%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null?key=param&bool#hash' method='pathToFileURL'        ***    -10.64 %       ±0.52% ±0.69% ±0.91%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null' method='fileURLToPath'                                     0.69 %       ±0.82% ±1.10% ±1.44%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null' method='pathToFileURL'                            ***     -7.96 %       ±0.31% ±0.41% ±0.54%

aduh95 avatar Aug 26 '24 23:08 aduh95


confidence improvement accuracy (*)   (**)  (***)
url/whatwg-url-to-and-from-path.js n=5000000 input='/dev/null?key=param&bool' method='pathToFileURL'                    ***     -9.09 %       ±0.39% ±0.53% ±0.69%
url/whatwg-url-to-and-from-path.js n=5000000 input='/dev/null?key=param&bool#hash' method='pathToFileURL'               ***    -10.66 %       ±0.45% ±0.60% ±0.78%
url/whatwg-url-to-and-from-path.js n=5000000 input='/dev/null' method='pathToFileURL'                                   ***    -15.32 %       ±0.29% ±0.39% ±0.51%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null?key=param&bool' method='fileURLToPath'                     -0.06 %       ±0.43% ±0.57% ±0.74%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null?key=param&bool' method='pathToFileURL'             ***     -8.11 %       ±0.67% ±0.91% ±1.20%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null?key=param&bool#hash' method='fileURLToPath'                -0.07 %       ±0.46% ±0.61% ±0.79%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null?key=param&bool#hash' method='pathToFileURL'        ***    -10.64 %       ±0.52% ±0.69% ±0.91%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null' method='fileURLToPath'                                     0.69 %       ±0.82% ±1.10% ±1.44%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null' method='pathToFileURL'                            ***     -7.96 %       ±0.31% ±0.41% ±0.54%

avivkeller avatar Aug 27 '24 01:08 avivkeller

Benchmark CI: https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1622/

                                                                                                                 confidence improvement accuracy (*)   (**)  (***)
url/whatwg-url-to-and-from-path.js n=5000000 input='/dev/null?key=param&bool' method='pathToFileURL'                    ***    -22.93 %       ±0.46% ±0.62% ±0.82%
url/whatwg-url-to-and-from-path.js n=5000000 input='/dev/null?key=param&bool#hash' method='pathToFileURL'               ***    -24.39 %       ±0.29% ±0.39% ±0.51%
url/whatwg-url-to-and-from-path.js n=5000000 input='/dev/null' method='pathToFileURL'                                   ***    -33.14 %       ±0.40% ±0.54% ±0.70%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null?key=param&bool' method='fileURLToPath'                     -0.51 %       ±0.86% ±1.16% ±1.54%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null?key=param&bool' method='pathToFileURL'             ***    -18.79 %       ±0.51% ±0.68% ±0.89%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null?key=param&bool#hash' method='fileURLToPath'                 0.19 %       ±0.30% ±0.40% ±0.52%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null?key=param&bool#hash' method='pathToFileURL'        ***    -22.45 %       ±0.39% ±0.52% ±0.68%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null' method='fileURLToPath'                                     0.11 %       ±0.38% ±0.50% ±0.65%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null' method='pathToFileURL'                            ***    -22.06 %       ±0.46% ±0.62% ±0.80%

aduh95 avatar Aug 27 '24 08:08 aduh95

Benchmark CI: https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1623/

                                                                                                                 confidence improvement accuracy (*)   (**)  (***)
url/whatwg-url-to-and-from-path.js n=5000000 input='/dev/null?key=param&bool' method='pathToFileURL'                    ***    -35.89 %       ±0.47% ±0.63% ±0.84%
url/whatwg-url-to-and-from-path.js n=5000000 input='/dev/null?key=param&bool#hash' method='pathToFileURL'               ***    -32.21 %       ±0.33% ±0.44% ±0.58%
url/whatwg-url-to-and-from-path.js n=5000000 input='/dev/null' method='pathToFileURL'                                   ***    -51.54 %       ±0.53% ±0.71% ±0.94%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null?key=param&bool' method='fileURLToPath'                     -0.23 %       ±0.33% ±0.44% ±0.58%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null?key=param&bool' method='pathToFileURL'             ***    -29.42 %       ±0.60% ±0.80% ±1.04%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null?key=param&bool#hash' method='fileURLToPath'                -0.20 %       ±0.31% ±0.41% ±0.53%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null?key=param&bool#hash' method='pathToFileURL'        ***    -28.38 %       ±0.55% ±0.74% ±0.98%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null' method='fileURLToPath'                                    -0.59 %       ±1.29% ±1.73% ±2.27%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null' method='pathToFileURL'                            ***    -38.36 %       ±0.29% ±0.39% ±0.51%

aduh95 avatar Aug 27 '24 09:08 aduh95

Benchmark CI: https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1624/

                                                                                                                 confidence improvement accuracy (*)   (**)  (***)
url/whatwg-url-to-and-from-path.js n=5000000 input='/dev/null?key=param&bool' method='pathToFileURL'                    ***    -12.29 %       ±0.58% ±0.78% ±1.01%
url/whatwg-url-to-and-from-path.js n=5000000 input='/dev/null?key=param&bool#hash' method='pathToFileURL'               ***    -15.91 %       ±0.46% ±0.61% ±0.79%
url/whatwg-url-to-and-from-path.js n=5000000 input='/dev/null' method='pathToFileURL'                                   ***    -15.43 %       ±0.26% ±0.35% ±0.45%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null?key=param&bool' method='fileURLToPath'               *     -0.38 %       ±0.37% ±0.50% ±0.66%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null?key=param&bool' method='pathToFileURL'             ***     -9.85 %       ±0.66% ±0.89% ±1.17%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null?key=param&bool#hash' method='fileURLToPath'                 0.06 %       ±0.17% ±0.22% ±0.29%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null?key=param&bool#hash' method='pathToFileURL'        ***    -14.37 %       ±0.69% ±0.93% ±1.21%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null' method='fileURLToPath'                                     0.14 %       ±0.49% ±0.65% ±0.85%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null' method='pathToFileURL'                            ***     -9.18 %       ±0.34% ±0.45% ±0.59%

aduh95 avatar Aug 27 '24 09:08 aduh95

As @RedYetiDev originally mentioned in #54515, both | and ~ need to be encoded. However, I noticed that on my local machines (Windows and Linux), they are not encoded when used in a file URL, which has caused some confusion. Additionally, since { and } are encoded by the URL constructor (bindingurl.parse), perhaps we don't need to include them in the regex. Feel free to ignore..

injunchoi98 avatar Aug 27 '24 09:08 injunchoi98

Benchmark CI: https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1625/

                                                                                                                 confidence improvement accuracy (*)   (**)  (***)
url/whatwg-url-to-and-from-path.js n=5000000 input='/dev/null?key=param&bool' method='pathToFileURL'                    ***     -4.30 %       ±0.29% ±0.39% ±0.51%
url/whatwg-url-to-and-from-path.js n=5000000 input='/dev/null?key=param&bool#hash' method='pathToFileURL'               ***     -6.39 %       ±0.37% ±0.49% ±0.64%
url/whatwg-url-to-and-from-path.js n=5000000 input='/dev/null' method='pathToFileURL'                                   ***     -9.74 %       ±0.74% ±0.99% ±1.29%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null?key=param&bool' method='fileURLToPath'                      0.06 %       ±0.34% ±0.45% ±0.59%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null?key=param&bool' method='pathToFileURL'             ***     -3.71 %       ±0.56% ±0.74% ±0.97%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null?key=param&bool#hash' method='fileURLToPath'                -0.07 %       ±0.24% ±0.32% ±0.42%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null?key=param&bool#hash' method='pathToFileURL'        ***     -8.00 %       ±0.55% ±0.73% ±0.95%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null' method='fileURLToPath'                                    -0.15 %       ±0.93% ±1.24% ±1.62%
url/whatwg-url-to-and-from-path.js n=5000000 input='file:///dev/null' method='pathToFileURL'                            ***     -5.73 %       ±0.58% ±0.78% ±1.02%

aduh95 avatar Aug 27 '24 10:08 aduh95

Did my best to minimize the perf regression, I can't think of anything else to try, so unless someone has another idea, I think this is ready.

aduh95 avatar Aug 27 '24 13:08 aduh95

CI: https://ci.nodejs.org/job/node-test-pull-request/61545/

nodejs-github-bot avatar Aug 27 '24 13:08 nodejs-github-bot

Can we have a fast path for internal node.js usage that can avoid these slowness? @aduh95

anonrig avatar Aug 28 '24 01:08 anonrig

Can we have a fast path for internal node.js usage that can avoid these slowness? @aduh95

I don't think pathToFileURL is used in any hot path (I had a quick look at the 53 occurrences of it in the lib/ folder), I'd be surprised if it had any significant impact on the other parts of the codebase. It's rather rare than we need to convert a path to a file URL, but when we do, I think we want to generate a URL with those "unsafe" char correctly encoded.

aduh95 avatar Aug 28 '24 23:08 aduh95

CI: https://ci.nodejs.org/job/node-test-pull-request/61615/

nodejs-github-bot avatar Aug 28 '24 23:08 nodejs-github-bot

One problem here is that import(pathToFileURL('/path/to/[mod.js')) will no longer work for modules previously loaded as import './[mod.js'.

This is because new URL('./[mod.js', absUrl) does not perform this escaping for us, therefore we cannot rely on the invariant we were previously able to rely on.

Note that this is distinct from eg the escaping which is performed for other characters in relative URL normalization such as a space.

So I guess the question is which use case should take precedence and why?

guybedford avatar Aug 28 '24 23:08 guybedford

Specifically is the goal that the file URL performs all escaping to be used on the filesystem?

Or should the goal be that file URL represents the URL normalization to load from the module registry?

Personally I would prefer we retain the latter property as more important in defining what is meant by canonical file URLs.

guybedford avatar Aug 29 '24 00:08 guybedford

@guybedford could you write a test that you think would pass in main and fail in this PR to illustrate your point?

aduh95 avatar Aug 29 '24 04:08 aduh95

@guybedford I understand your point, and I noticed that the import statement works fine whether the module path is encoded or not. This happens because import adheres to the WHATWG URL specifications. If the input is encoded, it gets decoded, which is why both import './[mod.js' and import './%5Bmod.js' work correctly.

According to WHATWG, when the encoded input is encountered, it gets decoded:

To UTF-8 percent-encode a scalar value scalarValue using a percentEncodeSet, return the result of running percent-encode after encoding with UTF-8, scalarValue as a string, and percentEncodeSet.

To UTF-8 percent-encode a scalar value string input using a percentEncodeSet, return the result of running percent-encode after encoding with UTF-8, input, and percentEncodeSet.

Operation Input Output
Percent-encode input 0x23 "%23"
Percent-encode input 0x7F "%7F"
Percent-decode input %25%s%1G %%s%1G
Percent-decode input "‽%25%2E" 0xE2 0x80 0xBD 0x25 0x2E
Percent-encode after encoding with Shift_JIS, input, and the userinfo percent-encode set " " "%20"
Percent-encode after encoding with Shift_JIS, input, and the userinfo percent-encode set "≡" "%81%DF"
Percent-encode after encoding with Shift_JIS, input, the userinfo percent-encode set, and true "1+1 ≡ 2%20‽" "1+1+%81%DF+2%20%26%238253%3B"
UTF-8 percent-encode input using the userinfo percent-encode set U+2261 (≡) "%E2%89%A1"
UTF-8 percent-encode input using the userinfo percent-encode set U+203D (‽) "%E2%80%BD"
UTF-8 percent-encode input using the userinfo percent-encode set "Say what‽" "Say%20what%E2%80%BD"

I think this explains why both encoded and unencoded module paths can be used in JavaScript import statements.

injunchoi98 avatar Sep 01 '24 08:09 injunchoi98

CI: https://ci.nodejs.org/job/node-test-pull-request/62041/

nodejs-github-bot avatar Sep 06 '24 08:09 nodejs-github-bot

CI: https://ci.nodejs.org/job/node-test-pull-request/62044/

nodejs-github-bot avatar Sep 06 '24 09:09 nodejs-github-bot

CI is failing but it's unclear if they are all flaky failures or if some subset of the failures are caused by the changes here.

jasnell avatar Sep 08 '24 03:09 jasnell