tools icon indicating copy to clipboard operation
tools copied to clipboard

feat(rome_js_formatter): Parenthesizing expressions

Open MichaReiser opened this issue 3 years ago • 14 comments
trafficstars

Summary

Part of #3046

This PR adds the capability of adding parentheses to improve readability or removing parentheses if they aren't strictly necessary.

The core architecture is around the NeedsParentheses trait that every JsAnyExpression now implements. Its main method is needs_parentheses which must return true if parentheses are necessary if removing the parentheses would change the semantics OR result in a syntax error. needs_parentheses may also return true in cases where parentheses aren't strictly necessary but improve readability.

The new NeedsParentheses trait is used in the FormatNodeRule trait that gets called when formatting every node. It now adds parentheses if needs_parentheses returns true.

FormatJsParenthesizedExpression has been changed to always remove parentheses and rely on FormatNodeRule to re-insert parentheses if necessary. This has the downside that any trailing comments of ( and leading comments of ) are pushed outside the parentheses. I think that's OK and is something we can address when working on #2768.

Note: This PR implements needs_parentheses for all binary like expression but they aren't used yet. I struggled to integrate the logic into the existing binary like formatting and intend to tackle it with anyway needed rewrite to closer match prettier's formatting. I plan to do this next.

Performance

This PR significantly reduces Rome performance because it is now necessary to call expression.resolve() or expression.resolve_parent to skip over parenthesized expressions when testing if a child or parent is of a certain kind.

The only alternative that I see to this approach would be to remove all ParenthesizedExpressions before formatting but this will have its own performance cost, requires source maps to map back to the original code in case a node must be printed in verbatim (to recover the parentheses), and is something we still have the chance to do in the future.

My main concern is more about the added complexity and it's easy to forget a resolve_expression or resolve_expression_parent call. I don't have good answer to this and there are likely still a few places where I missed to add these calls myself. My recommendation is that we can fix these over time.

group                                    main                                    parentheses
-----                                    ----                                    -----------
formatter/checker.ts                     1.00   219.6±15.94ms    11.8 MB/sec     1.04   227.3±18.85ms    11.4 MB/sec
formatter/compiler.js                    1.00    124.8±1.59ms     8.4 MB/sec     1.12    140.0±9.95ms     7.5 MB/sec
formatter/d3.min.js                      1.00    103.2±7.78ms     2.5 MB/sec     1.16    119.6±8.73ms     2.2 MB/sec
formatter/dojo.js                        1.00      8.1±0.02ms     8.5 MB/sec     1.00      8.1±0.62ms     8.5 MB/sec
formatter/ios.d.ts                       1.00    144.8±1.88ms    12.9 MB/sec     1.09   158.0±15.89ms    11.8 MB/sec
formatter/jquery.min.js                  1.00     29.7±1.87ms     2.8 MB/sec     1.03     30.7±2.13ms     2.7 MB/sec
formatter/math.js                        1.01   235.5±14.90ms     2.7 MB/sec     1.00   232.7±18.55ms     2.8 MB/sec
formatter/parser.ts                      1.02      5.2±0.40ms     9.3 MB/sec     1.00      5.1±0.32ms     9.5 MB/sec
formatter/pixi.min.js                    1.00    127.5±6.82ms     3.4 MB/sec     1.05   134.3±10.27ms     3.3 MB/sec
formatter/react-dom.production.min.js    1.04     37.3±2.29ms     3.1 MB/sec     1.00     35.7±1.18ms     3.2 MB/sec
formatter/react.production.min.js        1.00  1861.6±130.45µs     3.3 MB/sec    1.12      2.1±0.13ms     3.0 MB/sec
formatter/router.ts                      1.00      4.1±0.31ms    15.1 MB/sec     1.13      4.6±0.19ms    13.3 MB/sec
formatter/tex-chtml-full.js              1.00    274.8±6.53ms     3.3 MB/sec     1.14   314.6±22.55ms     2.9 MB/sec
formatter/three.min.js                   1.00   135.7±10.68ms     4.3 MB/sec     1.07    145.5±9.52ms     4.0 MB/sec
formatter/typescript.js                  1.00   911.6±60.87ms    10.4 MB/sec     1.03   938.6±58.19ms    10.1 MB/sec
formatter/vue.global.prod.js             1.00     42.9±0.15ms     2.8 MB/sec     1.25     53.4±3.34ms     2.3 MB/sec

Prettier Difference

I decided to diverge from Prettier's formatting in two cases:

ObjectExpression, FunctionExpression, or ClassExpression at the beginning of a statement.

// input
{}.b

// Prettier
({}.b)

// Rome
({}).b 

Prettier parenthesizes the whole expression whereas Rome parenthesizes the object/function/class expression only.

The main reason for diverging is that parenthesizing the whole expression requires that the logic is implemented in any expression that starts with another child expression (tagged template, binary expressions, computed member/assignment, sequence, conditional, ... ) and it requires traversing the first expression until it reaches an object expression or any expression that doesn't start with another expression. This is rather expensive and Rome's approach avoids the expensive traversal except for object/function and class expressions, of which there should be fewer.

Parenthesized optional chain

// Input
(a?.b).c
a?.b.c

// Prettier
(a?.b).c
a?.b.c

// Rome
a?.b.c
a?.b.c

Prettier keeps the parentheses if they were present in the source document but never adds them if they were missing. This is the only case where I found that the fact that parens are present in the source document changed how Prettier parenthesizes formatted expression.

My take is that Rome should apply a consistent formatting regardless if parentheses were or were not present in the source document.

Test Plan

I manually reviewed the snapshot changes.

Average compatibility: 81.82 -> 81.54% Compatible lines: 78.64 -> 78.34%

This PR is regressing the compatibility. Mainly due to the following reasons:

  • Prettier has custom handling to avoid removing parentheses around closure type cast comments which Rome has not. This is something we can add support for in the future. The way I envision this to work is that we add a tag to every JsParenthesizedExpression that has a type cast comment (CC: @leops for a use case for node annotations)
  • Our handling of comments differs from Prettier's which results in a large number of line changes. We'll tackle this as part of #2768
  • Prettier incompatibility of other syntax like JsCallArguments, JsTemplate, JsBinaryLikeExpression, member like expressions and more.

I'm in favour of merging this PR regardless of the regressions because these issues should be tackled when working on comments, or when improving the formatting of other syntaxes.

MichaReiser avatar Aug 12 '22 16:08 MichaReiser

Deploying with  Cloudflare Pages  Cloudflare Pages

Latest commit: 9823c4b
Status: ✅  Deploy successful!
Preview URL: https://af726b97.tools-8rn.pages.dev
Branch Preview URL: https://feat-needs-parentheses.tools-8rn.pages.dev

View logs

Parser conformance results on ubuntu-latest

js/262

Test result main count This PR count Difference
Total 45878 45878 0
Passed 44938 44938 0
Failed 940 940 0
Panics 0 0 0
Coverage 97.95% 97.95% 0.00%

jsx/babel

Test result main count This PR count Difference
Total 39 39 0
Passed 36 36 0
Failed 3 3 0
Panics 0 0 0
Coverage 92.31% 92.31% 0.00%

symbols/microsoft

Test result main count This PR count Difference
Total 5946 5946 0
Passed 396 396 0
Failed 5550 5550 0
Panics 0 0 0
Coverage 6.66% 6.66% 0.00%

ts/babel

Test result main count This PR count Difference
Total 588 588 0
Passed 519 519 0
Failed 69 69 0
Panics 0 0 0
Coverage 88.27% 88.27% 0.00%

ts/microsoft

Test result main count This PR count Difference
Total 16257 16257 0
Passed 12397 12397 0
Failed 3860 3860 0
Panics 0 0 0
Coverage 76.26% 76.26% 0.00%

github-actions[bot] avatar Aug 12 '22 16:08 github-actions[bot]

!bench_formatter

MichaReiser avatar Aug 13 '22 16:08 MichaReiser

Formatter Benchmark Results

group                                    main                                   pr
-----                                    ----                                   --
formatter/checker.ts                     1.00    362.9±1.62ms     7.2 MB/sec    1.09    396.9±3.94ms     6.6 MB/sec
formatter/compiler.js                    1.00    225.4±1.89ms     4.6 MB/sec    1.11    250.2±1.69ms     4.2 MB/sec
formatter/d3.min.js                      1.00    176.7±1.40ms  1519.4 KB/sec    1.14    200.6±2.17ms  1337.8 KB/sec
formatter/dojo.js                        1.00     12.9±0.06ms     5.3 MB/sec    1.08     13.8±0.02ms     5.0 MB/sec
formatter/ios.d.ts                       1.00    258.5±2.59ms     7.2 MB/sec    1.06    273.6±1.29ms     6.8 MB/sec
formatter/jquery.min.js                  1.00     49.4±0.20ms  1711.6 KB/sec    1.11     54.7±0.11ms  1547.9 KB/sec
formatter/math.js                        1.00    387.4±1.70ms  1711.7 KB/sec    1.08    418.8±2.41ms  1583.4 KB/sec
formatter/parser.ts                      1.00      8.7±0.10ms     5.5 MB/sec    1.07      9.3±0.01ms     5.2 MB/sec
formatter/pixi.min.js                    1.00    202.3±2.37ms     2.2 MB/sec    1.16    234.8±1.84ms  1914.2 KB/sec
formatter/react-dom.production.min.js    1.00     60.1±0.48ms  1962.4 KB/sec    1.11     66.8±0.21ms  1763.1 KB/sec
formatter/react.production.min.js        1.00      3.2±0.01ms  1990.4 KB/sec    1.09      3.4±0.01ms  1828.4 KB/sec
formatter/router.ts                      1.00      6.8±0.00ms     9.1 MB/sec    1.08      7.3±0.02ms     8.4 MB/sec
formatter/tex-chtml-full.js              1.00    496.3±1.85ms  1880.3 KB/sec    1.09    541.6±3.21ms  1723.0 KB/sec
formatter/three.min.js                   1.00    234.0±1.53ms     2.5 MB/sec    1.13    263.7±1.69ms     2.2 MB/sec
formatter/typescript.js                  1.00  1512.6±14.24ms     6.3 MB/sec    1.08  1627.0±10.80ms     5.8 MB/sec
formatter/vue.global.prod.js             1.00     80.6±1.02ms  1531.0 KB/sec    1.10     89.0±0.96ms  1385.6 KB/sec

github-actions[bot] avatar Aug 13 '22 16:08 github-actions[bot]

Is there a specific reason why the "needs parentheses" information is tracked inside the formatter phase?

Have you considered some other solutions, such as a visitor approach, or even computing this information during the parsing phase and storing it inside our green tree?

The main reasons for computing whether an expression needs to be parenthesized during the formatting phase are:

  • The formatter adds more parentheses than necessary to not change precedence. For example, Rome adds parentheses around shift operations to ease readability.
  • The information which nodes need parentheses would become outdated during tree mutations or, it would be necessary to update this information when mutating the tree. Computing this information isn't cheap because it involves many tree navigation (multiple allocations).
  • The knowledge of which expressions need parentheses aren't needed for all use cases of the parser and computing the information during the parse phase would add an unnecessary cost (performance penalty). Besides, an over-approximation
  • of which nodes require parentheses is stored in the green tree by having ParenthesizedExpression. Agreed, the tree may have more parenthesized expressions than necessary, but all the required once are in place.
  • Our CST doesn't support storing additional data, e.g. by using annotations on the tree.

I would like to know what are the implications of your decisions in the future developments

The main implication is that it increases complexity as well as the risk for subtle bugs when we forget to call resolve_expression or resolve_expression_parent.

There are a few alternatives to this design, some of which I want to explore after I rewrote binary expression formatting:

  • Pre-compute needs_parentheses: An option is to pre-compute needs_parentheses when extracting the suppression comments and storing the information on the FormatContext. This would remove the overhead for resolving the parent for every node. I'm doubtful that this will yield a big performance and it increases complexity a bit, because knowing if something needs parentheses then needs a FormatContext. We can try this out as a dedicated PR if we're interested.
  • Remove parenthesized expressions: The idea here is to pre-process the tree to remove all parenthesized expressions (except the ones that have closure type cast comments, or have any skipped token trivia attached to the left or right parens). This will have its own performance overhead but removes the need for calling resolve_expression and resolve_expression_parent which I see as the main drawback of the current solution. This is something I want to explore after I finished my work on the binary expression formatting. The main challenge is that I'll need to figure out a way to map positions from the transformed tree back to the original tree (source maps?) to be able to print a node in verbatim node if it's formatting (or any child) fails because of a missing child node.

MichaReiser avatar Aug 15 '22 06:08 MichaReiser

!bench_formatter

MichaReiser avatar Aug 15 '22 08:08 MichaReiser

Formatter Benchmark Results

group                                    main                                   pr
-----                                    ----                                   --
formatter/checker.ts                     1.00    369.8±7.82ms     7.0 MB/sec    1.08    397.5±2.78ms     6.5 MB/sec
formatter/compiler.js                    1.00    223.0±1.42ms     4.7 MB/sec    1.13    251.0±1.16ms     4.2 MB/sec
formatter/d3.min.js                      1.00    177.0±1.28ms  1516.5 KB/sec    1.15    203.1±0.81ms  1321.4 KB/sec
formatter/dojo.js                        1.00     12.7±0.02ms     5.4 MB/sec    1.09     13.8±0.04ms     5.0 MB/sec
formatter/ios.d.ts                       1.00    256.6±2.79ms     7.3 MB/sec    1.07    274.6±0.89ms     6.8 MB/sec
formatter/jquery.min.js                  1.00     48.4±0.37ms  1748.5 KB/sec    1.13     54.5±0.22ms  1552.2 KB/sec
formatter/math.js                        1.00    378.9±2.23ms  1749.8 KB/sec    1.11    420.9±4.43ms  1575.5 KB/sec
formatter/parser.ts                      1.00      8.6±0.07ms     5.6 MB/sec    1.09      9.3±0.03ms     5.2 MB/sec
formatter/pixi.min.js                    1.00    201.0±2.54ms     2.2 MB/sec    1.19    238.7±0.88ms  1883.0 KB/sec
formatter/react-dom.production.min.js    1.00     59.1±0.31ms  1994.0 KB/sec    1.13     66.6±0.43ms  1769.4 KB/sec
formatter/react.production.min.js        1.00      3.1±0.01ms  2011.8 KB/sec    1.10      3.4±0.01ms  1835.7 KB/sec
formatter/router.ts                      1.00      6.7±0.01ms     9.2 MB/sec    1.09      7.3±0.02ms     8.4 MB/sec
formatter/tex-chtml-full.js              1.00    487.5±2.13ms  1914.1 KB/sec    1.12    545.3±2.48ms  1711.1 KB/sec
formatter/three.min.js                   1.00    230.8±1.93ms     2.5 MB/sec    1.16    268.1±1.28ms     2.2 MB/sec
formatter/typescript.js                  1.00  1494.3±26.29ms     6.4 MB/sec    1.10   1637.6±8.07ms     5.8 MB/sec
formatter/vue.global.prod.js             1.00     78.3±0.42ms  1576.3 KB/sec    1.12     87.7±0.61ms  1406.0 KB/sec

github-actions[bot] avatar Aug 15 '22 09:08 github-actions[bot]

!bench_formatter

MichaReiser avatar Aug 15 '22 09:08 MichaReiser

Formatter Benchmark Results

group                                    main                                   pr
-----                                    ----                                   --
formatter/checker.ts                     1.07   409.4±14.57ms     6.4 MB/sec    1.00   383.7±13.97ms     6.8 MB/sec
formatter/compiler.js                    1.00    216.3±4.69ms     4.8 MB/sec    1.05    226.1±5.57ms     4.6 MB/sec
formatter/d3.min.js                      1.00    174.1±8.39ms  1541.4 KB/sec    1.07    186.4±6.59ms  1440.2 KB/sec
formatter/dojo.js                        1.00     11.2±0.31ms     6.2 MB/sec    1.09     12.2±0.33ms     5.6 MB/sec
formatter/ios.d.ts                       1.00   243.3±12.81ms     7.7 MB/sec    1.04    252.1±4.82ms     7.4 MB/sec
formatter/jquery.min.js                  1.00     44.9±2.90ms  1884.2 KB/sec    1.09     48.9±2.86ms  1729.9 KB/sec
formatter/math.js                        1.00   384.8±23.87ms  1723.1 KB/sec    1.03   395.1±16.40ms  1678.2 KB/sec
formatter/parser.ts                      1.03      8.1±0.49ms     6.0 MB/sec    1.00      7.8±0.12ms     6.2 MB/sec
formatter/pixi.min.js                    1.00    198.6±7.66ms     2.2 MB/sec    1.05    208.9±5.22ms     2.1 MB/sec
formatter/react-dom.production.min.js    1.00     52.7±2.36ms     2.2 MB/sec    1.12     59.2±2.76ms  1989.7 KB/sec
formatter/react.production.min.js        1.00      2.7±0.02ms     2.3 MB/sec    1.08      2.9±0.01ms     2.1 MB/sec
formatter/router.ts                      1.00      5.8±0.04ms    10.6 MB/sec    1.07      6.2±0.03ms    10.0 MB/sec
formatter/tex-chtml-full.js              1.00   483.8±14.34ms  1928.6 KB/sec    1.00    483.3±7.78ms  1930.9 KB/sec
formatter/three.min.js                   1.00   227.9±10.33ms     2.6 MB/sec    1.07    244.7±6.65ms     2.4 MB/sec
formatter/typescript.js                  1.00  1517.8±51.83ms     6.3 MB/sec    1.04  1583.0±40.92ms     6.0 MB/sec
formatter/vue.global.prod.js             1.00     75.4±4.09ms  1635.8 KB/sec    1.06     80.2±3.65ms  1538.9 KB/sec

github-actions[bot] avatar Aug 15 '22 09:08 github-actions[bot]

The formatter adds more parentheses than necessary to not change precedence. For example, Rome adds parentheses around shift operations to ease readability.

How does this influence the decision while considering other approaches?

The information which nodes need parentheses would become outdated during tree mutations or, it would be necessary to update this information when mutating the tree. Computing this information isn't cheap because it involves many tree navigation (multiple allocations).

Could you give me more information here? In which context an information becomes outdated? Are you referring to a potential visitor approach? Or else?

The knowledge of which expressions need parentheses aren't needed for all use cases of the parser and computing the information during the parse phase would add an unnecessary cost (performance penalty).

What if one day this information will be needed for some other tool, for example a minifier. How hard/complex/easy would be to refactor this piece of logic?

Besides, an over-approximation of which nodes require parentheses is stored in the green tree by having ParenthesizedExpression. Agreed, the tree may have more parenthesized expressions than necessary, but all the required once are in place.

I didn't understand this bullet point.

There are a few alternatives to this design, some of which I want to explore after I rewrote binary expression formatting:

Pre-compute needs_parentheses: An option is to pre-compute needs_parentheses when extracting the suppression comments and storing the information on the FormatContext. This would remove the overhead for resolving the parent for every node. I'm doubtful that this will yield a big performance and it increases complexity a bit, because knowing if something needs parentheses then needs a FormatContext. We can try this out as a dedicated PR if we're interested.

Remove parenthesized expressions: The idea here is to pre-process the tree to remove all parenthesized expressions (except the ones that have closure type cast comments, or have any skipped token trivia attached to the left or right parens). This will have its own performance overhead but removes the need for calling resolve_expression and resolve_expression_parent which I see as the main drawback of the current solution. This is something I want to explore after I finished my work on the binary expression formatting. The main challenge is that I'll need to figure out a way to map positions from the transformed tree back to the original tree (source maps?) to be able to print a node in verbatim node if it's formatting (or any child) fails because of a missing child node.

Could you explain a bit, as an overview, how these bullet points are related to the solution of this PR? Does that mean that the solution of this PR is not the final one? Or are they complementary? If these bullet points are real alternatives, how do you plan to actually understand the pros and cons and eventually propose a final algorithm?

Just asking because now, I don't know what I have to review in this PR.

ematipico avatar Aug 15 '22 11:08 ematipico

The formatter adds more parentheses than necessary to not change precedence. For example, Rome adds parentheses around shift operations to ease readability.

How does this influence the decision while considering other approaches?

The main difference this makes is that the problem could otherwise be simplified to only remove unnecessary parentheses because it should be safe for the formatter to assume that the input tree is correctly parenthesized. I didn't pursue this option because it doesn't achieve our goal to be prettier compatible.

The information which nodes need parentheses would become outdated during tree mutations or, it would be necessary to update this information when mutating the tree. Computing this information isn't cheap because it involves many tree navigation (multiple allocations).

Could you give me more information here? In which context an information becomes outdated? Are you referring to a potential visitor approach? Or else?

Let's say you start with

let a = await call();
a[member];

await call() doesn't need any parentheses

Now let's do an inline refactor of a

(await call())[member]

It now becomes necessary to parenthesis await call() to form a valid syntax tree, meaning, it would be necessary to update the information if await call() needs parentheses or not.

The knowledge of which expressions need parentheses aren't needed for all use cases of the parser and computing the information during the parse phase would add an unnecessary cost (performance penalty).

What if one day this information will be needed for some other tool, for example a minifier. How hard/complex/easy would be to refactor this piece of logic?

You could split the NeedParentheses trait to have a method that only returns true if parentheses are necessary to avoid syntax errors and another that may also return true for "nicety" parentheses.

Besides, an over-approximation of which nodes require parentheses is stored in the green tree by having ParenthesizedExpression. Agreed, the tree may have more parenthesized expressions than necessary, but all the required once are in place.

I didn't understand this bullet point.

What I'm saying is that ParenthesizedExpression expressions inside of the tree encode where parentheses may be necessary. It may be possible to remove some, but if we keep the same amount of parenthesized expressions, then it's guaranteed that the tree remains valid IF the input tree was valid.

If this doesn't help, can you explain what you didn't understand

There are a few alternatives to this design, some of which I want to explore after I rewrote binary expression formatting:

Pre-compute needs_parentheses: An option is to pre-compute needs_parentheses when extracting the suppression comments and storing the information on the FormatContext. This would remove the overhead for resolving the parent for every node. I'm doubtful that this will yield a big performance and it increases complexity a bit, because knowing if something needs parentheses then needs a FormatContext. We can try this out as a dedicated PR if we're interested. Remove parenthesized expressions: The idea here is to pre-process the tree to remove all parenthesized expressions (except the ones that have closure type cast comments, or have any skipped token trivia attached to the left or right parens). This will have its own performance overhead but removes the need for calling resolve_expression and resolve_expression_parent which I see as the main drawback of the current solution. This is something I want to explore after I finished my work on the binary expression formatting. The main challenge is that I'll need to figure out a way to map positions from the transformed tree back to the original tree (source maps?) to be able to print a node in verbatim node if it's formatting (or any child) fails because of a missing child node.

Could you explain a bit, as an overview, how these bullet points are related to the solution of this PR? Does that mean that the solution of this PR is not the final one? Or are they complementary? If these bullet points are real alternatives, how do you plan to actually understand the pros and cons and eventually propose a final algorithm?

These alternatives would build up on top of this PR. This PR was the most straightforward approach to get something working.

  • Pre-compute: uses the same idea but instead of calling node.needs_parentheses inside of FormatNodeRule it would call context.needs_parentheses(node) which would perform a Set lookup. The Set of nodes that needs parentheses would be computed ahead of time when extracting the suppressions. The only thing that changes is when node.needs_parentheses is called.
  • Remove parenthesized expressions: This will revert the changes to ExpressionNode.resolve but the core logic around NeedsParentheses with all the implementations will remain the same. Therefore, this is another incremental change.

MichaReiser avatar Aug 15 '22 11:08 MichaReiser

If the diverging is intentional and we won't go back to that, could you please add a task/reminder for ourselves to update the website with this decision? We have to update the section where we track the difference section

ematipico avatar Aug 17 '22 10:08 ematipico

I plan to merge this PR by tomorrow except there are some major conceptual concerns around it.

MichaReiser avatar Aug 17 '22 11:08 MichaReiser