tools
tools copied to clipboard
feat(rome_js_formatter): Parenthesizing expressions
Summary
Part of #3046
This PR adds the capability of adding parentheses to improve readability or removing parentheses if they aren't strictly necessary.
The core architecture is around the NeedsParentheses trait that every JsAnyExpression now implements. Its main method is needs_parentheses which must return true if parentheses are necessary if removing the parentheses would change the semantics OR result in a syntax error. needs_parentheses may also return true in cases where parentheses aren't strictly necessary but improve readability.
The new NeedsParentheses trait is used in the FormatNodeRule trait that gets called when formatting every node. It now adds parentheses if needs_parentheses returns true.
FormatJsParenthesizedExpression has been changed to always remove parentheses and rely on FormatNodeRule to re-insert parentheses if necessary. This has the downside that any trailing comments of ( and leading comments of ) are pushed outside the parentheses. I think that's OK and is something we can address when working on #2768.
Note: This PR implements needs_parentheses for all binary like expression but they aren't used yet. I struggled to integrate the logic into the existing binary like formatting and intend to tackle it with anyway needed rewrite to closer match prettier's formatting. I plan to do this next.
Performance
This PR significantly reduces Rome performance because it is now necessary to call expression.resolve() or expression.resolve_parent to skip over parenthesized expressions when testing if a child or parent is of a certain kind.
The only alternative that I see to this approach would be to remove all ParenthesizedExpressions before formatting but this will have its own performance cost, requires source maps to map back to the original code in case a node must be printed in verbatim (to recover the parentheses), and is something we still have the chance to do in the future.
My main concern is more about the added complexity and it's easy to forget a resolve_expression or resolve_expression_parent call. I don't have good answer to this and there are likely still a few places where I missed to add these calls myself. My recommendation is that we can fix these over time.
group main parentheses
----- ---- -----------
formatter/checker.ts 1.00 219.6±15.94ms 11.8 MB/sec 1.04 227.3±18.85ms 11.4 MB/sec
formatter/compiler.js 1.00 124.8±1.59ms 8.4 MB/sec 1.12 140.0±9.95ms 7.5 MB/sec
formatter/d3.min.js 1.00 103.2±7.78ms 2.5 MB/sec 1.16 119.6±8.73ms 2.2 MB/sec
formatter/dojo.js 1.00 8.1±0.02ms 8.5 MB/sec 1.00 8.1±0.62ms 8.5 MB/sec
formatter/ios.d.ts 1.00 144.8±1.88ms 12.9 MB/sec 1.09 158.0±15.89ms 11.8 MB/sec
formatter/jquery.min.js 1.00 29.7±1.87ms 2.8 MB/sec 1.03 30.7±2.13ms 2.7 MB/sec
formatter/math.js 1.01 235.5±14.90ms 2.7 MB/sec 1.00 232.7±18.55ms 2.8 MB/sec
formatter/parser.ts 1.02 5.2±0.40ms 9.3 MB/sec 1.00 5.1±0.32ms 9.5 MB/sec
formatter/pixi.min.js 1.00 127.5±6.82ms 3.4 MB/sec 1.05 134.3±10.27ms 3.3 MB/sec
formatter/react-dom.production.min.js 1.04 37.3±2.29ms 3.1 MB/sec 1.00 35.7±1.18ms 3.2 MB/sec
formatter/react.production.min.js 1.00 1861.6±130.45µs 3.3 MB/sec 1.12 2.1±0.13ms 3.0 MB/sec
formatter/router.ts 1.00 4.1±0.31ms 15.1 MB/sec 1.13 4.6±0.19ms 13.3 MB/sec
formatter/tex-chtml-full.js 1.00 274.8±6.53ms 3.3 MB/sec 1.14 314.6±22.55ms 2.9 MB/sec
formatter/three.min.js 1.00 135.7±10.68ms 4.3 MB/sec 1.07 145.5±9.52ms 4.0 MB/sec
formatter/typescript.js 1.00 911.6±60.87ms 10.4 MB/sec 1.03 938.6±58.19ms 10.1 MB/sec
formatter/vue.global.prod.js 1.00 42.9±0.15ms 2.8 MB/sec 1.25 53.4±3.34ms 2.3 MB/sec
Prettier Difference
I decided to diverge from Prettier's formatting in two cases:
ObjectExpression, FunctionExpression, or ClassExpression at the beginning of a statement.
// input
{}.b
// Prettier
({}.b)
// Rome
({}).b
Prettier parenthesizes the whole expression whereas Rome parenthesizes the object/function/class expression only.
The main reason for diverging is that parenthesizing the whole expression requires that the logic is implemented in any expression that starts with another child expression (tagged template, binary expressions, computed member/assignment, sequence, conditional, ... ) and it requires traversing the first expression until it reaches an object expression or any expression that doesn't start with another expression. This is rather expensive and Rome's approach avoids the expensive traversal except for object/function and class expressions, of which there should be fewer.
Parenthesized optional chain
// Input
(a?.b).c
a?.b.c
// Prettier
(a?.b).c
a?.b.c
// Rome
a?.b.c
a?.b.c
Prettier keeps the parentheses if they were present in the source document but never adds them if they were missing. This is the only case where I found that the fact that parens are present in the source document changed how Prettier parenthesizes formatted expression.
My take is that Rome should apply a consistent formatting regardless if parentheses were or were not present in the source document.
Test Plan
I manually reviewed the snapshot changes.
Average compatibility: 81.82 -> 81.54% Compatible lines: 78.64 -> 78.34%
This PR is regressing the compatibility. Mainly due to the following reasons:
- Prettier has custom handling to avoid removing parentheses around closure type cast comments which Rome has not. This is something we can add support for in the future. The way I envision this to work is that we add a
tagto everyJsParenthesizedExpressionthat has a type cast comment (CC: @leops for a use case for node annotations) - Our handling of comments differs from Prettier's which results in a large number of line changes. We'll tackle this as part of #2768
- Prettier incompatibility of other syntax like
JsCallArguments,JsTemplate,JsBinaryLikeExpression, member like expressions and more.
I'm in favour of merging this PR regardless of the regressions because these issues should be tackled when working on comments, or when improving the formatting of other syntaxes.
Deploying with
Cloudflare Pages
| Latest commit: |
9823c4b
|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://af726b97.tools-8rn.pages.dev |
| Branch Preview URL: | https://feat-needs-parentheses.tools-8rn.pages.dev |
Parser conformance results on ubuntu-latest
js/262
| Test result | main count |
This PR count | Difference |
|---|---|---|---|
| Total | 45878 | 45878 | 0 |
| Passed | 44938 | 44938 | 0 |
| Failed | 940 | 940 | 0 |
| Panics | 0 | 0 | 0 |
| Coverage | 97.95% | 97.95% | 0.00% |
jsx/babel
| Test result | main count |
This PR count | Difference |
|---|---|---|---|
| Total | 39 | 39 | 0 |
| Passed | 36 | 36 | 0 |
| Failed | 3 | 3 | 0 |
| Panics | 0 | 0 | 0 |
| Coverage | 92.31% | 92.31% | 0.00% |
symbols/microsoft
| Test result | main count |
This PR count | Difference |
|---|---|---|---|
| Total | 5946 | 5946 | 0 |
| Passed | 396 | 396 | 0 |
| Failed | 5550 | 5550 | 0 |
| Panics | 0 | 0 | 0 |
| Coverage | 6.66% | 6.66% | 0.00% |
ts/babel
| Test result | main count |
This PR count | Difference |
|---|---|---|---|
| Total | 588 | 588 | 0 |
| Passed | 519 | 519 | 0 |
| Failed | 69 | 69 | 0 |
| Panics | 0 | 0 | 0 |
| Coverage | 88.27% | 88.27% | 0.00% |
ts/microsoft
| Test result | main count |
This PR count | Difference |
|---|---|---|---|
| Total | 16257 | 16257 | 0 |
| Passed | 12397 | 12397 | 0 |
| Failed | 3860 | 3860 | 0 |
| Panics | 0 | 0 | 0 |
| Coverage | 76.26% | 76.26% | 0.00% |
!bench_formatter
Formatter Benchmark Results
group main pr
----- ---- --
formatter/checker.ts 1.00 362.9±1.62ms 7.2 MB/sec 1.09 396.9±3.94ms 6.6 MB/sec
formatter/compiler.js 1.00 225.4±1.89ms 4.6 MB/sec 1.11 250.2±1.69ms 4.2 MB/sec
formatter/d3.min.js 1.00 176.7±1.40ms 1519.4 KB/sec 1.14 200.6±2.17ms 1337.8 KB/sec
formatter/dojo.js 1.00 12.9±0.06ms 5.3 MB/sec 1.08 13.8±0.02ms 5.0 MB/sec
formatter/ios.d.ts 1.00 258.5±2.59ms 7.2 MB/sec 1.06 273.6±1.29ms 6.8 MB/sec
formatter/jquery.min.js 1.00 49.4±0.20ms 1711.6 KB/sec 1.11 54.7±0.11ms 1547.9 KB/sec
formatter/math.js 1.00 387.4±1.70ms 1711.7 KB/sec 1.08 418.8±2.41ms 1583.4 KB/sec
formatter/parser.ts 1.00 8.7±0.10ms 5.5 MB/sec 1.07 9.3±0.01ms 5.2 MB/sec
formatter/pixi.min.js 1.00 202.3±2.37ms 2.2 MB/sec 1.16 234.8±1.84ms 1914.2 KB/sec
formatter/react-dom.production.min.js 1.00 60.1±0.48ms 1962.4 KB/sec 1.11 66.8±0.21ms 1763.1 KB/sec
formatter/react.production.min.js 1.00 3.2±0.01ms 1990.4 KB/sec 1.09 3.4±0.01ms 1828.4 KB/sec
formatter/router.ts 1.00 6.8±0.00ms 9.1 MB/sec 1.08 7.3±0.02ms 8.4 MB/sec
formatter/tex-chtml-full.js 1.00 496.3±1.85ms 1880.3 KB/sec 1.09 541.6±3.21ms 1723.0 KB/sec
formatter/three.min.js 1.00 234.0±1.53ms 2.5 MB/sec 1.13 263.7±1.69ms 2.2 MB/sec
formatter/typescript.js 1.00 1512.6±14.24ms 6.3 MB/sec 1.08 1627.0±10.80ms 5.8 MB/sec
formatter/vue.global.prod.js 1.00 80.6±1.02ms 1531.0 KB/sec 1.10 89.0±0.96ms 1385.6 KB/sec
Is there a specific reason why the "needs parentheses" information is tracked inside the formatter phase?
Have you considered some other solutions, such as a visitor approach, or even computing this information during the parsing phase and storing it inside our green tree?
The main reasons for computing whether an expression needs to be parenthesized during the formatting phase are:
- The formatter adds more parentheses than necessary to not change precedence. For example, Rome adds parentheses around shift operations to ease readability.
- The information which nodes need parentheses would become outdated during tree mutations or, it would be necessary to update this information when mutating the tree. Computing this information isn't cheap because it involves many tree navigation (multiple allocations).
- The knowledge of which expressions need parentheses aren't needed for all use cases of the parser and computing the information during the parse phase would add an unnecessary cost (performance penalty). Besides, an over-approximation
- of which nodes require parentheses is stored in the green tree by having
ParenthesizedExpression. Agreed, the tree may have more parenthesized expressions than necessary, but all the required once are in place. - Our CST doesn't support storing additional data, e.g. by using annotations on the tree.
I would like to know what are the implications of your decisions in the future developments
The main implication is that it increases complexity as well as the risk for subtle bugs when we forget to call resolve_expression or resolve_expression_parent.
There are a few alternatives to this design, some of which I want to explore after I rewrote binary expression formatting:
- Pre-compute
needs_parentheses: An option is to pre-computeneeds_parentheseswhen extracting the suppression comments and storing the information on theFormatContext. This would remove the overhead for resolving the parent for every node. I'm doubtful that this will yield a big performance and it increases complexity a bit, because knowing if something needs parentheses then needs aFormatContext. We can try this out as a dedicated PR if we're interested. - Remove parenthesized expressions: The idea here is to pre-process the tree to remove all parenthesized expressions (except the ones that have closure type cast comments, or have any skipped token trivia attached to the left or right parens). This will have its own performance overhead but removes the need for calling
resolve_expressionandresolve_expression_parentwhich I see as the main drawback of the current solution. This is something I want to explore after I finished my work on the binary expression formatting. The main challenge is that I'll need to figure out a way to map positions from the transformed tree back to the original tree (source maps?) to be able to print a node in verbatim node if it's formatting (or any child) fails because of a missing child node.
!bench_formatter
Formatter Benchmark Results
group main pr
----- ---- --
formatter/checker.ts 1.00 369.8±7.82ms 7.0 MB/sec 1.08 397.5±2.78ms 6.5 MB/sec
formatter/compiler.js 1.00 223.0±1.42ms 4.7 MB/sec 1.13 251.0±1.16ms 4.2 MB/sec
formatter/d3.min.js 1.00 177.0±1.28ms 1516.5 KB/sec 1.15 203.1±0.81ms 1321.4 KB/sec
formatter/dojo.js 1.00 12.7±0.02ms 5.4 MB/sec 1.09 13.8±0.04ms 5.0 MB/sec
formatter/ios.d.ts 1.00 256.6±2.79ms 7.3 MB/sec 1.07 274.6±0.89ms 6.8 MB/sec
formatter/jquery.min.js 1.00 48.4±0.37ms 1748.5 KB/sec 1.13 54.5±0.22ms 1552.2 KB/sec
formatter/math.js 1.00 378.9±2.23ms 1749.8 KB/sec 1.11 420.9±4.43ms 1575.5 KB/sec
formatter/parser.ts 1.00 8.6±0.07ms 5.6 MB/sec 1.09 9.3±0.03ms 5.2 MB/sec
formatter/pixi.min.js 1.00 201.0±2.54ms 2.2 MB/sec 1.19 238.7±0.88ms 1883.0 KB/sec
formatter/react-dom.production.min.js 1.00 59.1±0.31ms 1994.0 KB/sec 1.13 66.6±0.43ms 1769.4 KB/sec
formatter/react.production.min.js 1.00 3.1±0.01ms 2011.8 KB/sec 1.10 3.4±0.01ms 1835.7 KB/sec
formatter/router.ts 1.00 6.7±0.01ms 9.2 MB/sec 1.09 7.3±0.02ms 8.4 MB/sec
formatter/tex-chtml-full.js 1.00 487.5±2.13ms 1914.1 KB/sec 1.12 545.3±2.48ms 1711.1 KB/sec
formatter/three.min.js 1.00 230.8±1.93ms 2.5 MB/sec 1.16 268.1±1.28ms 2.2 MB/sec
formatter/typescript.js 1.00 1494.3±26.29ms 6.4 MB/sec 1.10 1637.6±8.07ms 5.8 MB/sec
formatter/vue.global.prod.js 1.00 78.3±0.42ms 1576.3 KB/sec 1.12 87.7±0.61ms 1406.0 KB/sec
!bench_formatter
Formatter Benchmark Results
group main pr
----- ---- --
formatter/checker.ts 1.07 409.4±14.57ms 6.4 MB/sec 1.00 383.7±13.97ms 6.8 MB/sec
formatter/compiler.js 1.00 216.3±4.69ms 4.8 MB/sec 1.05 226.1±5.57ms 4.6 MB/sec
formatter/d3.min.js 1.00 174.1±8.39ms 1541.4 KB/sec 1.07 186.4±6.59ms 1440.2 KB/sec
formatter/dojo.js 1.00 11.2±0.31ms 6.2 MB/sec 1.09 12.2±0.33ms 5.6 MB/sec
formatter/ios.d.ts 1.00 243.3±12.81ms 7.7 MB/sec 1.04 252.1±4.82ms 7.4 MB/sec
formatter/jquery.min.js 1.00 44.9±2.90ms 1884.2 KB/sec 1.09 48.9±2.86ms 1729.9 KB/sec
formatter/math.js 1.00 384.8±23.87ms 1723.1 KB/sec 1.03 395.1±16.40ms 1678.2 KB/sec
formatter/parser.ts 1.03 8.1±0.49ms 6.0 MB/sec 1.00 7.8±0.12ms 6.2 MB/sec
formatter/pixi.min.js 1.00 198.6±7.66ms 2.2 MB/sec 1.05 208.9±5.22ms 2.1 MB/sec
formatter/react-dom.production.min.js 1.00 52.7±2.36ms 2.2 MB/sec 1.12 59.2±2.76ms 1989.7 KB/sec
formatter/react.production.min.js 1.00 2.7±0.02ms 2.3 MB/sec 1.08 2.9±0.01ms 2.1 MB/sec
formatter/router.ts 1.00 5.8±0.04ms 10.6 MB/sec 1.07 6.2±0.03ms 10.0 MB/sec
formatter/tex-chtml-full.js 1.00 483.8±14.34ms 1928.6 KB/sec 1.00 483.3±7.78ms 1930.9 KB/sec
formatter/three.min.js 1.00 227.9±10.33ms 2.6 MB/sec 1.07 244.7±6.65ms 2.4 MB/sec
formatter/typescript.js 1.00 1517.8±51.83ms 6.3 MB/sec 1.04 1583.0±40.92ms 6.0 MB/sec
formatter/vue.global.prod.js 1.00 75.4±4.09ms 1635.8 KB/sec 1.06 80.2±3.65ms 1538.9 KB/sec
The formatter adds more parentheses than necessary to not change precedence. For example, Rome adds parentheses around shift operations to ease readability.
How does this influence the decision while considering other approaches?
The information which nodes need parentheses would become outdated during tree mutations or, it would be necessary to update this information when mutating the tree. Computing this information isn't cheap because it involves many tree navigation (multiple allocations).
Could you give me more information here? In which context an information becomes outdated? Are you referring to a potential visitor approach? Or else?
The knowledge of which expressions need parentheses aren't needed for all use cases of the parser and computing the information during the parse phase would add an unnecessary cost (performance penalty).
What if one day this information will be needed for some other tool, for example a minifier. How hard/complex/easy would be to refactor this piece of logic?
Besides, an over-approximation of which nodes require parentheses is stored in the green tree by having
ParenthesizedExpression. Agreed, the tree may have more parenthesized expressions than necessary, but all the required once are in place.
I didn't understand this bullet point.
There are a few alternatives to this design, some of which I want to explore after I rewrote binary expression formatting:
Pre-compute
needs_parentheses: An option is to pre-computeneeds_parentheseswhen extracting the suppression comments and storing the information on theFormatContext. This would remove the overhead for resolving the parent for every node. I'm doubtful that this will yield a big performance and it increases complexity a bit, because knowing if something needs parentheses then needs aFormatContext. We can try this out as a dedicated PR if we're interested.Remove parenthesized expressions: The idea here is to pre-process the tree to remove all parenthesized expressions (except the ones that have closure type cast comments, or have any skipped token trivia attached to the left or right parens). This will have its own performance overhead but removes the need for calling
resolve_expressionandresolve_expression_parentwhich I see as the main drawback of the current solution. This is something I want to explore after I finished my work on the binary expression formatting. The main challenge is that I'll need to figure out a way to map positions from the transformed tree back to the original tree (source maps?) to be able to print a node in verbatim node if it's formatting (or any child) fails because of a missing child node.
Could you explain a bit, as an overview, how these bullet points are related to the solution of this PR? Does that mean that the solution of this PR is not the final one? Or are they complementary? If these bullet points are real alternatives, how do you plan to actually understand the pros and cons and eventually propose a final algorithm?
Just asking because now, I don't know what I have to review in this PR.
The formatter adds more parentheses than necessary to not change precedence. For example, Rome adds parentheses around shift operations to ease readability.
How does this influence the decision while considering other approaches?
The main difference this makes is that the problem could otherwise be simplified to only remove unnecessary parentheses because it should be safe for the formatter to assume that the input tree is correctly parenthesized. I didn't pursue this option because it doesn't achieve our goal to be prettier compatible.
The information which nodes need parentheses would become outdated during tree mutations or, it would be necessary to update this information when mutating the tree. Computing this information isn't cheap because it involves many tree navigation (multiple allocations).
Could you give me more information here? In which context an information becomes outdated? Are you referring to a potential visitor approach? Or else?
Let's say you start with
let a = await call();
a[member];
await call() doesn't need any parentheses
Now let's do an inline refactor of a
(await call())[member]
It now becomes necessary to parenthesis await call() to form a valid syntax tree, meaning, it would be necessary to update the information if await call() needs parentheses or not.
The knowledge of which expressions need parentheses aren't needed for all use cases of the parser and computing the information during the parse phase would add an unnecessary cost (performance penalty).
What if one day this information will be needed for some other tool, for example a minifier. How hard/complex/easy would be to refactor this piece of logic?
You could split the NeedParentheses trait to have a method that only returns true if parentheses are necessary to avoid syntax errors and another that may also return true for "nicety" parentheses.
Besides, an over-approximation of which nodes require parentheses is stored in the green tree by having
ParenthesizedExpression. Agreed, the tree may have more parenthesized expressions than necessary, but all the required once are in place.I didn't understand this bullet point.
What I'm saying is that ParenthesizedExpression expressions inside of the tree encode where parentheses may be necessary. It may be possible to remove some, but if we keep the same amount of parenthesized expressions, then it's guaranteed that the tree remains valid IF the input tree was valid.
If this doesn't help, can you explain what you didn't understand
There are a few alternatives to this design, some of which I want to explore after I rewrote binary expression formatting:
Pre-compute
needs_parentheses: An option is to pre-computeneeds_parentheseswhen extracting the suppression comments and storing the information on theFormatContext. This would remove the overhead for resolving the parent for every node. I'm doubtful that this will yield a big performance and it increases complexity a bit, because knowing if something needs parentheses then needs aFormatContext. We can try this out as a dedicated PR if we're interested. Remove parenthesized expressions: The idea here is to pre-process the tree to remove all parenthesized expressions (except the ones that have closure type cast comments, or have any skipped token trivia attached to the left or right parens). This will have its own performance overhead but removes the need for callingresolve_expressionandresolve_expression_parentwhich I see as the main drawback of the current solution. This is something I want to explore after I finished my work on the binary expression formatting. The main challenge is that I'll need to figure out a way to map positions from the transformed tree back to the original tree (source maps?) to be able to print a node in verbatim node if it's formatting (or any child) fails because of a missing child node.Could you explain a bit, as an overview, how these bullet points are related to the solution of this PR? Does that mean that the solution of this PR is not the final one? Or are they complementary? If these bullet points are real alternatives, how do you plan to actually understand the pros and cons and eventually propose a final algorithm?
These alternatives would build up on top of this PR. This PR was the most straightforward approach to get something working.
- Pre-compute: uses the same idea but instead of calling
node.needs_parenthesesinside ofFormatNodeRuleit would callcontext.needs_parentheses(node)which would perform aSetlookup. TheSetof nodes that needs parentheses would be computed ahead of time when extracting the suppressions. The only thing that changes is whennode.needs_parenthesesis called. - Remove parenthesized expressions: This will revert the changes to
ExpressionNode.resolvebut the core logic aroundNeedsParentheseswith all the implementations will remain the same. Therefore, this is another incremental change.
If the diverging is intentional and we won't go back to that, could you please add a task/reminder for ourselves to update the website with this decision? We have to update the section where we track the difference section
I plan to merge this PR by tomorrow except there are some major conceptual concerns around it.