tools icon indicating copy to clipboard operation
tools copied to clipboard

Comment formatting - stability issues

Open MichaReiser opened this issue 3 years ago • 0 comments

Issue 1: Formatting comments with the most inner node

It took me a while to figure out why Rome required the former split_trivia helper (now moved inside the GroupElementsBuffer) that moves leading comments outside of groups.

// test 
[a]

// IR (simplified)
group_elements(
	comment("// test"),
  token("["),
  token("a"),
  token("]")
)

// Becomes
comment("// test"),
group_elements(
  token("["),
  token("a"),
  token("]")
)

Or why is it that Rome needs a custom format_parenthesize helper wheras adding parentheses just seemed to work for Prettier without any special handling.

The important difference is that Prettier associates comments with the outermost node. Rome, on the other hand, formats the comments with the token, or, in other words, formats the comments with the innermost node. Let’s take a look at the example. The AST for [a] can roughly be represented as

- ExpressionStatement
  - ArrayExpression
    - '[' 
      - leading: "// test"
       - IdentifierExpression (details omitted for brevity)
    - ']'

Prettier associates the // test comment as a leading comment of the ExpressionStatement, Rome as a leading comment of the [ token (or, you could say, the leading comment of the ArrayExpression)

This distinction is important because it “automatically” solves the need for split_trivia because Prettier formats the // test comment as part of the ExpressionStatement that is outside of the group_elements and, thus, it isn’t necessary to manually “patch” up the IR to move comments out of groups.

There’s another case where our split_trivia falls short today. The problem is that moving leading comments isn’t sufficient in the following case:

a?.baz // test

// IR
token("a"),
group_elements(
  indent(
     soft_line_break(),
     token("?."),
		 token("baz")
		 line_suffix(comment("// test")),
		 expand_parent()
	)
)

// Result Format 1
a
	?.baz; // test

// Result Format 2
a?.baz; // test

The line suffix comment forces the enclosing group to expand to ensure the line suffix doesn’t travel too far and stays close to its original position. However, it shouldn’t break the ?.baz group but instead its enclosing group (there’s none in this example). Associating the // test comment with the enclosing ExpressionStatement instead ensures that the comment and expand_parent are outside of the group.

It, interestingly, is also the case that this happens to remove the need for custom comment handling for format_parenthesize because, again, Prettier formats the leading and trailing trivia as part of the enclosing ExpressionStatement and not as part of the “to be parenthesized” expression.

Issue 2: node-based comment handling

Rome uses format_delimited to push comments after {, (, or [ onto a new line because

function a {
  // test
}

looks better than (because the comment most likely describes the body of the function and not the {

function a { // test 
}

Rome uses the custom format_delimited, format_inserted, format_removed, format_replaced, and format_only_if_breaks, helpers together with the special indent handling in the Printer to achieve this which works reasonably well.

https://github.com/rome/tools/blob/acd0fc8be9affacc9e93607a095a3fb65ecb0f87/crates/rome_formatter/src/printer/mod.rs#L314-L325

This logic inside of the Printer pushes a line break whenever the indent increased. This happens for example in the following case:

class Test // comment 
{ }

The // comment gets pushed right after the { token where the indent is now 1 instead of zero. The result is that Rome correctly formats this as

class Test {
	// comment
}

but it doesn't work as intended if the comment itself is part of an indented block:

interface A4 // comment1
	extends Base  // comment2
{
	foo(): bar;
}

gets formatted as

interface A4 // comment1
	extends Base { // comment2
	foo(): bar;
}

Notice, how there's no line break before // comment2 because the indent of the line suffix comment is 1 as it's formatted as part of the extends and so is it for the {

Prettier, at the other hand, uses a somewhat extensive list of rules to categorise the leading, trailing, and dangling comments for every node before formatting. The benefit of this approach is that it removes any special comment handling during formatting because the rule can simply define that any trailing comment of the { becomes the leading comment of the block’s first statement.

The main benefit of this approach is that Prettier has a default rule on how to associate comments but it can be overridden to better place comments on a per node level which provides a flexible way to improve comment formatting in the future without the need that it’s perfect from the beginning.

Tasks

  • [ ] Extract the comments and store them as a Rc<Comments> on JsFormatContext. The Rc is necessary so that the comments() function returns a &'static Comments that has a different lifetime than f and, therefore, it is possible to call write!(f, [comments]) while borrowing the comments.
  • [ ] Format the leading/trailing trivia at a node boundary as part of the most outer node.
  • [ ] Fix suppression in specs/prettier/js/babel-plugins/optional-chaining.js

MichaReiser avatar Jun 23 '22 11:06 MichaReiser