oxc icon indicating copy to clipboard operation
oxc copied to clipboard

Rework `oxc_prettier`

Open Boshen opened this issue 1 year ago โ€ข 43 comments

[!NOTE]

@leaysgur is currently examining the following options:

  • 1: Implement the existing Prettier-based code
  • 2: Reuse the Biome code and start from scratch

For option 1, see https://github.com/oxc-project/oxc/issues/5068#issuecomment-2507272735 for the rest of the details and feel free to contribute.

For option 2, stay tuned for more details. ๐Ÿšง

Original description

crates/oxc_prettier was my attempt at the prettier bounty.

I thought I could finish it in time, except the fact that I rushed too quickly without looking at all the requirements ... It was too late when I got blocked by printing comments.

In order to rework oxc_prettier, we need to understand at least:

  • How to debug prettier
  • The Doc IR https://github.com/prettier/prettier/blob/main/commands.md https://github.com/oxc-project/oxc/blob/main/crates/oxc_prettier/src/doc.rs
  • How to handle comments https://github.com/prettier/prettier/blob/main/src/language-js/comments/handle-comments.js
  • How to handle parentheses https://github.com/prettier/prettier/blob/main/src/language-js/needs-parens.js

As for the infrastructure, we already have most of the code:

  • https://github.com/oxc-project/oxc/tree/main/tasks/prettier_conformance shows our js compat is at js compatibility: 264/586 (45.05%). But we need to update prettier to latest make sure this conformance runner still works.
  • All the code in https://github.com/oxc-project/oxc/tree/main/crates/oxc_prettier mimics and ports prettier https://github.com/prettier/prettier/tree/main/src/language-js

Feel free to remove everything and start from scratch, and copy over the format code https://github.com/oxc-project/oxc/tree/main/crates/oxc_prettier/src/format

Boshen avatar Aug 22 '24 10:08 Boshen

I'm sorry to take up some of your time, but I feel confused about the following issues, so please allow me to ask you a few questions... As we konw, the next minor release of ESLint will deprecate core formatting rules. But this also means that the linter itself originally had code formatting capabilities, so why isn't Oxlint directly compatible with plugins like ESLint Stylistic? Are there some corner cases where Oxlint doesn't work as well as Prettier, so Oxlint had to be compatible with ESLint and Prettier separately, or it's just a divide-and-conquer engineering trade-off๏ผŸ I'm guessing one of the reasons for this is that Prettier can support different file formats (not only .js, but also .css and more...), but ESLint or Oxlint can do the same job via plugins. Thanks.

parkin-lin avatar Aug 23 '24 08:08 parkin-lin

We made an intentional decision to only support code correctness rules, not stylistic rules. @Boshen can explain reasoning more.

DonIsaac avatar Aug 26 '24 01:08 DonIsaac

@leaysgur is writing a series of articles in preparation of this task:


I'm also working on comment attachments to unblock prettier.

Boshen avatar Sep 04 '24 14:09 Boshen

I'm also working on comment attachments to unblock prettier.

Does this mean that you aim to implement Prettier equivalent which achieves with 60+ utils? ๐Ÿซจ

FYI: Babel also has relevant code(just 300 loc), but that did not seem to meet Prettier's requirement.

leaysgur avatar Sep 05 '24 03:09 leaysgur

Does this mean that you aim to implement Prettier equivalent which achieves with 60+ utils? ๐Ÿซจ

We need to figure out what exactly is prettier doing with those 60+ utils ๐Ÿฅฒ

Boshen avatar Sep 05 '24 05:09 Boshen

For those who are interested in algorithms under the hood, prettier is based on https://homepages.inf.ed.ac.uk/wadler/papers/prettier/prettier.pdf, https://prettier.io/docs/en/technical-details

IWANABETHATGUY avatar Sep 07 '24 14:09 IWANABETHATGUY

It has been 3 weeks since I started reading the Prettier source code. It's still far from being complete, but I'd like to leave some progress and summary here.

How to debug Prettier

There are 3 ways:

  • Playground
    • Enable "Debug" options in the sidebar
  • CLI
    • With --debug-* args
  • Node.js API
    • Under __debug exports

https://leaysgur.github.io/posts/2024/09/03/111109/

It is written in Japanese, but it is all code, so you can understand it. ๐Ÿ˜‰

I also recommend to run node --inspect-brk the code with debugger and inspect it from Chrome DevTools.

How to handle comments

I will post some topics for discussion in a few days.

leaysgur avatar Sep 21 '24 03:09 leaysgur

How to handle comments

As you may know, Prettier's formatting process consists of roughly 3 phases:

  • P1: original text > AST
  • P2: AST > Doc
  • P3: Doc > formatted text

Comments are collected in P1 and used in P2.

In P1:

  • It simply parses to the AST with comments in the output
    • However, there are also some adjustments made to the AST nodes
    • Some parsers, such as Babel, already attach comments to nodes at this stage
      • However, Prettier does not use them
      • The reason is to support parsers other than Babel
  • As the final step of P1 (more technically, the beginning of P2), comments are classified and attached to nodes
    • First, it starts by finding nearby nodes for each comment
    • Based on that, it determines the placement(ownLine, endOfLine, remaining) from the lines before and after each comment
    • Then, it handles about 30(!) known special patterns for each placement
    • Finally, it finishes using unique tie-breaking algorithm

As a result, some AST nodes have comments property with array of Comment extended with leading, trailing and few more props.

In P2 (I havenโ€™t read the code in detail here yet),

  • When each node is converted into a Doc, comments are also converted into Docs
    • Therefore, how they are output seems to have already been decided in P1

In OXC, part of the necessary information is already implemented and can be obtained. / #5785

However, just like with Babel, that information may be different from what Prettier requires...


So, I think Iโ€™ve generally understood "what" Prettier is doing.

However, as for "why" Prettier does it that way, I can only say itโ€™s because thatโ€™s Prettierโ€™s opinion.

Incidentally, there seem to be at least around 120 issues related to JS/TS at the moment, but

https://github.com/prettier/prettier/issues?q=sort%3Aupdated-desc+is%3Aissue+is%3Aopen+label%3Alang%3Ajavascript%2Cjsx%2Ctypescript+label%3Atype%3Abug

about 50 of them are related to comments, with some remaining unresolved since as far back as 2017.

https://github.com/prettier/prettier/issues?q=sort%3Aupdated-desc+is%3Aissue+is%3Aopen+label%3Alang%3Ajavascript%2Cjsx%2Ctypescript+label%3Atype%3Abug+label%3Aarea%3Acomments

leaysgur avatar Sep 25 '24 02:09 leaysgur

So, what I would like to discuss(confirm?) is: where should oxc_prettier aim to go? As the name suggests, should it strive for 100% compatibility with Prettier?

If the goal is to achieve compatibility, I think it would mean porting almost all of Prettierโ€™s lines.

  • Prettierโ€™s coverage is impressively high at 99%.
    • https://app.codecov.io/gh/prettier/prettier/tree/main/src
  • Since we donโ€™t use CST like Biome, many of them cannot be omitted
  • We would also end up porting the same issues I mentioned above

I believe the original aim of this issue was this, but again, is this an acceptable?

If we decide to proceed, letโ€™s consider how we might specifically approach it...

Fortunately, the 3 phases have isolated inputs and outputs, so:

  • If we have AST + Comments (in .json), we can create a Doc
  • If we have a Doc (in .json), we can generate formatted text

I think reducing the gaps between each phase will be important.

For that, weโ€™ll need some way to directly compare the result of Prettier.parse() with our implementation.

  • For example, we could resolve the issue of making our AST ESTree-compatible first?
    • Fortunately, we know Prettier can run without line|column
      • https://gist.github.com/leaysgur/411636c61f20558ced9e08a394dc7584
  • Alternatively, complete the experiment of aligning with Babel AST?
    • https://github.com/leaysgur/prettier?tab=readme-ov-file
    • Prettier supports not only ESTree but also Babel AST directly, and OXC AST has many similarities with Babel AST
// OXC
let oxcASTWithComments = oxc_prettier::parse(text);
let oxcEstreeASTWithComments = estreeJSON(oxcASTWithComments);
// let oxcBabelASTWithComments = babelJSON(oxcASTWithComments);

// Prettier
let prettierEstreeASTWithComments = Prettier.parse(text, { parser: "meriyah" });
// let prettierBabelASTWithComments = Prettier.parse(text, { parser: "babel" });

// Diff!
assert(oxcEstreeASTWithComments, prettierEstreeASTWithComments);

What if we donโ€™t aim for full compatibility?

  • For example, we could continue to use Prettier just as a reference?
    • But handle comments in our own way without focusing on compatibility
    • In my opinion as a user, it would be fine to simply restore comments as close to their original positions as possible without special handling
      • (More personally, I also dislike when parentheses are removed without permission)

But in that case, maybe it would be more like oxformat rather than oxc_prettier?

Anyway, this is what I was thinking these days. What do you think?

leaysgur avatar Sep 25 '24 02:09 leaysgur

For the long run, I envision a more configurable and less opinionated formatter.

Under this assumption, we can relax the "100% compatibility" constraint.

My intention is to reach a high compatibility with prettier so we can bootstrap ourself, and then start deviating behavior.

I've also looked at the prettier and rustfmt issues before, comment positioning is a really difficult subject due to the natural of matching the original intent of the comment.

To move things forward, I suggest @leaysgur to start refactoring our prettier code in preparation for what to come, I believe you know more about prettier than I do. You may ask @Sysix for help given they have started working on this area as well.

Our code for P2 and P3 is also incomplete, I suggest to do a little bit of rework first, as well as adding more debug utilities.

Boshen avatar Sep 25 '24 02:09 Boshen

I think reducing the gaps between each phase will be important. For that, weโ€™ll need some way to directly compare the result of Prettier.parse() with our implementation.

I don't think we need to do this. This will leave us too coupled with the prettier implementation, and it may end up being a waste of effort.

We are already matching half of the generated text, a few more iterations of refactoring and implementing details should close our gap to prettier.

Boshen avatar Sep 25 '24 03:09 Boshen

I see, what I likely concerned about was: what exactly is meant by "high compatibility". ๐Ÿ˜… Does it have to be 100%, or should aim for at least 80% if possible, or is something around 70% good enough? etc...

So, It's good to know that we're not necessarily aiming for 100% compatibility.

It's still uncertain how much the percentage will drop if we give up on porting completeness for comment handling, but for now, I'll move forward.

Following your suggestion:

  • Understanding and refactoring the oxc_prettier
  • Continuing with code reading of Prettier from P2

Iโ€™ll work on these whenever I have time! ๐Ÿš€


For the long run, I envision a more configurable and less opinionated formatter.

I love this idea!

leaysgur avatar Sep 25 '24 07:09 leaysgur

For the long run, I envision a more configurable and less opinionated formatter.

Does this mean that oxc_prettier will provide a wide range of configurable options, and offer a preset called prettier to match with prettier?

magic-akari avatar Sep 25 '24 07:09 magic-akari

How to handle comments

(Follow up of https://github.com/oxc-project/oxc/issues/5068#issuecomment-2372777159)

As I posted above, comments are collected and attached to AST nodes in P1. Then in P2, comments are printed as Doc along with other AST nodes.

Most comments are printed with their attached nodes like:

[leadingCommentsDoc, nodeDoc]
// or
[nodeDoc, trailingCommentsDoc]

https://github.com/prettier/prettier/blob/e3b8a16c7fa247db02c483fb86fc2049d45763b8/src/main/ast-to-doc.js#L128-L138

But the rest of the comments are handled as needed.

  • Dangling comments
  • Not dangling(leading or trailing) but need special treatment

There are about 40 files for printing ESTree AST to Doc.

https://github.com/prettier/prettier/tree/main/src/language-js/print

And 15 files of them print comments on demand.

โฏ rg 'print(Dangling)?Comments(Separately)?' -l src/language-js/print
estree.js
class.js
type-annotation.js
function-parameters.js
mapped-type.js
component.js
module.js
function.js
ternary.js
array.js
binaryish.js
property.js
call-arguments.js
block.js
jsx.js
arrow-function.js
object.js
member-chain.js
type-parameters.js

leaysgur avatar Sep 30 '24 05:09 leaysgur

@leaysgur I was just informed that biome_formatter is a port of the Prettier IR and printer infrastructure. It may be possible to reduce repeated work by using biome_formatter. More investigation for the rework!

So instead of using our current unfinished IR and printer, we can replace them with biome_formatter.

I pinged you on discord, for full context.

Boshen avatar Oct 01 '24 16:10 Boshen

Thanks for the pinning!

biome_formatter is a port of the Prettier IR and printer infrastructure.

Yes, and I just started reading through biome_formatter and biome_js_formatter, so the timing is perfect. ๐Ÿ˜„

I hadnโ€™t considered it as a viable option for OXC (though Iโ€™m not entirely sure why), but if we shift our goal to generating biome_formatter's IR, it could significantly reduce the amount of work, maybe?

Iโ€™ll look into this soon (after finishing the regex parser rework), including:

  • General architecture: https://github.com/biomejs/biome/blob/main/crates/biome_formatter/CONTRIBUTING.md
  • How their CST(especially comments) is used
  • biome_formatter IR, aka format_element
  • etc...

leaysgur avatar Oct 02 '24 00:10 leaysgur

but if we shift our goal to generating biome_formatter's IR

biome_formatter IR, aka format_element

I think it's the same thing as prettiers IR. Also Doc in our code.

Boshen avatar Oct 02 '24 01:10 Boshen

Hey all, just dropping by!

My intention is to reach a high compatibility with prettier so we can bootstrap ourself, and then start deviating behavior.

I feel adding support for comments to the current implementation should be feasible. In fact, I suspect it'll give us a significant boost on the prettier conformance test suite from the current 39%. We don't have to follow prettier's implementation exactly, but we can do something close to it.

Since the oxc_ast's Program node already stores a pub comments: Vec<'a, Comment> field, we could just repurpose it to add a third pass over the Doc objects (before the Commands are generated)? I think it'd end up looking similar to what @leaysgur showed here.

I could do a PoC on my fork and have it up in some time. Would any of you be interested in taking a look, perhaps?


I don't know how much more work the parenthesis bit will take, but my hunch is that emitting biome IR might be more work than supporting comments in oxc_prettier's current form. Of course, people who've worked on this will know better. Curious to know what you think!

srijan-paul avatar Oct 15 '24 18:10 srijan-paul

I want to share my overall idea from end user perspective: After oxc_prettier will be created I suggest to create something like eslint-plugin-oxc-prettier in future for using oxlint with that plugin to show problems in IDE / console and fix it as eslint-plugin-prettier does.

Currently my lint CI time is 50% time for prettier eslint (too much).

pumano avatar Oct 16 '24 10:10 pumano

you might want to consider running prettier (or oxc_prettier later) separately: https://www.joshuakgoldberg.com/blog/you-probably-dont-need-eslint-config-prettier-or-eslint-plugin-prettier/

YurySolovyov avatar Oct 16 '24 10:10 YurySolovyov

Recently, I've been reading and researching the code related to Biome's Formatter. I'd like to note some of my learnings and progress here.

biome_formatter and biome_xxx_formatter

In the Biome repo, the biome_formatter crate is used as infrastructure to implement Formatters for each language.

According to CONTRIBUTING.md, the implementation on each language side(e.g. for JS) is as follows:

  • Use the biome_rowan crate to define AST/CST nodes as biome_js_syntax
    • Also used as the parsing result of biome_js_parser
  • In the biome_js_formatter crate:
    • Implement the FormatRule trait (compatible with the format! macro) for all those nodes
    • Implement JsFormatLanguage that implements the FormatLanguage trait
      • Contains information like context and handling of comments
  • Then, call biome_formatter::format_node(parsed_tree, format_language)
    • Performs preprocessing on the AST/CST, finds attached comments, and converts to IR
    • By calling print() on the return value, you get the formatted string

If I understand correctly, to fully benefit from biome_formatter, we need the AST/CST defined using biome_rowan.

In the Biome repo, that's natural, but since we want to use OXC's AST, we cannot choose this way.

biome_formatter::Printer

As another way to utilize the biome_formatter, using only its IR and Printer may be possible.

A concern is that the IR and its builders also depend on biome_rowan. Possibly, some of them may not be usable, and in the worst case, there may be no alternative.

Due to my lack of my Rust experience, I couldn't make this judgment from the code and types alone, I think I need to write some code to verify it. I might find problems in more fundamental parts...๐Ÿฅฒ

Also in this case, the formatting options are limited to those defined in PrinterOptions.

  • indent_width
  • print_width
  • line_ending
  • indent_style
  • attribute_position
  • bracket_spacing

https://github.com/biomejs/biome/blob/main/crates/biome_formatter/src/printer/printer_options/mod.rs#L8

If we want to perform more flexible formatting, we'll have to fork it.

Biome's IR

Biome's IR is called FormatElement, and kinds are:

  • Space
  • HardSpace
  • Line
  • ExpandParent
  • StaticText
  • DynamicText
  • LocatedTokenText
  • LineSuffixBoundary
  • Interned
  • BestFitting
  • Tag

https://github.com/biomejs/biome/blob/main/crates/biome_formatter/src/format_element.rs#L19

They are similar but not the same as Prettier's Doc. Biome also has a struct called Document, but this just holds a Vec<FormatElement>.

https://github.com/biomejs/biome/blob/main/crates/biome_formatter/src/format_element/document.rs#L17

Although the final formatting result is compatible with Prettier, it's better to think of the intermediate processes are entirely different(more sophisticated).

Formatting flow

Basically proceeds in the same way as Prettier.

  • Parse the code string into an AST/CST
  • Preprocess the parsed tree
    • Normalize () and logical expressions, just like Prettier
  • Collect comments
    • Checking line spacing between comments and nodes, as in Prettier
    • Determining the placement of comments also requires heavy conditional branching
  • Traverse the tree and convert it into Vec<FormatElement>
    • Each node knows its own formatting rules
    • Comments are converted similarly
  • Convert Document::from(format_elements) into a code string

A part different from Prettier is the determination of whether () are necessary. In the case of Biome, it seems that information is held on the node definition side of biome_js_syntax.

https://github.com/biomejs/biome/tree/main/crates/biome_js_syntax/src/parentheses

As for comments, it seems a lot of code is needed, with both biome_js_formatter and biome_formatter having over 1,000 lines of code.

If we declare Prettier compatibility, it means our codebase will also need a corresponding amount of code...


As the next step, I'm planning to write some code to see whether using only Printer is feasible.

I'm also thinking, "Should we aim for Prettier-compatible comment formatting by creating/maintaining all that code again?" ... but I'm not sure.๐Ÿ˜Œ

leaysgur avatar Nov 12 '24 02:11 leaysgur

Progress: An experiment to see if we can use just the Printer from biome_formatter

Based on biome_formatter::format_node(), here's a minimal code example that reproduces a similar flow.

https://gist.github.com/leaysgur/de62398266ef98edcf7a0e3c06325538

My personal impression is that it doesn't seem impossible, but I'm not yet sure if it's the right path. ๐Ÿ˜…

Here's what I've found out:

  • To use Printer, create it with Formatted::new(document, context) and then call print()
    • A Document can be created from a Vec<FormatElement>
  • In the original code, document.propagate_expand() is called before creating Formatted
    • However, this propagate_expand() is pub(crate), so it can't be used for us(โ—๏ธ)
    • It seems it would work if you copy it entirely and adjust the types, but...

and,

  • FormatElement is constructed using helpers under builders::*
  • Regarding text printing, there are 3 types of IR and helpers:
    • StaticText: Accepts &'a static only
    • DynamicText: Accepts &str, but requires a starting position by biome_text_size::TextSize
    • LocatedTokenText: Requires a biome_rowan::SyntaxToken
  • In OXC's AST, tokens can only be generated using StaticText
    • Because there's no AST corresponding to new in new Foo()
  • Dynamic parts are often represented with Atom<'a>, so they can't be &'a static
    • Of course, they can't be SyntaxToken either
    • That means we have to use DynamicText, but for that, we need to provide a TextSize
    • While the start of Span fits the type, internal behavior and policy differences may affect things(โ—๏ธ)
      • In the first place, I'm not sure what is the sourcemap used for in the Formatter

I mentioned "minimal" at the beginning because this Gist does not use the Format trait, which is center of the biome_formatter.

  • We cannot directly implement the trait from the biome_formatter crate in the oxc_ast crate
    • Due to Rust's orphan rule
  • To cope with this, it requires a lot of boilerplate code
    • While it's understandable that Biome needs these as common infrastructure, but how about for us?
  • To refer to Biome's code and use other helpers, it might be better to implement it
    • There are parts we can avoid even with format_with(), but still

Next, I will check this part. I don't know whether it's possible or not in the first place, though. ๐Ÿ™„

leaysgur avatar Nov 18 '24 07:11 leaysgur

@leaysgur Thank you so much for the research. I think it's enough evidence to just implement our own thing (with some parts ported from Biome)?

Boshen avatar Nov 18 '24 08:11 Boshen

to just implement our own thing

Yes..., I also think that's probably the case.

And if we choose to go our own way, even if there are parts we can refer to, I feel there won't be any parts where we can borrow as code. ๐Ÿ˜“

Go back to the square one, I will re-start by understanding/refactoring the current code, and then think about what to do next.

leaysgur avatar Nov 18 '24 12:11 leaysgur

@leaysgur will lead this project, and is free to make any changes to the oxc_prettier crate.

Boshen avatar Nov 25 '24 03:11 Boshen

Thanks~!

Progress is likely to be slow due to limited resources, but I will try to update progress every 2-3 days or so at the latest comment on this issue.

leaysgur avatar Nov 26 '24 02:11 leaysgur

List of Prettier's print implementations and exports:

https://github.com/prettier/prettier/tree/3.4.1/src/language-js/print

  • array
    • isConciselyPrintedArray
    • printArray
  • arrow-function
    • printArrowFunction
  • assignment
    • isArrowFunctionVariableDeclarator
    • printAssignment
    • printAssignmentExpression
    • printVariableDeclarator
  • binaryish
    • printBinaryishExpression
    • shouldInlineLogicalExpression
  • block
    • printBlock
  • call-arguments(utils)
    • printCallArguments
  • call-expression
    • printCallExpression
  • class
    • printClass
    • printClassBody
    • printClassMethod
    • printClassProperty
  • decorators
    • printClassMemberDecorators
    • printDecorators
  • expression-statement
    • printExpressionStatement
  • function-parameters
    • printFunctionParamteres
    • shouldBreakFunctionParameters
    • shouldGroupFunctionParameters
    • shouldHugTheOnlyFunctionParameter
  • function
    • printFunction
    • printMethod
    • printMethodValue
    • printReturnStatement
    • printReturnType
    • printThrowStatement
    • shouldPrintParamsWithoutParens
  • literal
    • printLiteral
  • member-chain(utils)
    • printMemberChain
  • member
    • printMemberExpression
    • printMemberLookup
  • module
    • printExportDeclaration
    • printImportDeclaration
    • printImportKind
    • printModuleSpecifier
  • object
    • printObject
  • property
    • printProperty
    • printPropertyKey
  • semicolon
    • shouldPrintLeadingSemicolon
  • statement
    • printStatementSequence
  • template-literal
    • printTaggedTemplateLiteral
    • printTemplateLiteral
  • ternary
    • printTernary
  • misc
    • adjustClause
    • printAbstractToken
    • printBindExpressionCallee
    • printDeclareToken
    • printDefiniteToken
    • printFunctionTypeParameters
    • printOptionalToken
    • printRestSpread
    • printTypeScriptAccessibilityToken
  • ==========================
  • jsx
    • printJsx
  • ==========================
  • interface(ts)
    • printInterface
  • enum(ts)
    • printEnumDeclaration
    • printEnumMember
  • mapped-type(ts)
    • printTypescriptMappedType
    • printTypeScriptMappedTypeModifier
  • type-annotation(ts)
    • printArrayType
    • printFunctionType
    • printIndexedAccessType
    • printInferType
    • printIntersectionType
    • printJSDocType
    • printNamedTupleMember
    • printRestType
    • printTypeAlias
    • printTypeAnnotation
    • printTypeAnnotationProperty
    • printTypePredicate
    • printTypeQuery
    • printUnionType
    • shouldHugType
  • type-parameters(ts)
    • getTypeParametersGroupId
    • printTypeParameter
    • printTypeParameters

I don't think we need a 1:1 counterpart for these, but aligning them makes it easier to track our progress.


Next, I'll check our current implementation. Fully or partially implemented? not yet covered? etc...

leaysgur avatar Nov 28 '24 06:11 leaysgur

Next, I'll check our current implementation. Fully or partially implemented? not yet covered?

You can assume all are partially implemented ๐Ÿ˜…

Cross referencing the function in our code to the prettier counter part would definitely make things easier to follow.

Boshen avatar Nov 28 '24 07:11 Boshen

Notes and TODOs to complete Prettier's behavior as far as I know...

Fundamental part

  • Handling ( )
    • Currently, in the usage of oxc_parser, preserve_parens is set to false, so there is no ParenthesizedExpression
    • As a result, features like /** @type {A|null} */ (null) comments do not work, and the code gets broken
    • It is necessary to revert preserve_parens to true for the first step
    • After reverting to preserve_parens: true, manually remove the redundant ParenthesizedExpression
      • NOTE: The extra.parenthesized property seem to be used for the |> operator, so it may not be necessary
      • NOTE: Check oxc_minifier as a reference doing simillar things
    • See also https://github.com/oxc-project/oxc/issues/5068#issuecomment-2520317379
  • AST postprocess
    • In addition to parens, it is needed to rebalance LogicalExpression
    • Since other nodes are also being modified, follow up as necessary
    • NOTE: It appears that oxc_semantic is required alongside oxc_traverse?
      • In that case, it may be possible to replace parent nodes management code manually written for now?
  • need_parens logic
    • Likely there are bugs, so the code needs review again
    • Since only limited nodes are related to this, it might be possible to define it for each node individually like Biome
    • And this is called by formatting process like utils, but now it feels like assuming calls are made only through wrap!
  • Comments
    • Currently, no comments are being output at all
    • Let's classify the comments first
    • NOTE: Check oxc_codegen as a reference
    • See also:
      • https://github.com/oxc-project/oxc/issues/5068#issuecomment-2372777159
      • https://github.com/oxc-project/oxc/issues/5068#issuecomment-2382171971
  • wrap! macro
    • Perhaps rename it like format! and ensure it is used on every AST node?
      • For now, not all nodes are using it
      • There seem to be nodes which are not listed as an AstKind(unclear if intentional)
    • If the introduction of oxc_semantic can eliminate the parent-child tracking, just using Format trait may be enough
  • Doc(IR)
    • Align and Label are not yet implemented
    • Update both the Printer and the macro as well
  • Printer's propagate_breaks
    • As noted in the code comments, there are rare cases where it is not working as expected
  • Unify line breaks
    • Beforehand, unify to LF and later replace according to the options
  • Cursor
    • When using from the CLI, it is necessary to return the editor's cursor position back
    • Currently, CLI usage is not even considered

Formatting part

Below are the current implementation differences with Prettier.

https://github.com/prettier/prettier/blob/3.5.0/src/language-js/print

NOTE: For now, the target is 3.5.0, but as the Prettier codebase is constantly refactored, there might be parts where the code can be simplified.

JS / AST Entry points

St. AST Nodes(Prettier) AST Nodes(OXC): Blank means the same
โœ… RegExpLiteral
โœ… BigIntLiteral
โœ… NumericLiteral
โœ… StringLiteral
โœ… NullLiteral
โœ… BooleanLiteral
โœ… Directive
โœ… TemplateElement
โœ… TemplateLiteral
โœ… TaggedTemplateExpression
โœ… Program, BlockStatement, StaticBlock BlockStatement = BlockStatement + FunctionBody
โœ… ChainExpression
โœ… ParenthesizedExpression
โœ… AssignmentExpression
โœ… VariableDeclaration
โœ… VariableDeclarator
๐Ÿšง BinaryExpression, LogicalExpression BinaryExpression = BinaryExpression + PrivateInExpression
โœ… AssignmentPattern
โœ… MemberExpression = ComputedMemberExpression + (StaticMember|PrivateField)Expression
โœ… MetaProperty
โœ… Identifier = IdentifierReference + IdentifierName + BindingIdentifier + LabeledIdentifier
โœ… PrivateIdentifier
โœ… V8IntrinsicIdentifier = V8IntrinsicExpression(name + arguments)
โœ… FunctionDeclaration, FunctionExpression = Function(+ FunctionBody + FormalParameter(s))
โœ… ArrowFunctionExpression + FunctionBody + FormalParameter(s)
โœ… YieldExpression
๐Ÿšง AwaitExpression
โœ… NewExpression, ImportExpression, OptionalCallExpression, CallExpression OptionalCallExpression = CallExpression w/ optional: true
โœ… ObjectExpression, ObjectPattern + ObjectAssignmentTarget
๐Ÿšง ObjectProperty + BindingProperty + AssignmentTargetPropertyProperty + AssignmentTargetPropertyIdentifier
โœ… ObjectMethod = ObjectProperty w/ (method|get|set): true
โœ… ArrayExpression, ArrayPattern + ArrayAssignmentTarget
โœ… SpreadElement, SpreadPropertyPattern, RestElement + BindingRestElement + AssignmentTargetRest
โœ… SequenceExpression
โœ… UnaryExpression
โœ… UpdateExpression
โœ… ConditionalExpression
โœ… ThisExpression
โœ… DebuggerStatement
โœ… EmptyStatement
โœ… ThrowStatement
โœ… ReturnStatement
โœ… ExpressionStatement
โœ… WithStatement
โœ… IfStatement
โœ… ForStatement + ForStatementInit
โœ… ForInStatement + ForStatementLeft
โœ… ForOfStatement + ForStatementLeft
โœ… WhileStatement
โœ… DoWhileStatement
โœ… BreakStatement, ContinueStatement
โœ… LabeledStatement
โœ… TryStatement
โœ… CatchClause
โœ… SwitchStatement
โœ… SwitchCase
โœ… ClassDeclaration, ClassExpression = Class
โœ… ClassBody
โœ… ClassMethod, ClassPrivateMethod, MethodDefinition Class(Private)Method = MethodDefinition
โœ… ClassProperty, PropertyDefinition, ClassPrivateProperty, ClassAccessorProperty, AccessorProperty Class(Private)Property = PropertyDefinition, ClassAccessorProperty = AccessorProperty
โœ… Super
โœ… ImportDeclaration
โœ… ImportSpecifier, ImportNamespaceSpecifier, ImportDefaultSpecifier
โœ… ImportAttribute + WithClause
โœ… ExportDefaultDeclaration, ExportNamedDeclaration, ExportAllDeclaration
โœ… ExportSpecifier, ExportNamespaceSpecifier

๐Ÿšง: Verified at least once, but still WIP / โœ…: Completed(exc. comments)

Notes for ๐Ÿšง items:

  • (Binary|Logical)Expression
    • Apply print_binary_expression() to PrivateInExpression or keep separated?
    • Move in_parentheses() into print_binary_expression()?
  • AwaitExpression
    • Need to implement startsWithNoLookaheadToken(node, (node) => bool) equivalent
    • need_parens.rs has the similiar implementation
  • ObjectProperty
    • Apply print_assignment() to AssignmentTargetPropertyProperty and BindingProperty or not needed?

JS / Printer APIs

St. File Function Notes
โœ… array printArray
โœ… isConciselyPrintedArray
๐Ÿšง arrow-function printArrowFunction
๐Ÿšง assignment printAssignment
โœ… printAssignmentExpression We inline this in AssignmentExpression(only uses)
โœ… printVariableDeclarator We inline this in VariableDeclarator(only uses)
โœ… isArrowFunctionVariableDeclarator (Used in printTypeParameters() in Prettier)
๐Ÿšง binaryish printBinaryishExpression
โœ… shouldInlineLogicalExpression
โœ… block printBlock We use print_block_body() for Program
๐Ÿšง call-expression printCallExpression = print_call_expression() + print_import_expression()
Exports is_commons_js_or_amd_call() but NOT in Prettier
๐Ÿšง class printClass
โœ… printClassBody
๐Ÿšง printClassMethod
๐Ÿšง printClassProperty
โœ… expression-statement printExpressionStatement
๐Ÿšง function printFunction
โœ… printMethod = print_class_method() + print_object_method()
๐Ÿšง printMethodValue (Used for TSEmptyBodyFunctionExpression in Prettier)
๐Ÿšง printReturnStatement = print_return_or_throw_argument()
๐Ÿšง printThrowStatement = print_return_or_throw_argument()
๐Ÿ‘ป printReturnType
โœ… shouldPrintParamsWithoutParens This has moved to arrow_function.rs
๐Ÿšง function-parameters printFunctionParamteres
๐Ÿ‘ป shouldBreakFunctionParameters
๐Ÿšง shouldGroupFunctionParameters
โœ… shouldHugTheOnlyFunctionParameter
โœ… literal printLiteral We export method for each node like print_string()
๐Ÿšง member printMemberExpression
โœ… printMemberLookup
โœ… module printExportDeclaration should_omit_semicolon() may be missing something...?
โœ… printImportDeclaration
โœ… printImportKind We inline this in print_import_declaration()
โœ… printModuleSpecifier We inline this in (Export|Import)Specifier, Import(Namespace|Default)Specifier
โœ… object printObject
โœ… property printProperty We inline this in ImportAttribute and (Object|AssignmentTargetProperty|Binding)Property
๐Ÿšง printPropertyKey
๐Ÿšง semicolon shouldPrintLeadingSemicolon We inline this in expression_statement.rs(only uses)
โœ… statement printStatementSequence
๐Ÿšง template-literal printTemplateLiteral
๐Ÿšง printTaggedTemplateLiteral
๐Ÿšง ternary printTernary
๐Ÿšง call-arguments(utils) printCallArguments = print_call_arguments() + print_import_source_and_arguments()
๐Ÿ‘ป member-chain(utils) printMemberChain
โœ… misc adjustClause
๐Ÿ‘ป printAbstractToken
๐Ÿ‘ป printBindExpressionCallee
๐Ÿ‘ป printDeclareToken
๐Ÿ‘ป printDefiniteToken
๐Ÿ‘ป printFunctionTypeParameters
๐Ÿ‘ป printOptionalToken
โœ… printRestSpread We inline this in SpreadElement, AssignmentTargetRest, and BindingRestElement
๐Ÿ‘ป printTypeScriptAccessibilityToken

๐Ÿ‘ป: Func not exists / ๐Ÿšง: Verified at least once, but still WIP / โœ…: Completed(exc. comments)

JSX

๐Ÿšง At first glance, the most missing parts are the comments, as always!

TS

TBD: Later... ๐Ÿฅฒ

leaysgur avatar Nov 29 '24 08:11 leaysgur

How to handle parentheses

The basic flow is:

  • When parsing the AST, ParenthesizedExpression is generally removed
  • However, not all of them are removed
  • There are cases where they should be preserved, such as JSDoc casts like /** @type {Type} */ (x)
    • In the first place, when using a parser compliant with ESTree, ParenthesizedExpression does not exist
    • Therefore, even in Prettier, the behavior changes when using parsers other than Babel...
      • https://github.com/prettier/prettier/issues/8171
  • Then, when converting to IR, re-add () if necessary
  • This is determined by needs_parens().
    • https://github.com/prettier/prettier/blob/3.4.2/src/language-js/needs-parens.js

The current issues in oxc_prettier:

1: It assumes specifying preserve_parens: false as an option for oxc_parser.

However, as mentioned above, this cannot keep the () that should be preserved.

2: needs_parens() check is done in the wrap! macro, but it is not applied to all AST nodes.

Basically it can be implemented for each AST node with default: false like Biome does, but...

And for wrap! macro, I think collecting nodes and printing comments+parens can be decoupled. I will revisit this later.

leaysgur avatar Dec 05 '24 13:12 leaysgur