oxc formatter: infra to detect no code removal

See TODOs 👇🏻 https://github.com/oxc-project/oxc/issues/13720#issuecomment-3466314647

I saw a few posts mentioning Biome dropping code causing frustration.

An infra for detecting such cases is a must.

Prettier has a runtime check to detect no comments removal.

We should add this infra to:

conformance suite
monitor-oxc
possibly CLI as well for debugging purposes

Sep 12 '25 08:09 Boshen

NOTE: Currently, it hasn't been ported yet, but the original biome_formatter code includes debug runtime mechanisms like:

state.assert_formatted_all_tokens()
comments.assert_formatted_all_comments()

which appear to address this issue.

https://github.com/biomejs/biome/blob/3eeb4dfe0ea8144b6c40580bb065f75558f77285/crates/biome_formatter/src/lib.rs#L1485-L1491

Oct 07 '25 08:10 leaysgur

NOTE: Currently, it hasn't been ported yet, but the original biome_formatter code includes debug runtime mechanisms like:

state.assert_formatted_all_tokens()

comments.assert_formatted_all_comments()

which appear to address this issue.

https://github.com/biomejs/biome/blob/3eeb4dfe0ea8144b6c40580bb065f75558f77285/crates/biome_formatter/src/lib.rs#L1485-L1491

These ensure all AST nodes and comments are formatted.

I'm skeptical though, people reported missing code and idempotency issues regardless.

Oct 07 '25 09:10 Boshen

I'm skeptical though, people reported missing code and idempotency issues regardless.

Oh, I see. Let me start by investigating what kind of cases were occurring.

I did some deep research on Claude, but while I found a few resolved issues, I couldn't find any unresolved ones.

The currently open issues should be addressed, though.

https://github.com/biomejs/biome/issues?q=sort%3Aupdated-desc%20is%3Aissue%20is%3Aopen%20label%3AA-Formatter%20label%3AL-JavaScript

Oct 07 '25 09:10 leaysgur

I'm wondering adding test like this in monitor-oxc:

fn test(source_text) {
  let program1 = parse(source_text);
  let count1 = count_all_ast_nodes_and_comments(program1);

  let source_text2 = format(program1);

  let program2 = parse(source_text2);
  let count2 = count_all_ast_nodes_and_comments(program2);

  assert_eq!(count1, count2);
}

NOTE:

Use ast_kind.debug_name()
Ignore EmptyStatement, ParenthesizedExpression, etc?
For { "a": 1 } -> { a: 1 }, StringLiteral will be IdentifierName

Oct 08 '25 08:10 leaysgur

As far as I tested locally in https://github.com/oxc-project/monitor-oxc/pull/116, it looks fine.

One concern is that nodes are being added and removed more frequently than I anticipated.

// Ignore safely
;[]; // EmptyStatement + ExprStatement
// ↓
[]; // ExprStatement

// Ignore if `t.trim().is_empty()` is safe (basically)
[
  // JSXText(\n) + JSXExprContainer + JSXText(\n)
  <P>
    {children}
  </P>,
  // ↓
  <P>{children}</P>, // JSXExprContainer
];

It's fine.

// Count `key.value` and check the diff
[
  { a: 1 }, // IdentifierName
  // ↑↓
  { "a": 1 }, // StringLiteral

  { 1: "a" }, // NumericLiteral
  // ↑↓
  { "1": "a" }, // StringLiteral
];

This also managable.

// Ignore safely? e.g. typecast...
(1); // ParenthesizedExpr
// ↓
1; // Noop

// How to handle...?
[
  ((opts?.x)?.y), // ParenthesizedExpr > ChainExpr > ParenthesizedExpr > ChainExpr
  opts?.x?.y, // ChainExpr
];

// How to handle...?
a = 1, (b = 2, c = 3, b), (c = 4, a); // SequenceExpr > ParenthesizdExpr > SequenceExpr
// ↓
(a = 1), (b = 2), (c = 3), b, (c = 4), a; // SequenceExpr

We need to decide how to handle these.

Fortunately, it didn't seem to occur very often even among the 3,000 packages, so it might be enough to either prepare an exclusion list or simply be aware of it.

Oct 09 '25 06:10 leaysgur

This is tricky. Could write a custom transform for testing that converts any AST to a "canonical" version, and put both the original AST and post-formatter AST through that transform before comparing the ASTs.

From all the examples you've given, it looks like that'd be quite a pain. But I can't really see any other way.

Oct 13 '25 13:10 overlookmotel

Side note:

// Ignore safely? e.g. typecast... (1); // ParenthesizedExpr

Do we handle this (stupid) case:

// Sloppy mode
function foo(bar, bar) {
  ("use strict");
}

Removing the parentheses from around "use strict" here would make the code a syntax error.

Oct 13 '25 13:10 overlookmotel

After team discussion:

Count AST nodes
Ignore EmptyExpr, ParenthesizedExpr, SequenceExpr, and ChainExpr
Integrate this into either oxc_formatter or oxfmt
Use a feature flag to control the build
Expose it through a hidden CLI flag

Oct 16 '25 09:10 leaysgur

To implement the counter, you could either:

Run AST through Semantic and then loop through AstNodes, incrementing elements of a Box<[usize; AST_TYPE_MAX + 1]> based on node.kind().ty() as usize. or
Codegen a Visit impl that avoids the cost of building Semantic, using oxc_ast_tools.

Approach (1) will be much simpler, so unless it's super-slow constructing Semantic for every AST, personally I'd go with that.

Note: boxed_array! macro in oxc_data_structures may be useful.

Oct 16 '25 10:10 overlookmotel

Thanks for the reference! 🙏🏻

I already have this implementation.

https://github.com/oxc-project/monitor-oxc/pull/116

But do you think it’s not enough?

Oct 16 '25 10:10 leaysgur

Ah sorry I forgot about enter_node. Yes that's a simpler way to do it.

Only possible change I can see is replacing FxHashMap<String, usize> with Box<[usize; AST_TYPE_MAX + 1]> and using kind.ty() as usize as the key.

It's a trade-off - kind.debug_name() gives more info than kind.ty() (which is purely the type of the node), but FxHashMap keyed by String is much slower than a boxed array keyed by usize.

(Vec<usize> would also work, but boxed slice is more performant as compiler will be able to see that boxed_array[kind.ty()] can never go out of bounds, as the length of the array is statically known to be 1 greater than max value of kind.ty() as usize)

But if it's already fast enough for our purposes, then your solution is probably superior to mine.

Oct 16 '25 10:10 overlookmotel

After working on this for a while, I realized that it is almost impossible to solve this by strictly counting AST nodes. I'll note the summary of challenges for the future reference.

Block comments

/*  These trailing ->     
  *  and also ->     
   *  <- this marker    */
// 🔽
/*  These trailing ->
 *  and also ->
 *  <- this marker    */

We can count the number of block comments, but it's not easy to compare contents.

Line comments

Trimming whitespace also happens for line comments.

// extra whitespaces ->     
// 🔽
// extra whitespaces ->

And sometimes, line comments are merged...

for (x
in //a
y); //b
// 🔽
for (x in y); //a //b

We cannot count the number of line comments.

Still, can be checked like:

for bf in before_line_comments {
    if after_line_comments.iter().any(|af| (bf.trim_end() == af) || (af.contains(bf))) {
        continue;
    }
    return Some(format!("LineComment mismatch: `{bf}` not found"));
}

EmptyStatement

They can be removed.

;;;[];
// 🔽
[];

TSUnionType, TSIntersectionType

They can have unnecessary leading | or &, and if only one type is combined will be removed.

type T = | A;
// 🔽
type T = A;

type T = & A;
// 🔽
type T = A;

Object-like keys

This is controlled by quote-props option.

const x = { 'a': 1 };
// 🔽
const x = { a: 1 };

For:

StringLiteral
NumberLiteral
IdentifierName

which parent is:

ObjectProperty
MethodDefinition
PropertyDefinition
ImportAttribute
TSPropertySignature
TSLiteralType

will be shown as count difference.

JSXText

<div>
  {children}
</div>
// 🔽
<div>{children}</div>

In this case, there are \n and \n, will be removed.

<div>{" "}{' '}</div>
// 🔽
<div> </div>

JSXExpressionContainer with a single whitespace StringLiteral will be removed. Instead, JSXText with will be added.

<p>World    </p>
// 🔽
<p>World </p>

<p>abc   :  def</p>
// 🔽
<p>abc : def</p>

Redundant whitespaces can be truncated even if they are inside JSXText!

Parentheses related nodes

Useless parentheses can be safely removed.

(a)
// 🔽
a

type T = (A | B)
// 🔽
type T = A | B

However, at the same time, some nodes are affected by parentheses removal.

(a, (b, c))
// 🔽 = SequenceExpression: 2 -> 1
(a, b, c)

(a?.b)?.().c
// 🔽 = ChainExpression: 2 -> 1, CallExpression(<computed>) -> CallExpression(b)
a?.b?.().c

In the current WIP implementation, I was strictly counting their number using debug_name(), but I'll relax that somewhat.

Oct 29 '25 03:10 leaysgur

[x] Implementation merge
- #15059
[x] Integration: monitor-oxc
- Just enable feature
- Add separate runner for clarity (there is already 95,000 lines of diff...)
- Implement test() trait function and call format() once
- https://github.com/oxc-project/monitor-oxc/pull/118
[x] Integration: oxlint-ecosystem-ci
- Need to refactor from linter only current implementation
- Build oxfmt with feature and use it
- @Dunqing will be working on this
- https://github.com/oxc-project/oxc-ecosystem-ci/pull/78

Oct 30 '25 06:10 leaysgur

https://github.com/oxc-project/monitor-oxc/actions/runs/19450657404/job/55654665015
https://github.com/oxc-project/oxc-ecosystem-ci/actions/runs/19451629193

Confirmed the latest run did not report the code removal.

Nov 18 '25 02:11 leaysgur