formatter: infra to detect no code removal
See TODOs ππ» https://github.com/oxc-project/oxc/issues/13720#issuecomment-3466314647
I saw a few posts mentioning Biome dropping code causing frustration.
An infra for detecting such cases is a must.
Prettier has a runtime check to detect no comments removal.
We should add this infra to:
- conformance suite
- monitor-oxc
- possibly CLI as well for debugging purposes
NOTE: Currently, it hasn't been ported yet, but the original biome_formatter code includes debug runtime mechanisms like:
state.assert_formatted_all_tokens()comments.assert_formatted_all_comments()
which appear to address this issue.
https://github.com/biomejs/biome/blob/3eeb4dfe0ea8144b6c40580bb065f75558f77285/crates/biome_formatter/src/lib.rs#L1485-L1491
NOTE: Currently, it hasn't been ported yet, but the original
biome_formattercode includes debug runtime mechanisms like:
state.assert_formatted_all_tokens()comments.assert_formatted_all_comments()which appear to address this issue.
https://github.com/biomejs/biome/blob/3eeb4dfe0ea8144b6c40580bb065f75558f77285/crates/biome_formatter/src/lib.rs#L1485-L1491
These ensure all AST nodes and comments are formatted.
I'm skeptical though, people reported missing code and idempotency issues regardless.
I'm skeptical though, people reported missing code and idempotency issues regardless.
Oh, I see. Let me start by investigating what kind of cases were occurring.
I did some deep research on Claude, but while I found a few resolved issues, I couldn't find any unresolved ones.
The currently open issues should be addressed, though.
https://github.com/biomejs/biome/issues?q=sort%3Aupdated-desc%20is%3Aissue%20is%3Aopen%20label%3AA-Formatter%20label%3AL-JavaScript
I'm wondering adding test like this in monitor-oxc:
fn test(source_text) {
let program1 = parse(source_text);
let count1 = count_all_ast_nodes_and_comments(program1);
let source_text2 = format(program1);
let program2 = parse(source_text2);
let count2 = count_all_ast_nodes_and_comments(program2);
assert_eq!(count1, count2);
}
NOTE:
- Use
ast_kind.debug_name() - Ignore
EmptyStatement,ParenthesizedExpression, etc? - For
{ "a": 1 }->{ a: 1 },StringLiteralwill beIdentifierName
As far as I tested locally in https://github.com/oxc-project/monitor-oxc/pull/116, it looks fine.
One concern is that nodes are being added and removed more frequently than I anticipated.
// Ignore safely
;[]; // EmptyStatement + ExprStatement
// β
[]; // ExprStatement
// Ignore if `t.trim().is_empty()` is safe (basically)
[
// JSXText(\n) + JSXExprContainer + JSXText(\n)
<P>
{children}
</P>,
// β
<P>{children}</P>, // JSXExprContainer
];
It's fine.
// Count `key.value` and check the diff
[
{ a: 1 }, // IdentifierName
// ββ
{ "a": 1 }, // StringLiteral
{ 1: "a" }, // NumericLiteral
// ββ
{ "1": "a" }, // StringLiteral
];
This also managable.
// Ignore safely? e.g. typecast...
(1); // ParenthesizedExpr
// β
1; // Noop
// How to handle...?
[
((opts?.x)?.y), // ParenthesizedExpr > ChainExpr > ParenthesizedExpr > ChainExpr
opts?.x?.y, // ChainExpr
];
// How to handle...?
a = 1, (b = 2, c = 3, b), (c = 4, a); // SequenceExpr > ParenthesizdExpr > SequenceExpr
// β
(a = 1), (b = 2), (c = 3), b, (c = 4), a; // SequenceExpr
We need to decide how to handle these.
Fortunately, it didn't seem to occur very often even among the 3,000 packages, so it might be enough to either prepare an exclusion list or simply be aware of it.
This is tricky. Could write a custom transform for testing that converts any AST to a "canonical" version, and put both the original AST and post-formatter AST through that transform before comparing the ASTs.
From all the examples you've given, it looks like that'd be quite a pain. But I can't really see any other way.
Side note:
// Ignore safely? e.g. typecast... (1); // ParenthesizedExpr
Do we handle this (stupid) case:
// Sloppy mode
function foo(bar, bar) {
("use strict");
}
Removing the parentheses from around "use strict" here would make the code a syntax error.
After team discussion:
- Count AST nodes
- Ignore
EmptyExpr,ParenthesizedExpr,SequenceExpr, andChainExpr - Integrate this into either
oxc_formatteroroxfmt - Use a feature flag to control the build
- Expose it through a hidden CLI flag
To implement the counter, you could either:
- Run AST through
Semanticand then loop throughAstNodes, incrementing elements of aBox<[usize; AST_TYPE_MAX + 1]>based onnode.kind().ty() as usize. or - Codegen a
Visitimpl that avoids the cost of buildingSemantic, usingoxc_ast_tools.
Approach (1) will be much simpler, so unless it's super-slow constructing Semantic for every AST, personally I'd go with that.
Note: boxed_array! macro in oxc_data_structures may be useful.
Thanks for the reference! ππ»
I already have this implementation.
https://github.com/oxc-project/monitor-oxc/pull/116
But do you think itβs not enough?
Ah sorry I forgot about enter_node. Yes that's a simpler way to do it.
Only possible change I can see is replacing FxHashMap<String, usize> with Box<[usize; AST_TYPE_MAX + 1]> and using kind.ty() as usize as the key.
It's a trade-off - kind.debug_name() gives more info than kind.ty() (which is purely the type of the node), but FxHashMap keyed by String is much slower than a boxed array keyed by usize.
(Vec<usize> would also work, but boxed slice is more performant as compiler will be able to see that boxed_array[kind.ty()] can never go out of bounds, as the length of the array is statically known to be 1 greater than max value of kind.ty() as usize)
But if it's already fast enough for our purposes, then your solution is probably superior to mine.
After working on this for a while, I realized that it is almost impossible to solve this by strictly counting AST nodes. I'll note the summary of challenges for the future reference.
Block comments
/* These trailing ->
* and also ->
* <- this marker */
// π½
/* These trailing ->
* and also ->
* <- this marker */
We can count the number of block comments, but it's not easy to compare contents.
Line comments
Trimming whitespace also happens for line comments.
// extra whitespaces ->
// π½
// extra whitespaces ->
And sometimes, line comments are merged...
for (x
in //a
y); //b
// π½
for (x in y); //a //b
We cannot count the number of line comments.
Still, can be checked like:
for bf in before_line_comments {
if after_line_comments.iter().any(|af| (bf.trim_end() == af) || (af.contains(bf))) {
continue;
}
return Some(format!("LineComment mismatch: `{bf}` not found"));
}
EmptyStatement
They can be removed.
;;;[];
// π½
[];
TSUnionType, TSIntersectionType
They can have unnecessary leading | or &, and if only one type is combined will be removed.
type T = | A;
// π½
type T = A;
type T = & A;
// π½
type T = A;
Object-like keys
This is controlled by quote-props option.
const x = { 'a': 1 };
// π½
const x = { a: 1 };
For:
- StringLiteral
- NumberLiteral
- IdentifierName
which parent is:
- ObjectProperty
- MethodDefinition
- PropertyDefinition
- ImportAttribute
- TSPropertySignature
- TSLiteralType
will be shown as count difference.
JSXText
<div>
{children}
</div>
// π½
<div>{children}</div>
In this case, there are \n and \n, will be removed.
<div>{" "}{' '}</div>
// π½
<div> </div>
JSXExpressionContainer with a single whitespace StringLiteral will be removed.
Instead, JSXText with will be added.
<p>World </p>
// π½
<p>World </p>
<p>abc : def</p>
// π½
<p>abc : def</p>
Redundant whitespaces can be truncated even if they are inside JSXText!
Parentheses related nodes
Useless parentheses can be safely removed.
(a)
// π½
a
type T = (A | B)
// π½
type T = A | B
However, at the same time, some nodes are affected by parentheses removal.
(a, (b, c))
// π½ = SequenceExpression: 2 -> 1
(a, b, c)
(a?.b)?.().c
// π½ = ChainExpression: 2 -> 1, CallExpression(<computed>) -> CallExpression(b)
a?.b?.().c
In the current WIP implementation, I was strictly counting their number using debug_name(), but I'll relax that somewhat.
- [x] Implementation merge
- #15059
- [x] Integration: monitor-oxc
- Just enable feature
- Add separate runner for clarity (there is already 95,000 lines of diff...)
- Implement
test()trait function and callformat()once - https://github.com/oxc-project/monitor-oxc/pull/118
- [x] Integration: oxlint-ecosystem-ci
- Need to refactor from linter only current implementation
- Build oxfmt with feature and use it
- @Dunqing will be working on this
- https://github.com/oxc-project/oxc-ecosystem-ci/pull/78
- https://github.com/oxc-project/monitor-oxc/actions/runs/19450657404/job/55654665015
- https://github.com/oxc-project/oxc-ecosystem-ci/actions/runs/19451629193
Confirmed the latest run did not report the code removal.