oxc icon indicating copy to clipboard operation
oxc copied to clipboard

Align JS-side AST with standard for TypeScript

Open overlookmotel opened this issue 9 months ago • 30 comments

Our AST on JS side (via oxc-parser NPM package) is now aligned with ESTree standard for JS syntax. We have conformance tests which ensure our AST is exactly the same as Acorn's AST for all Test262 test cases which Acorn can parse.

However, we do not yet guarantee that our AST for TypeScript types is aligned with any particular standard.

TS-ESLint's parser seems like the obvious target.

Achieving alignment

Recently there have been various PRs to align parts of the AST with TS-ESLint.

In my opinion, it'd be preferable if we took a more systematic approach, using a conformance suite. This will ensure:

  • We don't miss anything, and can confidently assert our complete alignment with TS-ESLint.
  • We don't break anything in future.
  • While making changes to align one type, we don't inadvertently break another.

This methodology proved successful while working on ESTree comatibility. Once we had the conformance suite working, we got to 99% passing within a few days!

I suggest we adopt same approach as I've outlined in #9703, with the target snapshots generated in and committed to acorn-test262 repo. We can:

  • Use all the test cases in https://github.com/microsoft/TypeScript/tree/main/tests/cases/conformance and https://github.com/microsoft/TypeScript/tree/main/tests/cases/compiler.
  • Parse all these cases with TS-ESLint parser, convert the resulting ASTs to JSON, and commit those JSON files to the repo.

How

Wherever possible, it's preferable to use the DSL (domain specific language) provided by #[estree] attributes on AST types to control how the AST gets converted e.g. #[estree(flatten)], #[estree(rename = "...")], #[estree(append_to = "...")].

We can also use custom converters #[estree(via = ...)] where necessary, but the less we have to resort to that, the better.

Conflicts between ESTree and TS-ESLint

It's possible that we'll run into cases where Acorn and TS-ESLint produce a different AST for some JS syntax. If so, that'll be problematic. Ideally we want a single AST shape which is compatible with both ESTree and TS-ESLint.

If we do find such cases, I suggest that we just skip them initially. Once we have everything else passing, we can see how broad a problem this is, and figure out how best to deal with it.

Raw transfer

I suggest that in the inital alignment effort, we don't worry about raw transfer.

Let's only adapt the AST that Oxc generates as JSON via ESTree trait. This will be simpler to do, and will get us to the finish line faster.

Only once we have all the tests passing, we can then work on getting the AST generated by raw transfer aligned too. This is generally much easier once you have a working ESTree impl to translate from.

JSX

See #9703. I suggest we do JSX first before TS.

overlookmotel avatar Mar 12 '25 07:03 overlookmotel

cc @ArnaudBarre @hi-ogawa

overlookmotel avatar Mar 12 '25 07:03 overlookmotel

I think you should target TS-ESLint for anything non supported by acorn, this will simplify the setup. JSX is small enough so that it will probably not conflict, but today most JSX code is TSX so IMO I think you can avoid the intermediary step. Also TS-ESLint adopt JS features faster than acorn (the these two small diff I found, on top of acorn not yet handling decorators or import assertions) so IMO you should only target one popular parser (Babel would be a valid choice also)

ArnaudBarre avatar Mar 12 '25 08:03 ArnaudBarre

#9774 implements a TS conformance suite.

Continuing the conversation from that PR about current state of our conformance...

@hi-ogawa said:

Current result Positive Passed: 32/10725 (0.30%) 😢

Oh dear! Well, we had a similar result when we started on ESTree conformance, and within a few days we were at 99%. Maybe it'll be the same this time, inshallah.

But...

maybe keys are ordered alphabetically

In Literal nodes for numbers, TS-ESLint has the raw field before value. That also matches the pattern of alphabetical order that you noticed.

The order of keys doesn't really matter. What does matter is that it's consistent, to make operating on the objects in JS code efficient, as consistent object shapes allow JS engines to monomorphize functions that operate on those objects.

The other advantage of testing for correct field order is that conformance tests can compare the JSON produced by our serializer directly to the snapshotted JSON, with a simple string comparison. This is much faster than using serde to parse both JSONs to serde::Values, and comparing those structures. Before I made the switch to direct JSON string comparison, the conformance tests were impractically slow to run on CI.

I can see 2 possible ways forward:

  1. Alter conformance tester to not be sensitive to field order, at least for now. We could disable testing TS conformance on CI, while we work on getting the AST aligned, and then deal with field order later.

  2. Alter the AST in acorn-test262 before serializing to JSON. We'd re-order fields to be same order as ESTree (Acorn), with additional TS fields after that. To avoid having to add loads of #[estree(field_order(...))] attributes to AST types, we could alter the codegen in oxc_ast_tools to automatically put #[ts] fields last.

Perhaps (1) is a good place to start, and then we do (2) later on?

typescript-eslint parser uses undefined instead of null for optional ts-specific field

That's problematic. We could match that behavior in raw transfer without any perf hit, but obviously we can't when transferring the AST via JSON. We'd have to add a "fixup" pass on JS side to convert null to undefined. I imagine that'd be costly.

I think we need to keep the JSON-based transfer method alive for the foreseeable future. Raw transfer is experimental, and quite possible it has show-stopper bugs. We've already found it doesn't work on Node 20 or Bun, and we've not got it working on WASM yet.

I would also be loathe to remove empty TS-specific fields entirely in JS-side AST. That'd be a big perf hit for the consumer, as it'd result in variable object shapes.

I'd suggest the best starting point on this one is to modify the AST in acorn-test262 to convert all undefineds to nulls, prior to converting to JSON.

But does null vs undefined matter anyway? Will it affect downstream tools if we just go with null?

overlookmotel avatar Mar 14 '25 09:03 overlookmotel

@hi-ogawa Do you have write access to acorn-estree repo? If so, feel free to make changes there if required and merge your own PRs. I'm going to be away from keyboard until Monday, and don't want to block your progress if you're working during that time.

overlookmotel avatar Mar 14 '25 09:03 overlookmotel

@overlookmotel Yes, I saw a green merge button myself, so I should be able to push it. 👍

hi-ogawa avatar Mar 14 '25 09:03 hi-ogawa

You will have pre-processing to do anyway to convert TS-ESLint location to start/end. You could also do the null/undefined change during that process. Personally for now I've done the diff in pure JS, to also have false, null, undefined and [] be considered equivalent, handle bigint non json serializable and don't throw on nullable keys from one or the other.

ArnaudBarre avatar Mar 14 '25 09:03 ArnaudBarre

Just to clarify one point:

Alter conformance tester to not be sensitive to field order, at least for now.

I mean alter conformance tester for TypeScript only. We have succeeded in getting field order aligned with Acorn (and I made some PRs on Acorn https://github.com/acornjs/acorn/pull/1347 https://github.com/acornjs/acorn/pull/1346 to make Acorn align with us!), so would be ideal not to accidentally regress that.

overlookmotel avatar Mar 14 '25 09:03 overlookmotel

AST of ts-eslint should be 100% aligned with estree (extended), but there is sometimes differences when new js features are added (stage 3) and there is small period of time where AST may differ ts-eslint also does alignment check with @babel/eslint to ensure that consistency

  • https://github.com/typescript-eslint/typescript-eslint/blob/main/packages/ast-spec/tests/util/serializers/Node.ts
  • https://github.com/typescript-eslint/typescript-eslint/blob/main/packages/ast-spec/src/declaration/ClassDeclaration/fixtures/extends/snapshots/1-TSESTree-AST.shot
  • https://github.com/typescript-eslint/typescript-eslint/blob/main/packages/ast-spec/src/declaration/ClassDeclaration/fixtures/extends/snapshots/3-Babel-AST.shot

But does null vs undefined matter anyway? Will it affect downstream tools if we just go with null?

yes to my knowledge there is few cases where null has actual meaning in estree but there is technically no difference in null | undefined as far as estree is concerned outside of semantics and depending what parser you use acron/babel/ts-eslint/esprima "nullish" value may be different

https://github.com/estree/estree/blob/master/es2015.md#expressions

eg.

[,2]
/// elements: [null, Literal]

https://typescript-eslint.io/play/#ts=5.7.3&showAST=es&fileType=.tsx&code=MYewdgzgLgBAHjAvDA2gGgEwF0g&eslintrc=N4KABGBEBOCuA2BTAzpAXGYBfEWg&tsconfig=N4KABGBEDGD2C2AHAlgGwKYCcDyiAuysAdgM6QBcYoEEkJemy0eAcgK6qoDCAFutAGsylBm3TgwAXxCSgA&tokens=false


additionally ts-eslint adds empty array's and defaults "flags" to false and eg. babel does not always do it for ts specific fields

armano2 avatar Mar 16 '25 18:03 armano2

To avoid having to add loads of #[estree(field_order(...))] attributes to AST types, we could alter the codegen in oxc_ast_tools to automatically put #[ts] fields last.

I'm going to do this part now.

overlookmotel avatar Mar 17 '25 02:03 overlookmotel

2. Alter the AST in acorn-test262 before serializing to JSON. We'd re-order fields to be same order as ESTree (Acorn), with additional TS fields after that.

I'm thinking about this, but to reorder this, we need to know acorn's order in js script of acorn-test262. Is there any easy-to-use format of estree spec providing this data? Either generating our own from ast tools or somehow extracting it from markdown https://github.com/estree/estree or dts https://github.com/DefinitelyTyped/DefinitelyTyped/blob/master/types/estree/index.d.ts?

For now I went with 2st option of normalizing oxc output https://github.com/oxc-project/oxc/pull/9813.

hi-ogawa avatar Mar 17 '25 03:03 hi-ogawa

To avoid having to add loads of #[estree(field_order(...))] attributes to AST types, we could alter the codegen in oxc_ast_tools to automatically put #[ts] fields last.

This is now done. It actually affected very few types. #9820

For now I went with 2st option of normalizing oxc output #9813.

OK great. Yes, probably easiest to get content aligned first before we worry about field order.

Maybe the slow speed I noticed previously was from generating diffs for mismatches, rather than the serde deserialization.

I'm thinking about this, but to reorder this, we need to know acorn's order in js script of acorn-test262. Is there any easy-to-use format of estree spec providing this data? Either generating our own from ast tools or somehow extracting it from markdown https://github.com/estree/estree or dts https://github.com/DefinitelyTyped/DefinitelyTyped/blob/master/types/estree/index.d.ts?

Yes. I've written something to output a list of types with their field order, based on Oxc's AST schema: overlookmotel/estree-field-orders branch + https://github.com/oxc-project/acorn-test262/pull/11

We can do that whenever we're ready.

Right now it's not perfect, because there are some types which exist in our AST which don't exist in TS-ESLint's (and probably vice-versa).

overlookmotel avatar Mar 17 '25 10:03 overlookmotel

Thanks for the feedback.

AST of ts-eslint should be 100% aligned with estree (extended), but there is sometimes differences when new js features are added (stage 3) and there is small period of time where AST may differ ts-eslint also does alignment check with @babel/eslint to ensure that consistency

This is great to hear. Hopefully we won't have trouble with the ESTree-compatible and TSESLint-compatible ASTs diverging from each other (apart from, obviously, the TS AST having more fields e.g. typeAnnotation, and additional types for those fields).

yes to my knowledge there is few cases where null has actual meaning in estree but there is technically no difference in null | undefined as far as estree is concerned outside of semantics and depending what parser you use acron/babel/ts-eslint/esprima "nullish" value may be different

https://github.com/estree/estree/blob/master/es2015.md#expressions

eg.

[,2] /// elements: [null, Literal]

We do handle that one correctly.

I think our best starting point is to treat null and undefined as equivalent ("undefined" meaning a field that exists and has the value undefined, as opposed to a field which isn't present at all).

additionally ts-eslint adds empty array's and defaults "flags" to false and eg. babel does not always do it for ts specific fields

Interesting. Babel is a whole other thing, though. Babel's AST is quite different from ESTree, and we're not seeking to replicate its AST (though we might offer that as an option later on, if there's a demand for it).

overlookmotel avatar Mar 17 '25 10:03 overlookmotel

If anyone is interested in helping out this, here is my workflow to understand estree mismatches:

# this runs full estree conformance and updates estree_xxx.snap
# but doesn't generate diff
cargo run -p oxc_coverage -- estree

# SAVE_DIFF=true to save diff files in tasks/coverage/acorn-test262-diff
# saving many files is too slow, so using --filter is recommended
SAVE_DIFF=true cargo run -p oxc_coverage -- estree --filter compiler/2dArray

# also you can use `just example parser` for any file to dump ast output
just example parser --estree test.ts

And this is an example diff, which shows Identifier and thisParam is off currently.

tasks/coverage/acorn-test262-diff/typescript/tests/cases/compiler/2dArrays.diff
@@ -20,10 +20,7 @@
         "type": "Identifier",
         "start": 6,
         "end": 10,
-        "decorators": [],
-        "name": "Cell",
-        "optional": false,
-        "typeAnnotation": null
+        "name": "Cell"
       },
       "implements": [],
       "superClass": null,
@@ -53,10 +50,7 @@
               "type": "Identifier",
               "start": 33,
               "end": 39,
-              "decorators": [],
-              "name": "isSunk",
-              "optional": false,
-              "typeAnnotation": null
+              "name": "isSunk"
             },
             "optional": false,
             "override": false,
@@ -82,10 +76,7 @@
         "type": "Identifier",
         "start": 22,
         "end": 26,
-        "decorators": [],
-        "name": "Ship",
-        "optional": false,
-        "typeAnnotation": null
+        "name": "Ship"
       },
       "implements": [],
       "superClass": null,
@@ -115,10 +106,7 @@
               "type": "Identifier",
               "start": 71,
               "end": 76,
-              "decorators": [],
-              "name": "ships",
-              "optional": false,
-              "typeAnnotation": null
+              "name": "ships"
             },
             "optional": false,
             "override": false,
@@ -141,10 +129,7 @@
                     "type": "Identifier",
                     "start": 78,
                     "end": 82,
-                    "decorators": [],
-                    "name": "Ship",
-                    "optional": false,
-                    "typeAnnotation": null
+                    "name": "Ship"
                   }
                 }
               }
@@ -164,10 +149,7 @@
               "type": "Identifier",
               "start": 90,
               "end": 95,
-              "decorators": [],
-              "name": "cells",
-              "optional": false,
-              "typeAnnotation": null
+              "name": "cells"
             },
             "optional": false,
             "override": false,
@@ -190,10 +172,7 @@
                     "type": "Identifier",
                     "start": 97,
                     "end": 101,
-                    "decorators": [],
-                    "name": "Cell",
-                    "optional": false,
-                    "typeAnnotation": null
+                    "name": "Cell"
                   }
                 }
               }
@@ -211,10 +190,7 @@
               "type": "Identifier",
               "start": 118,
               "end": 130,
-              "decorators": [],
-              "name": "allShipsSunk",
-              "optional": false,
-              "typeAnnotation": null
+              "name": "allShipsSunk"
             },
             "kind": "method",
             "optional": false,
@@ -262,20 +238,14 @@
                                     "type": "Identifier",
                                     "start": 191,
                                     "end": 194,
-                                    "decorators": [],
-                                    "name": "val",
-                                    "optional": false,
-                                    "typeAnnotation": null
+                                    "name": "val"
                                   },
                                   "optional": false,
                                   "property": {
                                     "type": "Identifier",
                                     "start": 195,
                                     "end": 201,
-                                    "decorators": [],
-                                    "name": "isSunk",
-                                    "optional": false,
-                                    "typeAnnotation": null
+                                    "name": "isSunk"
                                   }
                                 }
                               }
@@ -290,13 +260,17 @@
                               "type": "Identifier",
                               "start": 177,
                               "end": 180,
+                              "accessibility": null,
                               "decorators": [],
                               "name": "val",
                               "optional": false,
+                              "override": false,
+                              "readonly": false,
                               "typeAnnotation": null
                             }
                           ],
                           "returnType": null,
+                          "thisParam": null,
                           "typeParameters": null
                         }
                       ],
@@ -320,10 +294,7 @@
                             "type": "Identifier",
                             "start": 155,
                             "end": 160,
-                            "decorators": [],
-                            "name": "ships",
-                            "optional": false,
-                            "typeAnnotation": null
+                            "name": "ships"
                           }
                         },
                         "optional": false,
@@ -331,10 +302,7 @@
                           "type": "Identifier",
                           "start": 161,
                           "end": 166,
-                          "decorators": [],
-                          "name": "every",
-                          "optional": false,
-                          "typeAnnotation": null
+                          "name": "every"
                         }
                       },
                       "optional": false,
@@ -349,6 +317,7 @@
               "id": null,
               "params": [],
               "returnType": null,
+              "thisParam": null,
               "typeParameters": null
             }
           }
@@ -360,10 +329,7 @@
         "type": "Identifier",
         "start": 59,
         "end": 64,
-        "decorators": [],
-        "name": "Board",
-        "optional": false,
-        "typeAnnotation": null
+        "name": "Board"
       },
       "implements": [],
       "superClass": null,

Original ts source file is found in tasks/coverage/typescript/tests/cases/compiler/2dArrays.ts and a reference estree output is found in tasks/coverage/acorn-test262/test-typescript/tests/cases/compiler/2dArrays.ts.md.

hi-ogawa avatar Mar 18 '25 03:03 hi-ogawa

FYI: To find out what need to work on, I created a simple app that works locally.

https://github.com/leaysgur/oxc_estree_ts-ast-diff-viewer

If anyone is interested, please give it a try. 😉

leaysgur avatar Apr 08 '25 03:04 leaysgur

This is absolutely brilliant! Going to make this work so much easier.

overlookmotel avatar Apr 08 '25 11:04 overlookmotel

NOTE: Regarding the mismatch related to JSDoc(Non)NullableType

This is a niche case, so we may not need to worry too much about it, but I'll leave this note for future reference...

Root fixes have been implemented in #10358 and #10363 . However, the mismatches remain for the following reasons:

  • The TS-ESLint side has unnecessary fields
    • These are TS-derived fields that likely slipped through their filtering process
    • We could add_fields on our side, but since they're unnecessary fields, it's fine to ignore them
  • The typeAnnotation has some unclear nesting
    • The span position jumps over the upper layer(!)
    • This is probably their bug? So we don't need to address this either
"typeAnnotation": {
  "end": 54,
  "start": 47,
  "type": "TSTypeAnnotation",
  "typeAnnotation": {
    "emitNode": null,
    "end": 54,
    "id": 0,
    "original": null,
    "postfix": true,
    "start": 47, // 👈🏻 
    "type": "TSJSDocNonNullableType",
    "typeAnnotation": {
      "end": 53,
      "start": 45, // 👈🏻 
      "type": "TSTypeAnnotation",
      "typeAnnotation": {
        "end": 53,
        "start": 47,
        "type": "TSStringKeyword",
      },
    },
  },
},

The relevant test cases can be found by searching for nullabletype.

leaysgur avatar Apr 14 '25 01:04 leaysgur

  • This is probably their bug?

Feel free to report to ts-eslint if you are able to confirm.

Boshen avatar Apr 14 '25 03:04 Boshen

By everyone's efforts, coverage has finally crossed the 80%! The gap is surely closing compared to the beginning. 🚀

So, I've just walked through the remaining about 500(or so) cases and compiled a list of what mainly remains to be done.(Not all, I'm proud to say I've missed a few. 🙈)

👉🏻 Updated TODOs are here: https://github.com/oxc-project/oxc/issues/9705#issuecomment-2829457031

outdated
# Legend

- `Minimum repro code`
  - Summary of the difference
  - Related test fixture category:name example
  - Addressed issue or PR
  - [opts] Boshen's comment
  • [x] class X { constructor(public x, y) {} }
    • If accessibility exists, it should be TSParameterProperty, not just an Identifier
    • compiler:constructorArgs
    • #10070
    • Boshen: someone change the serializer
  • [x] module A.B.C {}
    • Whole TSModuleDeclaration nesting or just TSQualifiedName nesting
    • compiler:exportImportCanSubstituteConstEnumForValue
    • Boshen: someone change the serializer (Cannot change the AST)
    • #10574
  • [x] class X implements A.B.C {}
    • TSClassImplements.expression should be MemberExpressions, not TSQualifiedNames
    • compiler:complicatedPrivacy
    • Boshen: ts-eslint is probably incorrect here
    • #10607
  • [x] type foo = typeof this.foo
    • If TypeQuery has Identifier named this, it should be ThisExpression
    • compiler:circularGetAccessor
    • Boshen: why can this be a normal js ThisExpression?
    • #10603
  • [ ] function x(): v is (string) {}
    • The span of a non-existent TSParenthesizedType is used for its child span
    • conformance:typeGuardNarrowsToLiteralTypeUnion
    • 🍃
    • Boshen: ts-eslint bug?
  • [ ] a = `<CRLF>\<CRLF>`;
    • <CRLF> should be preserved as-is?
    • conformance:es6/templates/templateStringMultiline1_ES6
    • overlookmotel says: I think this might be an artefact of our conformance testing framework.
  • [x] declare module "jquery"
    • Do not produce body: null for empty module
    • compiler:importExportInternalComments
    • #10574
  • [x] interface I { n: 1; ex(this: this, m: 2): 3; }
    • Prepend this_param as TSThisType for types is missing(other than Function)
    • conformance:looseThisTypeInFunctions
    • #10068
    • Need to change the serializer
  • [x] @deco export class X {}
    • Span start should include decorator or NOT
    • compiler:decoratorMetadataRestParameterWithImportedType
    • #10409
  • [x] a<b>?.();
    • TSInstantiationExpression is not generated by parser?
    • compiler:optionalChainWithInstantiationExpression2
    • #10461
  • [x] class C { 'constructor'() {} }
    • Quoted string constructor key should be treated as Identifier not a Literal
    • conformance:quotedConstructors
    • #10471
    • Boshen: how is this an identifier?
  • [x] const { ...a: b } = {}
    • Should report "A rest element cannot have a property name" error
    • compiler:objectBindingPattern_restElementWithPropertyName
    • #10466
  • [x] function b(...[...foo = []]: string[]) { }
    • Should report "A rest element cannot have an initializer" error
    • compiler:restParameterWithBindingPattern3
    • #10467

IMO, The higher in the list has more impact on coverage numbers.

(P.S. I've tried to fix all of them once, but gave up... 😇)

leaysgur avatar Apr 16 '25 08:04 leaysgur

Thanks for compiling this list (and all the tireless work to make it as short as it is!)

I've converted your list to tickboxes, so we can tick them off.

@deco export class X {} is now fixed.

I can try to tackle type foo = typeof this.foo and class C { 'constructor'() {} } next. Unfortunately, my almost total lack of understanding of TypeScript makes it harder for me to help with the more typescripty ones (what does : v is ("A" | "B") even mean???)

overlookmotel avatar Apr 16 '25 19:04 overlookmotel

what does : v is ("A" | "B") even mean???

You may already know this, but all I can do now is tell you about TS grammar...! 😇

is is advanced type called TypePredicate for type narrowing.

https://www.typescriptlang.org/docs/handbook/2/narrowing.html#using-type-predicates

// Narrowing type from `any` to `string`
function isStr(v: any): v is string {
  return typeof v === "string";
}

let mayStr = process.argv[2];
// `mayStr` is `any` type here...

if (isStr(mayStr)) {
  // `mayStr` is `string` type!
}

// `mayStr` is `any` type here...

"A" | "B" is just an UnionType with string LiteralTypes for letter A and B.

Finally, () is ParenthesizedType, ParenthesizedExpression for types. Our parser set preserveParens: false option, so they are not produced in AST.

In their AST, ParenthesizedType is not produced too. But the UnionType span seemed to be identical to the non-existent ParenthesizedType one.

👉🏻 I updated the minimum repro code shorter.

leaysgur avatar Apr 17 '25 04:04 leaysgur

By the way, it's a public holiday in UK for next 4 days, and I'll be away from keyboard from tomorrow. @leaysgur Please feel free to merge your own PRs (or anyone else's). If anything I want to tweak, I can always do that when I return.

overlookmotel avatar Apr 17 '25 14:04 overlookmotel

I plan to work on module A.B.C {} this afternoon. If anyone is already working on it (@leaysgur?) please say so!

overlookmotel avatar Apr 23 '25 12:04 overlookmotel

Haha, absolutely no problem. 🛌🏻 Thank you for working on the hardest one!

leaysgur avatar Apr 23 '25 13:04 leaysgur

It is done! declare module "jquery" also fixed. It was indeed quite tricky...

The thing that's confusing me now is that conformance snapshot shows 1300 tests still failing:

https://github.com/oxc-project/oxc/blob/256c67e295d86ae918eec69e9fcbb1d7f8a2cb5e/tasks/coverage/snapshots/estree_typescript.snap#L5

But oxc_estree_ts-ast-diff-viewer shows only ~50 fails remaining. Where's that difference coming from? I fear there's a monstrous iceberg lurking below the surface which is going to surprise us!

overlookmotel avatar Apr 23 '25 19:04 overlookmotel

Where's that difference coming from?

I was also curious about this, so I did a bit of research.

First, as a basic premise, my viewer simplifies the processing quite a bit.

For example, it does NOT do the following:

  • Splitting a single test file into multiple units and parsing each one individually
    • https://github.com/oxc-project/acorn-test262/blob/bea568dfa98e73a46aa524141d0ff32f8bec7322/typescript-eslint.js#L29
  • Strictly specifying parser options and sourceType
    • https://github.com/oxc-project/acorn-test262/blob/bea568dfa98e73a46aa524141d0ff32f8bec7322/typescript-eslint.js#L37-L82

Additionally:

  • The fact that I am using the JS oxc-parser instead of the Rust parser
  • For matching, such as key sorting not being completely identical

These might also be affecting the results.

9412/10725

I manually counted and classified the current snapshot lines:

  • 220: Expect to parse
  • 296: Unexpected estree file content
  • 785: Mismatch
  • 12: serde_json::from_str(oxc_json)

220: Expect to parse

It looks like cases where "they can't parse" are already being skipped as "we can't parse" either.

  • https://github.com/oxc-project/acorn-test262/blob/bea568dfa98e73a46aa524141d0ff32f8bec7322/typescript-eslint.js#L92-L94
  • https://github.com/oxc-project/oxc/blob/d1f5abb6552cdd7182e2856c17f1927f1e2a9c44/tasks/coverage/src/tools/estree.rs#L310

However, for cases where "they can parse" but "we can't parse",

  • My app skips these cases
  • But tasks/coverage is treated as a failure under "Expect to parse" error

I think it would be fine to modify the task to skip this..., but what do you think?

296: Unexpected estree file content

We probably need to investigate this further.

https://github.com/oxc-project/oxc/blob/d1f5abb6552cdd7182e2856c17f1927f1e2a9c44/tasks/coverage/src/tools/estree.rs#L327-L329

This process is necessary...?

785: Mismatch

Now, this command seemed to take only a few dozen seconds, so I will look into details of them.

SAVE_DIFF=true cargo run -p oxc_coverage -- estree

12: serde_json::from_str(oxc_json) Error

https://github.com/oxc-project/oxc/blob/d1f5abb6552cdd7182e2856c17f1927f1e2a9c44/tasks/coverage/src/tools/estree.rs#L362-L369

leaysgur avatar Apr 24 '25 03:04 leaysgur

I've also looked into it a bit. Looks like where a fixture file contains multiple sub-files, acorn-test262 and the Rust conformance runner in oxc repo chop it up into separate cases a bit differently. One includes leading // @alwaysStrict: true etc comments as part of the source to be parsed, and the other trims them off - which results in different spans.

I'll try and fix that. It may explain (hopefully) a bunch of tests which are flagged as "mismatch".

overlookmotel avatar Apr 24 '25 07:04 overlookmotel

It took ages to find, but turned out to be a simple thing. Many of the TS fixture files contain a UTF-8 BOM at the start. Oxc's conformance tester removes the BOM, but acorn-test262 didn't, which caused span mismatches. Fixed in #10595.

Also fixed all the Unexpected estree file content errors in #10601.

We're now at 97%!

overlookmotel avatar Apr 24 '25 14:04 overlookmotel

Just to explain the rationale for #10608:

As I said in that PR, I think it's useful to have a clearer view of the remaining mismatches, as those are the ones which should be our first priority to fix.

But also, there's further work to be done once we have all the remaining mismatches fixed:

  • Get the field order correct.
  • Add range and loc fields. #10307
  • Make sure raw transfer is working for all TS types.

I'm keen to build a JS-side AST visitor, but that's hard to do while the AST is a moving target - so I'm currently a bit blocked. So my preference is if we can leave the tricky problem of figuring out if we should be able to parse anything that we can't currently until a bit later.

But if anyone disagrees, please feel free to say so!

overlookmotel avatar Apr 24 '25 20:04 overlookmotel

I agree with that direction.

Regarding TS ASTs, it feels difficult to determine the criteria for what is a parse error, how much incomplete AST is acceptable, etc seems vague...

AST Parsed : 6489/6493 (99.94%) Positive Passed: 6458/6493 (99.46%)

Let's focus on these 35 cases! 🏁

(First, I'll recreate the TODO list later.)

leaysgur avatar Apr 25 '25 01:04 leaysgur

Parsing Errors (6/35)

  • [ ] Expect to Parse: tasks/coverage/typescript/tests/cases/compiler/arrayFromAsync.ts
    • await is only allowed within async functions and at the top levels of modules
    • SourceType.module_kind should be Module? But it is Unambigious
  • [ ] Expect to Parse: tasks/coverage/typescript/tests/cases/conformance/classes/propertyMemberDeclarations/staticPropertyNameConflicts.ts
    • Classes may not have a static property named prototype
    • Our parser report error for class X { static prototype: number; }, but theirs are NOT
  • [ ] Expect to Parse: tasks/coverage/typescript/tests/cases/conformance/es2019/importMeta/importMeta.ts
    • The only valid meta property for import is import.meta
    • Our parser throws error for import.metal, but theirs are NOT
  • [ ] Expect to Parse: tasks/coverage/typescript/tests/cases/compiler/sourceMapValidationDecorators.ts
    • Unexpected token
    • Our parser should support @decorator for BindingRestElement?
  • [ ] Expect to Parse: tasks/coverage/typescript/tests/cases/conformance/esDecorators/esDecorators-decoratorExpression.1.ts
    • Expected a semicolon or an implicit semicolon after a statement, but found none
    • #10615
  • [ ] Expect to Parse: tasks/coverage/typescript/tests/cases/conformance/jsx/jsxReactTestSuite.tsx
    • JSX expressions may not use the comma operator
    • #10578

Mismatch Errors (17/35)

  • [x] Shebang should not be hashbang property of Program, it should be the first comment
    • Mismatch: tasks/coverage/typescript/tests/cases/compiler/emitBundleWithShebang1.ts
    • Mismatch: tasks/coverage/typescript/tests/cases/compiler/emitBundleWithShebang2.ts
    • Mismatch: tasks/coverage/typescript/tests/cases/compiler/emitBundleWithShebangAndPrologueDirectives1.ts
    • Mismatch: tasks/coverage/typescript/tests/cases/compiler/emitBundleWithShebangAndPrologueDirectives2.ts
    • Mismatch: tasks/coverage/typescript/tests/cases/compiler/shebang.ts
    • Mismatch: tasks/coverage/typescript/tests/cases/compiler/shebangBeforeReferences.ts
    • #10669
  • [x] <CRLF> is not preserved
    • Mismatch: tasks/coverage/typescript/tests/cases/conformance/es6/templates/templateStringMultiline3.ts
    • overlookmotel says: We can't fix this one. Skipping the test.
    • #10677
  • [ ] Span position for : value is (string)
    • Mismatch: tasks/coverage/typescript/tests/cases/conformance/expressions/typeGuards/typeGuardNarrowsPrimitiveIntersection.ts
    • Mismatch: tasks/coverage/typescript/tests/cases/conformance/expressions/typeGuards/typeGuardNarrowsToLiteralTypeUnion.ts
  • [ ] #9667
    • Mismatch: tasks/coverage/typescript/tests/cases/conformance/jsx/tsxReactEmitEntities.tsx
    • Mismatch: tasks/coverage/typescript/tests/cases/conformance/jsx/tsxReactEmitNesting.tsx
  • [x] With type T = import("m", { assert: {} }), they parse name: assert as name: with(their bug?)
    • Mismatch: tasks/coverage/typescript/tests/cases/conformance/moduleResolution/resolutionModeImportType1.ts
    • Mismatch: tasks/coverage/typescript/tests/cases/conformance/node/nodeModulesImportTypeModeDeclarationEmit1.ts
    • https://github.com/typescript-eslint/typescript-eslint/issues/11114
    • #10681
  • [ ] Key order mismatch (e.g. operator)
    • Mismatch: tasks/coverage/typescript/tests/cases/compiler/controlFlowInstanceofWithSymbolHasInstance.ts
    • Mismatch: tasks/coverage/typescript/tests/cases/compiler/narrowingUnionToUnion.ts
    • Mismatch: tasks/coverage/typescript/tests/cases/conformance/expressions/binaryOperators/instanceofOperator/instanceofOperatorWithRHSHasSymbolHasInstance.ts
    • Mismatch: tasks/coverage/typescript/tests/cases/conformance/types/stringLiteral/stringLiteralTypesAsTags02.ts

(For key ordering) JSON Parsing Errors (12/35)

  • [ ] JSON contains node like { "type": "Literal", "value": 1e+400, "raw": "1e999" }, 1e+400, Infinity, etc should be converted to null
    • serde_json::from_str(oxc_json) Error: tasks/coverage/typescript/tests/cases/compiler/deferredConditionalTypes2.ts
      • number out of range at line 28 column 25
    • serde_json::from_str(oxc_json) Error: tasks/coverage/typescript/tests/cases/compiler/fakeInfinity2.ts
      • number out of range at line 46 column 31
    • serde_json::from_str(oxc_json) Error: tasks/coverage/typescript/tests/cases/compiler/fakeInfinity3.ts
      • number out of range at line 46 column 31
    • serde_json::from_str(oxc_json) Error: tasks/coverage/typescript/tests/cases/conformance/es6/binaryAndOctalIntegerLiteral/binaryIntegerLiteral.ts
      • number out of range at line 129 column 27
    • serde_json::from_str(oxc_json) Error: tasks/coverage/typescript/tests/cases/conformance/es6/binaryAndOctalIntegerLiteral/binaryIntegerLiteralES6.ts
      • number out of range at line 129 column 27
    • serde_json::from_str(oxc_json) Error: tasks/coverage/typescript/tests/cases/conformance/es6/binaryAndOctalIntegerLiteral/octalIntegerLiteral.ts
      • number out of range at line 96 column 27
    • serde_json::from_str(oxc_json) Error: tasks/coverage/typescript/tests/cases/conformance/es6/binaryAndOctalIntegerLiteral/octalIntegerLiteralES6.ts
      • number out of range at line 96 column 27
  • [ ] JSON contains node like { "type": "Literal", "value": "\udc00", "raw": "\"\\u{DC00}\"" }
    • serde_json::from_str(oxc_json) Error: tasks/coverage/typescript/tests/cases/conformance/es6/unicodeExtendedEscapes/unicodeExtendedEscapesInStrings11.ts
      • lone leading surrogate in hex escape at line 30 column 28
    • serde_json::from_str(oxc_json) Error: tasks/coverage/typescript/tests/cases/conformance/es6/unicodeExtendedEscapes/unicodeExtendedEscapesInTemplates11.ts
      • lone leading surrogate in hex escape at line 38 column 35
  • [ ] JSON contains node like { "type": "TemplateElement", "value": { "raw": "\\u{D800}", "cooked": "\ud800" } }
    • serde_json::from_str(oxc_json) Error: tasks/coverage/typescript/tests/cases/conformance/es6/unicodeExtendedEscapes/unicodeExtendedEscapesInTemplates10.ts
      • unexpected end of hex escape at line 38 column 36
    • serde_json::from_str(oxc_json) Error: tasks/coverage/typescript/tests/cases/conformance/es6/unicodeExtendedEscapes/unicodeExtendedEscapesInStrings10.ts
      • unexpected end of hex escape at line 30 column 29
  • [ ] Recursion limit exceeded, may be fixed by disable_recursion_limit ?
    • serde_json::from_str(oxc_json) Error: tasks/coverage/typescript/tests/cases/compiler/parsingDeepParenthensizedExpression.ts
      • recursion limit exceeded at line 3334 column 269

leaysgur avatar Apr 25 '25 06:04 leaysgur