
Implement `serde::Serialize` on AST types via `#[generate_derive]`

Open · overlookmotel opened this issue 4 months ago · 25 comments

We currently use serde's derive macros to implement Serialize on AST types.

We could use #[generate_derive] to generate these impls instead.

Why is that a good thing?

1. Reduce compile time

serde's derive macro is pretty expensive at compile time for the NAPI build. If ast_tools generates the impls instead, we can remove it.

2. Reduce boilerplate

serde's derive macro is less powerful than ast_tools. Because #[derive(Serialize)] is a macro, all it can see is the type it is applied to. ast_tools, by contrast, builds a schema of the entire AST, so it knows not only about the type it's deriving an impl for, but also about all the other types, and how they link to each other.
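To make that difference concrete, here is a minimal sketch of what whole-AST knowledge buys. The TypeDef struct and estree_keys function are hypothetical and much simpler than ast_tools' real schema: because the codegen holds the definition of every type, it can see that ClassBody's span field has type Span, that Span is marked flatten, and inline Span's fields. A derive macro placed on ClassBody never sees Span's definition, so it has to be told via an attribute on the field.

```rust
use std::collections::HashMap;

/// Hypothetical, heavily simplified schema entry for one AST type.
/// `flatten` mirrors the proposed `#[estree(flatten)]` attribute on the type.
struct TypeDef {
    fields: Vec<(&'static str, &'static str)>, // (field name, field type name)
    flatten: bool,
}

/// Compute the JSON keys a type serializes to. A field whose *type* is
/// marked `flatten` is inlined recursively; no attribute on the field needed.
/// Panics if `type_name` is not in the schema.
fn estree_keys(schema: &HashMap<&str, TypeDef>, type_name: &str) -> Vec<String> {
    let mut keys = Vec::new();
    for (field_name, field_type) in &schema[type_name].fields {
        match schema.get(field_type) {
            // The field's type is known and flattened: splice in its keys.
            Some(def) if def.flatten => keys.extend(estree_keys(schema, field_type)),
            // Anything else becomes an ordinary key.
            _ => keys.push(field_name.to_string()),
        }
    }
    keys
}

fn main() {
    let mut schema = HashMap::new();
    schema.insert("Span", TypeDef { fields: vec![("start", "u32"), ("end", "u32")], flatten: true });
    schema.insert(
        "ClassBody",
        TypeDef { fields: vec![("span", "Span"), ("body", "Vec<ClassElement>")], flatten: false },
    );
    println!("{:?}", estree_keys(&schema, "ClassBody"));
}
```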

Currently we have to put #[serde] attributes everywhere:

#[ast]
#[cfg_attr(feature = "serialize", derive(Serialize, Tsify))]
#[serde(tag = "type")]
pub struct ClassBody<'a> {
    #[serde(flatten)]
    pub span: Span,
    pub body: Vec<'a, ClassElement<'a>>,
}

#[ast]
#[cfg_attr(feature = "serialize", derive(Serialize, Tsify))]
#[serde(tag = "type")]
pub struct PrivateIdentifier<'a> {
    #[serde(flatten)]
    pub span: Span,
    pub name: Atom<'a>,
}

#[ast]
#[cfg_attr(feature = "serialize", derive(Serialize, Tsify))]
pub struct Span {
    start: u32,
    end: u32,
}

Instead, we can use ast_tools in 2 ways to remove this boilerplate:

  1. Make behavior that we want on every type (e.g. the "type" tag) the default, so it doesn't need to be stated over and over.
  2. Use ast_tools's knowledge of the whole AST to move the instruction to flatten Span onto the Span type itself. The "flatten this" instruction then does not need to be repeated on every type that contains a Span.

#[ast]
#[generate_derive(ESTree)]
pub struct ClassBody<'a> { // <-- no `#[serde(tag = "type")]` attr
    pub span: Span, // <-- no `#[serde(flatten)]` attr
    pub body: Vec<'a, ClassElement<'a>>,
}

#[ast]
#[generate_derive(ESTree)]
pub struct PrivateIdentifier<'a> { // <-- no `#[serde(tag = "type")]` attr
    pub span: Span, // <-- no `#[serde(flatten)]` attr
    pub name: Atom<'a>,
}

#[ast]
#[generate_derive(ESTree)]
#[estree(flatten)] // <-- `flatten` is here now
pub struct Span {
    start: u32,
    end: u32,
}
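A rough sketch of what the generated code for PrivateIdentifier might look like at runtime. The ESTree trait, the serialize_flat helper and writing JSON straight into a String are all assumptions for illustration (a real implementation might well target serde's Serializer instead), and name is a String standing in for Atom<'a>. The point is that the "type" tag and the inlined span come from defaults and from the attribute on Span, not from anything written on PrivateIdentifier.

```rust
/// Hypothetical trait standing in for whatever #[generate_derive(ESTree)] would implement.
trait ESTree {
    /// Append this node's JSON representation to `out`.
    fn serialize(&self, out: &mut String);
}

struct Span {
    start: u32,
    end: u32,
}

// Simplified: the real type holds `Atom<'a>`; a `String` is used here.
struct PrivateIdentifier {
    span: Span,
    name: String,
}

/// Minimal JSON string writer: escapes quotes, backslashes and control chars.
fn write_json_str(s: &str, out: &mut String) {
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
}

impl Span {
    /// Emitted because `Span` carries `#[estree(flatten)]`: writes its fields
    /// without surrounding braces so any containing node can inline them.
    fn serialize_flat(&self, out: &mut String) {
        out.push_str(&format!("\"start\":{},\"end\":{}", self.start, self.end));
    }
}

impl ESTree for PrivateIdentifier {
    fn serialize(&self, out: &mut String) {
        // The `type` tag is written by default; no per-type attribute needed.
        out.push_str("{\"type\":\"PrivateIdentifier\",");
        // `span` is inlined because of the attribute on `Span` itself.
        self.span.serialize_flat(out);
        out.push_str(",\"name\":");
        write_json_str(&self.name, out);
        out.push('}');
    }
}

fn main() {
    let node = PrivateIdentifier { span: Span { start: 0, end: 4 }, name: "foo".to_string() };
    let mut out = String::new();
    node.serialize(&mut out);
    println!("{out}");
}
```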

I think this is an improvement. How types are serialized is not core to the function of the AST. I don't see moving the serialization logic elsewhere as "hiding it away", but rather as a nice separation of concerns.

3. Open the door to different serializations

In the example above, Serialize has been replaced by ESTree. This is to allow for different serialization methods in future. For example:

Different serializers for plain JS AST and TS AST

When serializing a plain JS file, we could produce JSON which skips all the TS fields, making an AST which exactly aligns with canonical ESTree. We'd add a #[ts] attribute to all TS-related fields, and an ESTreeJS serializer would skip those fields. With less JSON to parse, the AST would also be faster to deserialize on the JS side.

Another advantage is that the TS-less AST should match classic ESTree exactly, so we can test it in full using Acorn's test suite.

Users who are not interested in type info can also request the cheaper JS-only AST, even when parsing TS code.
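A minimal sketch of how generated code could honor a #[ts] field attribute. The Flavor enum, the Param struct and the decision to pass the flavor at runtime are hypothetical; generating two separate impls (an ESTreeJS and an ESTreeTS one) would work just as well. typeAnnotation is simplified to a string here, whereas in TS-ESTree it is a node.

```rust
/// Hypothetical choice of output: plain ESTree, or ESTree plus TS fields.
#[derive(Clone, Copy, PartialEq)]
enum Flavor {
    Js,
    Ts,
}

// Simplified stand-in for a node with one TS-only field.
struct Param {
    name: String,
    // Imagine this field carries the proposed `#[ts]` attribute.
    type_annotation: Option<String>,
}

impl Param {
    fn serialize(&self, flavor: Flavor, out: &mut String) {
        out.push_str("{\"type\":\"Identifier\",\"name\":\"");
        out.push_str(&self.name); // assumes the name needs no JSON escaping
        out.push('"');
        // Code for `#[ts]` fields is wrapped in this check, so the JS flavor
        // omits the key entirely rather than emitting `null`.
        if flavor == Flavor::Ts {
            out.push_str(",\"typeAnnotation\":");
            match &self.type_annotation {
                Some(t) => {
                    out.push('"');
                    out.push_str(t);
                    out.push('"');
                }
                None => out.push_str("null"),
            }
        }
        out.push('}');
    }
}

fn main() {
    let p = Param { name: "x".to_string(), type_annotation: Some("number".to_string()) };
    for flavor in [Flavor::Js, Flavor::Ts] {
        let mut out = String::new();
        p.serialize(flavor, &mut out);
        println!("{out}");
    }
}
```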

Serialize to other AST variants

e.g. #[generate_derive(Babel)] to serialize to a Babel-compatible JSON AST.

const {program} = parse(code, {flavor: 'babel'});

Not sure if this is useful, but this change makes it a possibility if we want to.

4. Simplify implementation of custom serialization

Currently we have pretty complex custom Serialize impls for massaging Oxc's AST into ESTree-compatible shape in oxc_ast/src/serialize.rs.

We can remove most of them if we use ast_tools to generate Serialize impls for us, guiding it with attributes on the AST types themselves:

#[ast]
#[generate_derive(ESTree)]
pub struct ObjectPattern<'a> {
    pub span: Span,
    pub properties: Vec<'a, BindingProperty<'a>>,
    #[estree(append_to_previous)]
    pub rest: Option<Box<'a, BindingRestElement<'a>>>,
}
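In ESTree, an ObjectPattern has a single properties array whose last element may be a RestElement, while Oxc keeps the rest element in a separate rest field. Here is a sketch of what code generated for the hypothetical #[estree(append_to_previous)] attribute could do. The types are simplified stand-ins (keys and arguments are plain strings rather than nodes, and spans are omitted).

```rust
// Simplified stand-ins; the real Oxc types carry spans and richer data.
struct BindingProperty {
    key: String,
}

struct BindingRestElement {
    argument: String,
}

struct ObjectPattern {
    properties: Vec<BindingProperty>,
    rest: Option<BindingRestElement>,
}

impl ObjectPattern {
    fn serialize(&self, out: &mut String) {
        out.push_str("{\"type\":\"ObjectPattern\",\"properties\":[");
        let mut first = true;
        for prop in &self.properties {
            if !first {
                out.push(',');
            }
            first = false;
            out.push_str(&format!("{{\"type\":\"Property\",\"key\":\"{}\"}}", prop.key));
        }
        // `#[estree(append_to_previous)]` on `rest`: no `rest` key is emitted;
        // the element is appended to the end of the preceding `properties` array.
        if let Some(rest) = &self.rest {
            if !first {
                out.push(',');
            }
            out.push_str(&format!("{{\"type\":\"RestElement\",\"argument\":\"{}\"}}", rest.argument));
        }
        out.push_str("]}");
    }
}

fn main() {
    // Roughly corresponds to the pattern in `const { a, b, ...c } = obj;`
    let pattern = ObjectPattern {
        properties: vec![BindingProperty { key: "a".into() }, BindingProperty { key: "b".into() }],
        rest: Some(BindingRestElement { argument: "c".into() }),
    };
    let mut out = String::new();
    pattern.serialize(&mut out);
    println!("{out}");
}
```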

5. Simplify AST transfer code

AST transfer's JS-side deserializer (and eventually its serializer too) can be simplified in the same way, by generating code for the JS-side deserializer which matches the Rust side exactly, without writing the same logic twice and having to keep the two in sync.

6. TS type generation

The "massaging" we do to turn the Rust AST into an ESTree-compatible JSON AST would now be encoded as static attributes. We can use these to generate TS types, and get rid of Tsify.
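A minimal sketch of that idea: the same facts that drive serialization (the default "type" tag, Span being flattened, each field's JSON key) are enough to print a TS interface. TsTypeDef, generate_ts_interface and the exact output format are hypothetical, not a description of what ast_tools or Tsify emit.

```rust
/// Hypothetical per-type description, derived from the same attributes used
/// for serialization.
struct TsTypeDef {
    name: &'static str,
    flattened_span: bool,                      // `Span` fields are inlined in the JSON
    fields: Vec<(&'static str, &'static str)>, // (JSON key, TS type)
}

fn generate_ts_interface(def: &TsTypeDef) -> String {
    let mut out = format!("export interface {}", def.name);
    if def.flattened_span {
        // `Span` is flattened, so `start`/`end` sit directly on the node.
        out.push_str(" extends Span");
    }
    out.push_str(" {\n");
    // The `type` tag is added by default, so its TS type is a string literal.
    out.push_str(&format!("  type: '{}';\n", def.name));
    for (key, ts_type) in &def.fields {
        out.push_str(&format!("  {}: {};\n", key, ts_type));
    }
    out.push('}');
    out
}

fn main() {
    let class_body = TsTypeDef {
        name: "ClassBody",
        flattened_span: true,
        fields: vec![("body", "Array<ClassElement>")],
    };
    println!("{}", generate_ts_interface(&class_body));
}
```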

How difficult is this?

serde's derive macro looks forbiddingly complex. But this is because it handles every conceivable case, almost all of which we don't use. The output it generates for our AST types is actually not so complicated.

So I don't think creating a codegen for impl Serialize would be too difficult.
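As a rough illustration, here is a hypothetical codegen function (not part of ast_tools) that emits an impl serde::Serialize as source text, given a struct's fields after flattening has been resolved from the schema. The emitted code only uses serde's public SerializeMap API (serialize_map, serialize_entry, end). It assumes Span's fields are accessible where the impl is generated (in the snippet earlier in this issue they are private) and that every field value itself implements Serialize.

```rust
/// Emit Rust source for an `impl serde::Serialize`.
/// `tag` is the ESTree `type` value, `self_ty` the type being implemented,
/// and `entries` are (JSON key, Rust expression reading the value) pairs.
fn generate_serialize_impl(tag: &str, self_ty: &str, entries: &[(&str, &str)]) -> String {
    let mut out = String::new();
    out.push_str(&format!("impl serde::Serialize for {self_ty} {{\n"));
    out.push_str("    fn serialize<S: serde::Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {\n");
    out.push_str("        use serde::ser::SerializeMap;\n");
    out.push_str("        let mut map = serializer.serialize_map(None)?;\n");
    // The `type` tag is emitted by default instead of via `#[serde(tag = "type")]`.
    out.push_str(&format!("        map.serialize_entry(\"type\", \"{tag}\")?;\n"));
    for (key, expr) in entries {
        out.push_str(&format!("        map.serialize_entry(\"{key}\", &{expr})?;\n"));
    }
    out.push_str("        map.end()\n    }\n}\n");
    out
}

fn main() {
    // Flattening already resolved: `span` became `start` and `end`.
    let code = generate_serialize_impl(
        "ClassBody",
        "ClassBody<'_>",
        &[("start", "self.span.start"), ("end", "self.span.end"), ("body", "self.body")],
    );
    print!("{code}");
}
```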

overlookmotel · Oct 04 '24 10:10