swc
swc copied to clipboard
RFC: Use a flat structure for storing nodes instead of an AST
Describe the feature
This RFC (Request for Comments) is meant as a suggestion for an improvement to the memory structure of the current syntax tree based on data-oriented patterns.
This RFC also aims to provide full compatibility with the Visit
trait from swc_ecma_visit
. The VisitMut
trait implementation details are still up for debate.
1. Introduction to Data-Oriented Design
Processors load data by cache lines (usually 64 or 128 bytes) which makes contiguous arrays perfect for data similar in semantics. Because of cache lines, the processor will load the next few elements from the array which has the possibility of improving the performance of visiting the tree.
2. The New Format
To continue iterating on this RFC, we must define the new structs and enums that compose our new format. To do this, let's start by defining the Node
struct, which is the core item inside of our new format.
struct Node {
kind: NodeKind,
data: NodeData,
id: NodeId
}
Each Node
will represent an item in the tree. These can be expressions, declarations, and others.
Now let's define the enums that Node
uses.
2.1. NodeKind
In Rust, the size of an enum is the same as the size of the largest variant in the enum. This brings some performance hits when attempting to match the enum variant. To fix this, we will implement a new NodeKind
enum that is 1-bit sized and holds the type of a Node
.
The definition for NodeKind
would be as follows:
pub enum NodeKind {
ArrayLit,
ArrayPat,
// ...
}
Every struct and enum inside the current version of swc_ecma_ast
will be used to create the NodeKind
enum.
2.2. NodeData
An AST without data would be pretty useless. That's why we will also have the NodeData
enum, which will contain the current structs inside swc_ecma_ast
, with some modifications to fit them into our new format.
Most notably, references to other Node
s will be replaced with NodeId
.
pub enum NodeData {
ArrayLit(ArrayLit),
ArrayPat(ArrayPat),
// ...
}
2.3. NodeId
To be able to reference Node
s, we will use IDs, which are indexes into our Vec
.
pub struct NodeId(u64);
2.4. Module
The new Module
struct will have the following form:
pub struct Module {
nodes: Vec<Node>
}
3. Understanding Vec<Node>
To understand the way that Vec<Node>
is constructed via the parser, we have to make sure that children always come after their parents.
To make this easier to grasp, I've provided a tree structure to flatten format transformation example.
item1
item2
item3
item4
The tree above will be organized as:
item1, item2, item3, item4
4. Compatibility with Visit
Due to the ordering of the nodes inside of Vec<Node>
, the Visit
trait is trivial to implement. A naive implementation would look like this:
for node in module.nodes {
// Based on the node kind, call the correct function on the visitor
}
This does not include any control flow but that can be easily implemented later.
5. Closing Thoughts
Please let me know if you have any ideas or questions regarding this RFC. Also, solutions for VisitMut
are needed before attempting to implement this, otherwise we'd be introducing breaking changes in the API.
Babel plugin or link to the feature description
No response
Additional context
No response