ast-types
ast-types copied to clipboard
iteration based visitor with meta-programming optimizations
Comes from here https://github.com/benjamn/ast-types/issues/80
This is a work in progress but already seems to be a valid way to greatly improve the performance of the traversal. It's very close to the x10 @stefanpenner wished for :)
Here is the gist of it:
- Traversal (post-order) avoids recursion and allocations
- Current visiting path is kept using a stack array (
NodePathlike object to be introduced later) - Optimized code paths by meta programming (basically just compiling a function with a switch statement)
Things yet to be implemented:
Simplify the visitor creation pattern (it's inherited from a previous branch)Test pre-order traversal, it should work in theory but there seems to be a bug somewhere- NodePath like object for transforms (I had a version but I can't remember where I saved it :flushed:)
- Improve the compilation step since right now is quite hacky analyzing the types (@benjamn please help :)
- Analyze v8 optimizer logs and see if it can be further improved (the compilation step can probably help greatly here).
The API is quite different from the current one, now the visitors receive the actual node (and an optional NodePath). If they want to control the order of the traversal they have to call this.traverse([visitor]) but they can't provide a specific node since it would invalidate the stack, otherwise the traversal is automatic. To visit arbitrary nodes they must call this.visit(node), it'll generate a new root stack and visitor state (returns the instance). Perhaps some helper like visitChild(childName) could be introduced in order to keep the previous stack.
Some speed figures for visitNode:
> node test/perf.js
NodePath: [102, 84, 86, 80, 82] avg 86.8 (31475)
Visitor: [17, 11, 11, 12, 11] avg 12.4 (31470)
Precompiled: [8, 4, 3, 3, 6] avg 4.8 (31470)
and for visitFunctionExpression:
> node test/perf.js
NodePath: [84, 64, 73, 67, 77] avg 73 (595)
Visitor: [12, 5, 5, 5, 5] avg 6.4 (595)
Precompiled: [4, 2, 1, 1, 1] avg 1.8 (595)
The precompiled is just that the visitor is created outside the benchmark, basically to show that for multiple passes (ie: a finder) the speeds are unreal :)
When no optimized code paths compilation is performed speed is degraded around a 60%, so perhaps it can be an optional feature.
So please, review the new API and let's iterate on it until it fits everyone's expectations.
![]()
- [ ]
File: lib/visitor.js:L1-368 - <img border=0 src='https://avatars.githubusercontent.com/u/5750?v=3' height=16 width=16'> Thoughts about the existing strategy of defining a
Contextconstructor and recycling contexts to avoid unnecessaryObject.createcalls? See https://github.com/benjamn/ast-types/blob/e866485d72/lib/path-visitor.js#L140-L144 - <img border=0 src='https://avatars.githubusercontent.com/u/110784?v=3' height=16 width=16'> Yeah, I saw the current recycling but I don't think the overhead of
Object.createwould be a problem here, it's not very fast but I'm working under the assumption that visitors won't be callingvisitvery often. Anyway, perhaps we can use something more optimized for this use case:
Visitor.DummyConstructor = function () {};
...
Visitor.DummyConstructor.prototype = this;
var context = new Visitor.DummyConstructor();
- [ ]
General Comment - <img border=0 src='https://avatars.githubusercontent.com/u/5750?v=3' height=16 width=16'> Thanks for this! I realize this is a work in progress, so I'm definitely thinking about it as an experiment that may have to be subdivided and/or expanded in different directions. As a personal note, my parents just headed home from NYC, so I have more time to devote to this project :) I'm somewhat concerned the current version of this PR is changing the API too much to allow a meaningful performance comparison. Can we find ways to optimize traversal speed incrementally that preserve the existing API? That way we'll know we've made progress, and API churn for authors of existing code will be minimized.
- <img border=0 src='https://avatars.githubusercontent.com/u/5750?v=3' height=16 width=16'> I'm not really comfortable forcing a post-order traversal, and I don't think it's enough to provide pre-order traversal as an alternative, since it's useful in many cases to be able to perform operations both before and after the traversal. That's why the existing API gives you the freedom to call
this.traversewhenever you want to. - <img border=0 src='https://avatars.githubusercontent.com/u/110784?v=3' height=16 width=16'> About the new API, I was trying initially to keep it mostly the same but I couldn't get the level of performance I was expecting. Many of the optimizations can be adapted to follow the current API but I think this one hits a cool spot since it's very easy to optimize by the JS engines, avoids dynamic dispatching and deep nested call stacks as much as possible. Regarding the order of the traversal, I think I haven't made it clear in my description, sorry about that. The visitor can still decide when and how do the traversal, the design really shines when the traversal is post-order, since it avoids any calls or allocations, it's just another iteration on the main loop. Here is how it works:
visitBinaryExpression: function (node) {
this.traverse(); // pre-order traversal
console.log(node.left);
},
visitBinaryExpression: function (node) { // edited
this.traverse('left'); // Traverse node.left keeping the *path stack* valid
console.log('in the middle');
this.traverse('right');
},
visitUnaryExpression: function (node) {
console.log(node.type);
// since no traversal was made the default post-order will be automatically done
},
visitCallExpression: function (node) {
this.visit(node.callee); // pre-order but the *stack* will be rooted (ie: it can't navigate to Program)
return false; // we've done the traversal in a custom way so not do it automatically
},
visitFunctionExpression: function (node) {
this.traverse(myCoolFuncVisitor); // traverse using a specific visitor
},
visitThisExpression: function (node, path) {
// Since it has arity-2 we'll receive a NodePath like object to transform
path.replace( .... )
}
Hopefully that makes the usage more clear, it's mostly the same functionality we have now but since we can't traverse jumping from one ancestor to a granchild without keep the path stack updated, every time we need to do it we call .visit(node) instead of .traverse(node) so it's more obvious. Perhaps a .traverse("childName") could be added, which would reuse the current path stack.
- <img border=0 src='https://avatars.githubusercontent.com/u/1377?v=3' height=16 width=16'> woohooo! Hopefully i find some time to muck around this this. It would be awesome if someone had time to port over some of the esnext transforms. I'll try to find some time this weekend to get acquainted with the api changes.
- <img border=0 src='https://avatars.githubusercontent.com/u/110784?v=3' height=16 width=16'> I just introduced
StackPath, which is the equivalent to the currentNodePathbut they are created on demand and aren't synched with the transformations happening elsewhere. Right now it's just a proof of concept, only parent/child navigation and replacement is implemented (and probably buggy :flushed:). intest/perf.jsthere is a bit of example usage but basically it works like this: - If the visitor method has an arity different of 1 it'll receive a
StackPathobject as second argument this.pathis always the same instance which shares the path with the visitorStackPathcan be easily cloned with.clone(), it just clones the path, so if there are transformations in the AST it can get out of sync. Now, there is some stuff inNodePathabout checking if something needs parens. Is that actually used/needed? Since now replacement works simple stuff like es6-arrow can probably be ported to see how these changes feel with a real work example.- <img border=0 src='https://avatars.githubusercontent.com/u/5750?v=3' height=16 width=16'> The
needsParensmethod is used pretty heavily by https://github.com/benjamn/recast, and it's a good example of an AST utility method that requires knowledge of ancestor nodes. In other words, thanks for preserving it! - <img border=0 src='https://avatars.githubusercontent.com/u/5750?v=3' height=16 width=16'> After looking at the code, I like
StackPathso much that I'd actually prefer to reimplementNodePathusing a stack, rather than having two differentPathsubtypes. As a bonus, I think we can maintain total backwards compatibility. How this might work: - Where the
NodePathconstructor currently takes aparentPathparameter (which can be anotherNodePathor a falsy value), let it alternatively take an array of[name, value, ...], exactly like yourprevPath. - The
NodePathconstructor should make a defensive copy of this array, just like you're doing withprevPath.concat([name, wrap(value)]). - The
prevPatharray should be able to contain aNodePathinstance as its first element, so that it's easy to start with aNodePaththat was created earlier and continue generating a cheap path of its descendants by pushing names and values onto an array. this.parentPaththen becomes a lazily computed property (another getter onNodePath.prototype, like.nodeor.parent). This seems fine with me because nobody usesthis.parentPath, but the implementation ofNodePathmight need to change, since it currently considersthis.parentPathto be a cheap property lookup.- When we create child paths, we have a couple of options. We can either share the parent path's internal
this.stackthat we defensively copied, or we can just keep using the old style:var childPath = new NodePath(childValue, parentPath, childName). - We can reimplement
NodePath.prototype.needsParensin terms ofthis.stackinstead ofthis.parentPath, which should be a significant performance gain. If we do this right, I thinkNodePath.prototype.replaceshould continue to work as before, without too much trouble. We just have to be careful to deal with any places where we're assigning to a property that only has a getter (like.value). ThePathVisitortraversal code will maintain a mutable array of names and values until it encounters a node that needs to be handled by a visitor method, at which point it will create a singleNodePath. This will be a huge win for traversals that never encounter the node type they're looking for, because noNodePathobjects will need to be created. I tried to achieve something like this for #49, but I didn't know how at the time. - <img border=0 src='https://avatars.githubusercontent.com/u/110784?v=3' height=16 width=16'> That's a cool idea! If
NodePathcan be created on-demand the performance could probably be doubled without breaking the API. Then for a next major version a new API can be introduced that allows an iteration based traversal without allocations. I think something like this should work pretty well:
visitBinaryExpression: function (node, path) {
this.traverse() // pre-order
// Note: exposing the `.traverse` on the path allows it to keep the stack complete
path.traverse('left')
// this.visit always creates a new context with a rooted stack path
this.visit(node.right).hasThisExpression;
// path.visit creates a new context with a complete stack path
path.get('right').visit().hasThisExpression;
}
Anyhow, I'm going on a trip tomorrow so I won't be able to work on this for a while, please feel free to advance it in my absence. It'll be really sweet if that NodePath refactor has materialized when I return :)
- [ ]
File: lib/stack-path.js:L1-550 - <img border=0 src='https://avatars.githubusercontent.com/u/5750?v=3' height=16 width=16'> Another invariant possibly worth asserting here:
if (prevStack) assert.strictEqual(prevStack[prevStack.length - 1][name], value)
Thanks for this!
I realize this is a work in progress, so I'm definitely thinking about it as an experiment that may have to be subdivided and/or expanded in different directions. As a personal note, my parents just headed home from NYC, so I have more time to devote to this project :)
I'm somewhat concerned the current version of this PR is changing the API too much to allow a meaningful performance comparison. Can we find ways to optimize traversal speed incrementally that preserve the existing API? That way we'll know we've made progress, and API churn for authors of existing code will be minimized.
I'm not really comfortable forcing a post-order traversal, and I don't think it's enough to provide pre-order traversal as an alternative, since it's useful in many cases to be able to perform operations both before and after the traversal. That's why the existing API gives you the freedom to call this.traverse whenever you want to.
About the new API, I was trying initially to keep it mostly the same but I couldn't get the level of performance I was expecting. Many of the optimizations can be adapted to follow the current API but I think this one hits a cool spot since it's very easy to optimize by the JS engines, avoids dynamic dispatching and deep nested call stacks as much as possible.
Regarding the order of the traversal, I think I haven't made it clear in my description, sorry about that. The visitor can still decide when and how do the traversal, the design really shines when the traversal is post-order, since it avoids any calls or allocations, it's just another iteration on the main loop. Here is how it works:
visitBinaryExpression: function (node) {
this.traverse(); // pre-order traversal
console.log(node.left);
},
visitBinaryExpression: function (node) { // edited
this.traverse('left'); // Traverse node.left keeping the *path stack* valid
console.log('in the middle');
this.traverse('right');
},
visitUnaryExpression: function (node) {
console.log(node.type);
// since no traversal was made the default post-order will be automatically done
},
visitCallExpression: function (node) {
this.visit(node.callee); // pre-order but the *stack* will be rooted (ie: it can't navigate to Program)
return false; // we've done the traversal in a custom way so not do it automatically
},
visitFunctionExpression: function (node) {
this.traverse(myCoolFuncVisitor); // traverse using a specific visitor
},
visitThisExpression: function (node, path) {
// Since it has arity-2 we'll receive a NodePath like object to transform
path.replace( .... )
}
Hopefully that makes the usage more clear, it's mostly the same functionality we have now but since we can't traverse jumping from one ancestor to a granchild without keep the path stack updated, every time we need to do it we call .visit(node) instead of .traverse(node) so it's more obvious. Perhaps a .traverse("childName") could be added, which would reuse the current path stack.
woohooo! Hopefully i find some time to muck around this this.
It would be awesome if someone had time to port over some of the esnext transforms. I'll try to find some time this weekend to get acquainted with the api changes.
I just introduced StackPath, which is the equivalent to the current NodePath but they are created on demand and aren't synched with the transformations happening elsewhere. Right now it's just a proof of concept, only parent/child navigation and replacement is implemented (and probably buggy :flushed:).
in test/perf.js there is a bit of example usage but basically it works like this:
- If the visitor method has an arity different of 1 it'll receive a
StackPathobject as second argument this.pathis always the same instance which shares the path with the visitorStackPathcan be easily cloned with.clone(), it just clones the path, so if there are transformations in the AST it can get out of sync.
Now, there is some stuff in NodePath about checking if something needs parens. Is that actually used/needed?
Since now replacement works simple stuff like es6-arrow can probably be ported to see how these changes feel with a real work example.
The needsParens method is used pretty heavily by https://github.com/benjamn/recast, and it's a good example of an AST utility method that requires knowledge of ancestor nodes.
In other words, thanks for preserving it!
After looking at the code, I like StackPath so much that I'd actually prefer to reimplement NodePath using a stack, rather than having two different Path subtypes. As a bonus, I think we can maintain total backwards compatibility.
How this might work:
- Where the
NodePathconstructor currently takes aparentPathparameter (which can be anotherNodePathor a falsy value), let it alternatively take an array of[name, value, ...], exactly like yourprevPath. - The
NodePathconstructor should make a defensive copy of this array, just like you're doing withprevPath.concat([name, wrap(value)]). - The
prevPatharray should be able to contain aNodePathinstance as its first element, so that it's easy to start with aNodePaththat was created earlier and continue generating a cheap path of its descendants by pushing names and values onto an array. this.parentPaththen becomes a lazily computed property (another getter onNodePath.prototype, like.nodeor.parent). This seems fine with me because nobody usesthis.parentPath, but the implementation ofNodePathmight need to change, since it currently considersthis.parentPathto be a cheap property lookup.- When we create child paths, we have a couple of options. We can either share the parent path's internal
this.stackthat we defensively copied, or we can just keep using the old style:var childPath = new NodePath(childValue, parentPath, childName). - We can reimplement
NodePath.prototype.needsParensin terms ofthis.stackinstead ofthis.parentPath, which should be a significant performance gain.
If we do this right, I think NodePath.prototype.replace should continue to work as before, without too much trouble. We just have to be careful to deal with any places where we're assigning to a property that only has a getter (like .value).
The PathVisitor traversal code will maintain a mutable array of names and values until it encounters a node that needs to be handled by a visitor method, at which point it will create a single NodePath.
This will be a huge win for traversals that never encounter the node type they're looking for, because no NodePath objects will need to be created. I tried to achieve something like this for #49, but I didn't know how at the time.
That's a cool idea! If NodePath can be created on-demand the performance could probably be doubled without breaking the API. Then for a next major version a new API can be introduced that allows an iteration based traversal without allocations. I think something like this should work pretty well:
visitBinaryExpression: function (node, path) {
this.traverse() // pre-order
// Note: exposing the `.traverse` on the path allows it to keep the stack complete
path.traverse('left')
// this.visit always creates a new context with a rooted stack path
this.visit(node.right).hasThisExpression;
// path.visit creates a new context with a complete stack path
path.get('right').visit().hasThisExpression;
}
Anyhow, I'm going on a trip tomorrow so I won't be able to work on this for a while, please feel free to advance it in my absence. It'll be really sweet if that NodePath refactor has materialized when I return :)