regexp-tree
Transform: utils for supporting tree consistency
Some utils might be useful to abstract away the boilerplate users have to write if they want to keep an AST consistent after modifications.
Example: all capturing groups have a `number` property, which is built during parsing. However, if some transform in a pipeline adds or removes a capturing group, the numbers of all the other existing groups become out of sync, and further transforms may not work as expected.
The responsibility for rebuilding the needed data is on transform implementers; however, we can provide some useful utils for working with the AST (which do not fit on `path`). E.g.:
// Original, the group containing `foo`
// has index 1 in AST node.
/(foo)/
Transformed (inserted a group with `bar`):
// Bug, foo still has index 1.
/(bar)(foo)/
For this callers can do:
// Reset all the indices of capturing groups.
transformUtils.rebuildGroupIndex(ast);
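A minimal sketch of what such a `rebuildGroupIndex` util could look like (the name comes from this discussion and is not yet an actual API; the walk below is a hand-rolled traversal over regexp-tree's AST shape, where capturing groups are `Group` nodes with `capturing: true` and a `number` property):

```javascript
// Hypothetical rebuildGroupIndex: renumber capturing groups in source order.
function rebuildGroupIndex(ast) {
  let index = 0;
  (function walk(node) {
    if (node == null || typeof node !== 'object') return;
    if (node.type === 'Group' && node.capturing) {
      node.number = ++index; // reassign in traversal (source) order
    }
    for (const key of Object.keys(node)) {
      const child = node[key];
      if (Array.isArray(child)) child.forEach(walk);
      else walk(child);
    }
  })(ast);
  return ast;
}

// Example: /(bar)(foo)/ where `(bar)` was inserted before `(foo)`,
// leaving `(foo)` with the stale number 1.
const ast = {
  type: 'RegExp',
  body: {
    type: 'Alternative',
    expressions: [
      {type: 'Group', capturing: true, number: 1, expression: {/* bar */}},
      {type: 'Group', capturing: true, number: 1, expression: {/* foo */}},
    ],
  },
};

rebuildGroupIndex(ast);
console.log(ast.body.expressions.map(g => g.number)); // [1, 2]
```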
There are some other inconsistencies, the most noticeable of which are location data, which stores offsets, and the original source -- obviously, when inserting/replacing nodes, all such data becomes out of sync. In some cases we can leave them untouched (and mark them as "unreliable" if several transforms are applied in a pipeline).
Besides, the utils can contain other useful methods for higher-level tree manipulations, cleanups, etc. (which do not belong on `path`).
From my point of view, `rebuildGroupIndex` is a transformation (a traversal plus changing data in the AST). So, there is no need for a separate function.
Generally, everything changing the AST is a transformation.
Now, that said, I think the result of processing the `namedGroups` could be moved into the `RegExp` object at the root of the AST.
The group name is already stored in the "Group" object, so the group index could be stored there, too.
It would even be reasonable to determine the index while parsing. But I think it's better done as a transformation, because not everyone needs it.
So I see these transformations:
- build_group_indexes (puts them in the "Group" object)
- build_groups_array (puts the names and indexes into an array in "RegExp" object)
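A sketch of what the second proposed transformation could do (the names `build_group_indexes`/`build_groups_array` are proposals from this thread, not existing APIs; the AST shape follows regexp-tree's `Group` nodes):

```javascript
// Hypothetical buildGroupsArray: collect {number, name} entries of all
// capturing groups into an array on the root `RegExp` object, so that
// generators etc. can use them without additional parameters.
function buildGroupsArray(ast) {
  const groups = [];
  (function walk(node) {
    if (node == null || typeof node !== 'object') return;
    if (node.type === 'Group' && node.capturing) {
      groups.push({number: node.number, name: node.name || null});
    }
    for (const key of Object.keys(node)) {
      const child = node[key];
      if (Array.isArray(child)) child.forEach(walk);
      else walk(child);
    }
  })(ast);
  ast.groups = groups; // store on the root `RegExp` node
  return ast;
}

// /(?<x>a)(b)/ -- one named and one plain capturing group.
const ast = {
  type: 'RegExp',
  body: {
    type: 'Alternative',
    expressions: [
      {type: 'Group', capturing: true, number: 1, name: 'x', expression: {}},
      {type: 'Group', capturing: true, number: 2, expression: {}},
    ],
  },
};

buildGroupsArray(ast);
console.log(ast.groups);
// [{number: 1, name: 'x'}, {number: 2, name: null}]
```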
For different situations you can also have other transformations, e.g. building other helper data structures for converting the original result.
It's also attractive to store these data structures in the "RegExp" object (or AST in general), because generators etc. can use them without needing additional parameters.
I see you have already added `number` to the group.
The reference to the source seems to be invalidated by several transformations. But you can see it as a source map: e.g., when a named group that was converted to a simple group produces an error, you could point at the original source.
Yeah, technically it is a transformation, though semantically it could be a util (which just uses a transformation under the hood :)).
And yes, the `number` property already exists on a `Group` node.
So everything that is determined at the parsing stage is, theoretically (and practically), "non-reliable" if more than one transform is used in the pipeline (`source` is out of sync, the `number` of capturing groups is out of sync if groups are added/removed, location data is out of sync, etc.). That's fine in some cases; however, plugin implementers may want to preserve at least some "invariants" -- by using those util methods and rebuilding any needed data they affected.
Doing "generate/re-parse" after each stage in a pipeline would avoid this, but would be impractical on bigger inputs (e.g. several thousand files to process in a code base).
OK, if I were a user of the package with less understanding of the underlying algorithms, I would prefer to simply chain several transformations. I could probably cope when a transformation is documented as making this or that unreliable, if it were also documented that I have to use the transformation `build_this_and_that` when I need those values. Perhaps a table of which transformation invalidates which values would be nice.
OK, once we can build such a table, we can also manage flags in the `RegExp` object for each property (or feature), e.g. `property_x_valid`. These would be set by the parser and by transformations that create or rebuild them (producers), and reset by transformations that invalidate them. A transformation (or a consumer in general) can register which properties it needs, and a transformation sequencer can then ensure valid values in between. Simple (re)building transformations could be registered for a property flag and invoked when necessary. This way everything is safe, optimal, and easy to use.
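To make the sequencer idea concrete, here is a toy sketch (all names and the flag scheme are hypothetical, just illustrating the requires/produces/invalidates bookkeeping described above; real transforms would mutate an AST instead of logging):

```javascript
// Hypothetical sequencer: each transform declares which properties it
// requires, produces, and invalidates; a rebuilder is registered per
// property flag and invoked only when a stale property is needed.
function createSequencer(rebuilders) {
  return function run(ast, transforms) {
    // Properties assumed valid right after a fresh parse.
    const valid = new Set(['groupNumbers', 'source', 'loc']);
    for (const t of transforms) {
      for (const prop of t.requires || []) {
        if (!valid.has(prop)) {
          rebuilders[prop](ast); // rebuild only when actually needed
          valid.add(prop);
        }
      }
      t.transform(ast);
      for (const prop of t.invalidates || []) valid.delete(prop);
      for (const prop of t.produces || []) valid.add(prop);
    }
    return ast;
  };
}

// Toy usage: the first transform invalidates group numbers, the second
// requires them, so the rebuilder runs exactly once, in between.
const log = [];
const run = createSequencer({
  groupNumbers: () => log.push('rebuild:groupNumbers'),
});

run({}, [
  {transform: () => log.push('insertGroup'), invalidates: ['groupNumbers']},
  {transform: () => log.push('useGroups'), requires: ['groupNumbers']},
]);

console.log(log);
// ['insertGroup', 'rebuild:groupNumbers', 'useGroups']
```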
Such a generic pub/sub system (where you can subscribe to the props you need, and the system ensures they are invalidated/cleaned up after each transform) sounds interesting, and could actually be good, although it might add more complexity for end users as well (in contrast with using small util methods). It may also add complexity to the sequencer itself (taking time to manage this, especially in cases where the pub/sub isn't needed by all the transforms).
The name `astUtils` suggests they are related to AST manipulations. These can be not only changing/cleanup operations, but also e.g. searches of different kinds: you may want to do some replacement only if some deeply nested child is contained, or something (`astUtils.containsChild(...)`, though technically that could be on `path`).
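A quick sketch of such a search util (`containsChild` is a hypothetical name from this thread; the node shapes follow regexp-tree's AST):

```javascript
// Hypothetical astUtils.containsChild: returns true if the node or any
// (deeply nested) descendant satisfies the predicate.
function containsChild(node, predicate) {
  if (node == null || typeof node !== 'object') return false;
  if (predicate(node)) return true;
  return Object.keys(node).some(key => {
    const child = node[key];
    return Array.isArray(child)
      ? child.some(c => containsChild(c, predicate))
      : containsChild(child, predicate);
  });
}

// E.g. apply a replacement only if the tree contains a backreference:
const ast = {
  type: 'RegExp',
  body: {
    type: 'Alternative',
    expressions: [
      {
        type: 'Group',
        capturing: true,
        number: 1,
        expression: {type: 'Char', value: 'a'},
      },
      {type: 'Backreference', number: 1},
    ],
  },
};

console.log(containsChild(ast, n => n.type === 'Backreference')); // true
console.log(containsChild(ast, n => n.type === 'CharacterClass')); // false
```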