parsley Internal combinator name annotation

Internal combinator name annotation

Open j-mie6 opened this issue 10 months ago • 1 comments

As shown in #235, the new parsley.debugger functionality suffers from exposing underlying implementations in a way that might not be obvious at all to a user. A looping .foldLeft reports that pure is to blame, even when the user has not used it. In fact, this is the folding function's pure being executed first, so witnessing the bad trace. The other problem is the internal names given to combinators when the parser has no reflectively determined name is often not going to be what the user expects themselves at all; and care must be taken to properly abstract combinators from across parsley with the same naming machinery.

There are three solutions that I can think of for this problem.

`Freeze`/`Thaw`

Two combinators, freeze and thaw (names pending), can be used together to render the internals of a combinator opaque, with the arguments exposed via the thaw. This way, the automatic tagging machinery used by parsley.debugger will just skip over frozen nodes and take the name directly from the top node (this means the prettyName field of LazyParsley can be removed). The Freeze and Thaw nodes themselves can basically be no-op nodes that just forward to the underlying implementation, meaning they just incur an allocation cost and a cheap method forwarding: they would have no equivalent within StrictParsley, so would not affect the optimiser.

The problems with this approach are:

A need to annotate every combinator within the library by hand to ensure they have the correct name and opacity, which is a minor maintenance burden.
Additional overheads (it's not clear how much) on the "happy" path, which is not desirable.
Increased memory overhead (of around 16 bytes) on every combinator
To reduce the above two, it would be necessary to split out internal implementations from external facing API, so that no tagging needs to occur within combinators, this increases code size and creates some small maintenance burden.

The advantages are:

Perfectly accurate debugging traces that are meaningful to the user -- including within bridges.
The user could themselves use freeze/thaw if they wanted to define their own combinators.
Less machinery required within parsley-debug to index internal parsley modules and the lexer, which reduces maintenance burden.
Reduces the overheads of parsley-debug combinators, which will have sparser traces.

If it turns out that the cost on the "happy" path is larger than anticipated, it would be possible to lock the combinators behind a conditional guard:

A method like parsley.debugger.enable() is exposed from parsley-debug, which the user must call in their main function
This method sets an internal flag at the definition side of freeze/thaw (which might be inside parsley.debug), which enables the combinators.
If the flag is not set, the combinators are just the identity function: this trades an allocation on the "happy" path for a cheap conditional check instead (and this will likely be JIT'd out).
The downside to this is the obvious usability overhead.

`Opaque`/`transparent`

The second method instead opts for one AST node, and one meta-combinator that can destroy that node. In this case, the debugger is only allowed to latch onto Opaque nodes, and everything else is ignored. Notice that in the freeze/thaw approach, it is highly likely that every thaw combinator is wrapped around a freeze from another combinator. It's this insight that allows it to be dropped in favour of a reduced and more explicit set.

The advantages here are as follows:

Less overall allocations are required for each individual combinator.
User could still use opaque on their own combinators, and use transparent to hide library ones (even in wider parsers to suppress).
Debug traces are still accurate, and the same combinator can be given multiple names via separate Opaque on top of each other if desired in a user combinator.
parsley-debug logic is vastly simplified, as there is one attachment site, with no counting required to deal with anti-opaque nodes
Less maintenance burden, as the combinator only need be labeled at the top.
Simple extension to allow for @parsley.debuggable to automatically annotate defs with an opaque combinator, without needing to traverse the parser definition to also add in the inverse operation -- this improves the overall experience.
The prettyName field can be removed, since the only attachment site is the Opaque node.

The disadvantages are as above, with less overall memory overheads.

`opaque`/`transparent`

While the above strategy is certainly an improvement over the original scheme, it still does incur additional allocation costs on the "happy" path. An alternative approach would be to instead wholely rely on the prettyName field of the existing AST nodes. Instead of having an Opaque node, make prettyName a var, where null represents a transparent node. The opaque combinator simply sets this field, for no additional allocation cost, and transparent simply clears it back to null. Primitives can be parameterised by this name so that no additional call is required, they can be set at initialisation (then potentially cleared when used with transparent, or separate internal implementation set to null initially.

This has the following advantages:

Close to zero-cost along the "happy" path, with a small overhead for setting the initial name/clearing it.
Setting is cheap, so less duplication of internal API is required.
User can still annotate their own internal combinators, and hide other parsley combinators if they wish.
The @parsley.debuggable annotation can still be used to annotate defs as above.

The disadvantages are:

The combinators are strictly less expressive: the user cannot ascribe multiple names to a combinator, as it is overriden by another opaque (classic nullable/Option).
Singleton AST nodes that can be exposed in the top-level API must be removed, and replaced by class
Any val parsers would either need to be reformulated with def, or would otherwise not be allowed to appear opaquely as another combinator (there is a globally unique naming for these singletons). While you could argue that this is avoidable by using val and allowing the name to be set that way, parsers like:
```
def choose(b: Boolean) = if b then digit else letter
```
would mark with opaque and destroy the name of both digit and letter in other parts of the code. As such, it becomes necessary to switch them to def, and allocate uniquely for them, which will increase memory overheads a bit.
The updating of a name across shared parsers is now not thread-safe. However, since the results of annotation are not used in the happy path, this is in some sense ok: the names may race, but they are never observed. The machinery behind the debugger is not thread-safe anyway, so single-threaded debugging is assumed.
The use of null and the mutable variables in the LazyParsley tree is a code smell, however is necessary for allocationless and cheap operation (reusing an existing field).
The complexity of the debugging machinery remains the same, as any node requires attachment.

Apr 17 '24 14:04 j-mie6

parsley parsley copied to clipboard

Internal combinator name annotation

Freeze/Thaw

Opaque/transparent

opaque/transparent

parsley
parsley copied to clipboard

`Freeze`/`Thaw`

`Opaque`/`transparent`

`opaque`/`transparent`