parsley
parsley copied to clipboard
Internal combinator name annotation
As shown in #235, the new parsley.debugger
functionality suffers from exposing underlying implementations in a way that might not be obvious at all to a user. A looping .foldLeft
reports that pure
is to blame, even when the user has not used it. In fact, this is the folding function's pure
being executed first, so witnessing the bad trace. The other problem is the internal names given to combinators when the parser has no reflectively determined name is often not going to be what the user expects themselves at all; and care must be taken to properly abstract combinators from across parsley with the same naming machinery.
There are three solutions that I can think of for this problem.
Freeze
/Thaw
Two combinators, freeze
and thaw
(names pending), can be used together to render the internals of a combinator opaque, with the arguments exposed via the thaw
. This way, the automatic tagging machinery used by parsley.debugger
will just skip over frozen nodes and take the name directly from the top node (this means the prettyName
field of LazyParsley
can be removed). The Freeze
and Thaw
nodes themselves can basically be no-op nodes that just forward to the underlying implementation, meaning they just incur an allocation cost and a cheap method forwarding: they would have no equivalent within StrictParsley
, so would not affect the optimiser.
The problems with this approach are:
- A need to annotate every combinator within the library by hand to ensure they have the correct name and opacity, which is a minor maintenance burden.
- Additional overheads (it's not clear how much) on the "happy" path, which is not desirable.
- Increased memory overhead (of around 16 bytes) on every combinator
- To reduce the above two, it would be necessary to split out internal implementations from external facing API, so that no tagging needs to occur within combinators, this increases code size and creates some small maintenance burden.
The advantages are:
- Perfectly accurate debugging traces that are meaningful to the user -- including within bridges.
- The user could themselves use
freeze
/thaw
if they wanted to define their own combinators. - Less machinery required within
parsley-debug
to index internal parsley modules and the lexer, which reduces maintenance burden. - Reduces the overheads of
parsley-debug
combinators, which will have sparser traces.
If it turns out that the cost on the "happy" path is larger than anticipated, it would be possible to lock the combinators behind a conditional guard:
- A method like
parsley.debugger.enable()
is exposed fromparsley-debug
, which the user must call in theirmain
function - This method sets an internal flag at the definition side of
freeze
/thaw
(which might be insideparsley.debug
), which enables the combinators. - If the flag is not set, the combinators are just the identity function: this trades an allocation on the "happy" path for a cheap conditional check instead (and this will likely be JIT'd out).
- The downside to this is the obvious usability overhead.
Opaque
/transparent
The second method instead opts for one AST node, and one meta-combinator that can destroy that node. In this case, the debugger is only allowed to latch onto Opaque
nodes, and everything else is ignored. Notice that in the freeze
/thaw
approach, it is highly likely that every thaw
combinator is wrapped around a freeze
from another combinator. It's this insight that allows it to be dropped in favour of a reduced and more explicit set.
The advantages here are as follows:
- Less overall allocations are required for each individual combinator.
- User could still use
opaque
on their own combinators, and usetransparent
to hide library ones (even in wider parsers to suppress). - Debug traces are still accurate, and the same combinator can be given multiple names via separate
Opaque
on top of each other if desired in a user combinator. -
parsley-debug
logic is vastly simplified, as there is one attachment site, with no counting required to deal with anti-opaque nodes - Less maintenance burden, as the combinator only need be labeled at the top.
- Simple extension to allow for
@parsley.debuggable
to automatically annotatedef
s with anopaque
combinator, without needing to traverse the parser definition to also add in the inverse operation -- this improves the overall experience. - The
prettyName
field can be removed, since the only attachment site is theOpaque
node.
The disadvantages are as above, with less overall memory overheads.
opaque
/transparent
While the above strategy is certainly an improvement over the original scheme, it still does incur additional allocation costs on the "happy" path. An alternative approach would be to instead wholely rely on the prettyName
field of the existing AST nodes. Instead of having an Opaque
node, make prettyName
a var
, where null
represents a transparent node. The opaque
combinator simply sets this field, for no additional allocation cost, and transparent
simply clears it back to null
.
Primitives can be parameterised by this name so that no additional call is required, they can be set at initialisation (then potentially cleared when used with transparent
, or separate internal implementation set to null
initially.
This has the following advantages:
- Close to zero-cost along the "happy" path, with a small overhead for setting the initial name/clearing it.
- Setting is cheap, so less duplication of internal API is required.
- User can still annotate their own internal combinators, and hide other parsley combinators if they wish.
- The
@parsley.debuggable
annotation can still be used to annotatedef
s as above.
The disadvantages are:
-
The combinators are strictly less expressive: the user cannot ascribe multiple names to a combinator, as it is overriden by another opaque (classic nullable/
Option
). -
Singleton AST nodes that can be exposed in the top-level API must be removed, and replaced by
class
-
Any
val
parsers would either need to be reformulated withdef
, or would otherwise not be allowed to appear opaquely as another combinator (there is a globally unique naming for these singletons). While you could argue that this is avoidable by usingval
and allowing the name to be set that way, parsers like:def choose(b: Boolean) = if b then digit else letter
would mark with
opaque
and destroy the name of bothdigit
andletter
in other parts of the code. As such, it becomes necessary to switch them todef
, and allocate uniquely for them, which will increase memory overheads a bit. -
The updating of a name across shared parsers is now not thread-safe. However, since the results of annotation are not used in the happy path, this is in some sense ok: the names may race, but they are never observed. The machinery behind the debugger is not thread-safe anyway, so single-threaded debugging is assumed.
-
The use of
null
and the mutable variables in theLazyParsley
tree is a code smell, however is necessary for allocationless and cheap operation (reusing an existing field). -
The complexity of the debugging machinery remains the same, as any node requires attachment.