pml-companion
New Nodes to Handle Edge Cases: word-joiner and blank/empty
I've noticed that in the PML User Manual, section Anatomy of a PML Document » Attributes, the inline code for the escape character is forced to contain a trailing space (\ ), as seen in the source file 05_anatomy.pml:
must be terminated by a backslash ([c \\ ]),
The problem is that writing [c \\] instead of [c \\ ] doesn't work, because it would be parsed as [c+\+\], i.e. the second backslash is interpreted as escaping the closing bracket.
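To make this failure mode concrete, here is a minimal Python sketch contrasting two regexes. Both patterns are hypothetical illustrations, not the regex actually used by the PML parser: NAIVE treats any closing bracket preceded by a backslash as escaped, even when that backslash is itself escaped, while ESCAPE_AWARE consumes every backslash escape as a two-character unit.

```python
import re

# Hypothetical naive pattern: a ']' preceded by a backslash is never accepted
# as the closing bracket, even if that backslash was itself escaped ('\\').
NAIVE = re.compile(r"\[c (.*?)(?<!\\)\]")

# Hypothetical escape-aware pattern: '\x' is consumed as one unit, so an
# escaped backslash can never "swallow" the closing ']'.
ESCAPE_AWARE = re.compile(r"\[c ((?:\\.|[^\\\]])*)\]")


def unescape(s: str) -> str:
    # Replace each backslash escape '\x' with the literal character 'x'.
    return re.sub(r"\\(.)", r"\1", s)


def inline_code(pattern: re.Pattern, text: str):
    # Return the unescaped content of the first [c ...] node, or None.
    m = pattern.search(text)
    return None if m is None else unescape(m.group(1))


no_space = r"must be terminated by a backslash ([c \\]),"
with_space = r"must be terminated by a backslash ([c \\ ]),"

print(inline_code(NAIVE, no_space))         # None
print(repr(inline_code(NAIVE, with_space)))  # '\\ ' (backslash + space)
print(inline_code(ESCAPE_AWARE, no_space))  # \
```

With NAIVE, [c \\] finds no closing bracket in this sentence, and [c \\ ] only matches by keeping the trailing space inside the code, which mirrors the workaround used in 05_anatomy.pml.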
To avoid similar problems (which are typical edge cases in all lightweight markup syntaxes), I suggest adding some extra special characters:
[empty or [blank: replaced by nothing (an empty string) after parsing. Its sole role is to feed a token separator to the parser.
[wj: the word-joiner character (U+2060), a Unicode code point that prevents a line break at its position.
(Obviously, neither requires a closing bracket.)
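The replacement semantics proposed above can be sketched as a post-parsing render step. In this minimal Python sketch, the node classes, the REPLACEMENTS table and the render function are all hypothetical and do not reflect PML's internal API; only [empty is included, not its [blank alias.

```python
from dataclasses import dataclass
from typing import Union

WORD_JOINER = "\u2060"  # Unicode WORD JOINER (U+2060)


@dataclass
class Text:
    value: str


@dataclass
class Special:
    name: str  # hypothetical node name, e.g. "empty" or "wj"


Node = Union[Text, Special]

# Hypothetical replacement table for the proposed childless nodes.
REPLACEMENTS = {"empty": "", "wj": WORD_JOINER}


def render(nodes: list[Node]) -> str:
    # Text nodes are emitted verbatim; special nodes are replaced by their
    # substitution string, so [empty separates tokens but emits nothing.
    out = []
    for node in nodes:
        if isinstance(node, Text):
            out.append(node.value)
        else:
            out.append(REPLACEMENTS[node.name])
    return "".join(out)


print(render([Text("\\"), Special("empty")]))                      # \
print(repr(render([Text("someword"), Special("wj"), Text("[1]")])))  # 'someword\u2060[1]'
```

The point of the design is that [empty only matters during parsing: by the time the tree is rendered, it has already done its job of ending the preceding token.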
The above example from the PML User Manual could then be fixed via:
must be terminated by a backslash ([c \\[empty]),
Both of these are useful hacks for handling edge cases where the PML parser could face ambiguities like the one above. They would be the equivalents of Asciidoctor's predefined character-replacement attributes {empty}/{blank} and {wj}, which are extremely useful for handling all sorts of edge cases in AsciiDoc sources.
In Asciidoctor, {empty} and {blank} are identical; one is just an alias of the other. I personally prefer [empty to [blank, because I believe it's clearer, and I'd avoid having both, since that's redundant.
[wj is also very useful when you need to prevent the browser from wrapping a table column during automatic width adjustment (e.g. because the column contains words separated by break opportunities such as spaces, hyphens or brackets), or to prevent a line break between a word and its footnote marker: someword[1] can be rendered as someword + line break + [1], whereas someword[wj[1] keeps the marker on the same line as the word.
In some cases, they can also simply improve source readability.
These would be consistent with the current [nl and [sp substitutions available in PML.
the line code for the escape character is forced to contain a trailing space
Well spotted!
The reason is that the current parser uses a regex that does not handle this edge case.
The new pXML parser, which reads the input as a plain sequence of characters without using regexes, will correctly parse [c \\] as a node c with content \.
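A parser that reads one character at a time avoids the ambiguity naturally, because a backslash always consumes exactly the next character, so an escaped backslash can never be confused with an escaped bracket. The following Python function is a hypothetical sketch of that approach for the [c ...] node only; it is not the pXML implementation, and its name and error handling are assumptions.

```python
def parse_inline_code(src: str, start: int) -> tuple[str, int]:
    """Parse '[c ...]' beginning at src[start].

    Returns (unescaped content, index just after the closing ']').
    Hypothetical sketch: handles only the c node and backslash escapes.
    """
    prefix = "[c "
    if not src.startswith(prefix, start):
        raise ValueError(f"expected '[c ' at position {start}")
    i = start + len(prefix)
    content = []
    while i < len(src):
        ch = src[i]
        if ch == "\\":
            # An escape always consumes the following character, whatever it is.
            if i + 1 >= len(src):
                raise ValueError("dangling backslash at end of input")
            content.append(src[i + 1])
            i += 2
        elif ch == "]":
            # An unescaped ']' always closes the node.
            return "".join(content), i + 1
        else:
            content.append(ch)
            i += 1
    raise ValueError("missing closing ']'")


print(parse_inline_code(r"([c \\]),", 1))  # ('\\', 7)
```

Because the scanner never looks backwards, the meaning of ']' depends only on whether it was consumed by a preceding escape, which is exactly the information a lookbehind-based regex loses.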
Still, it's a very good idea to add 'word_joiner' and 'empty' nodes. They can help to explicitly eliminate ambiguities like this one, and, as you mentioned, they are useful in other cases as well. This will be done; it's easy to implement.
I personally prefer [empty to [blank, for I believe it's clearer, and I'd avoid having both, since it's redundant.
I agree.
the line code for the escape character is forced to contain a trailing space (\ ) — in the source file
This bug has been fixed in version 2.0.0