New Nodes to Handle Edge Cases: word-joiner and blank/empty

Open tajmone opened this issue 4 years ago • 2 comments

I've noticed that in the PML User Manual, section Anatomy of a PML Document » Attributes, the inline code for the escape character is forced to contain a trailing space (\ ), as seen in the source file 05_anatomy.pml:

must be terminated by a backslash ([c \\ ]),

The problem here is that using [c \\] instead of [c \\ ] won't work, because it would be parsed as [c + \ + \], i.e. the second backslash is interpreted as escaping the closing bracket, so the node doesn't end where intended.
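
To make the failure concrete, here is a minimal Python sketch. It assumes, purely for illustration, that the old parser found the end of an inline-code node with a regex accepting the first ] that is not preceded by a backslash. The pattern and the helper name naive_code_content are hypothetical, not taken from the PML source.

```python
import re

# Hypothetical pattern (illustration only, not PML's actual regex):
# an inline-code node starts at "[c " and ends at the first "]"
# that is not immediately preceded by a backslash.
NAIVE_CODE_NODE = re.compile(r"\[c (.*?)(?<!\\)\]")


def naive_code_content(text):
    """Return the raw (still escaped) content of the first [c ...] node, or None."""
    match = NAIVE_CODE_NODE.search(text)
    return match.group(1) if match else None


# With the trailing space, the "]" follows a space, so the node closes.
print(naive_code_content(r"a backslash ([c \\ ]),"))  # prints: \\  (two raw backslashes + space)

# Without it, "\]" looks like an escaped bracket and the node never closes...
print(naive_code_content(r"a backslash ([c \\]),"))   # prints: None

# ...or it runs on to a later "]" and swallows the text in between.
print(naive_code_content(r"[c \\] and [c x]"))        # prints: \\] and [c x
```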

To avoid similar problems (typical edge cases found in all lightweight markup syntaxes), I suggest adding some extra special characters:

  • [empty or [blank: replaced by nothing (an empty string) after parsing. Its sole role is to feed a token separator to the parser.
  • [wj: the word-joiner character (U+2060), a Unicode code point that prevents a line break at its position.

(obviously, no closing bracket required for either)

The above example from the PML User Manual could then be fixed via:

must be terminated by a backslash ([c \\[empty]),
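
A sketch of why [empty resolves the ambiguity, reusing the same hypothetical closing regex: the closing ] now follows the y of [empty rather than a backslash, so the node closes where intended. A post-parsing pass then removes the marker and unescapes \\ to \. The expand/code_content helpers, the set of escapable characters, and the [wj expansion to U+2060 are all assumptions for illustration, not PML's implementation.

```python
import re

# Same hypothetical closing rule (illustration only):
# "[c " ... up to the first "]" not preceded by a backslash.
NAIVE_CODE_NODE = re.compile(r"\[c (.*?)(?<!\\)\]")

# Assumed expansions for the proposed bracket-less nodes.
SPECIAL_NODES = {"[empty": "", "[wj": "\u2060"}  # U+2060 WORD JOINER
ESCAPABLE = "\\[]"  # assumed: only backslash and square brackets are escapable


def expand(raw):
    """Post-parsing pass: resolve backslash escapes and the special nodes."""
    out = []
    i = 0
    while i < len(raw):
        if raw[i] == "\\" and i + 1 < len(raw) and raw[i + 1] in ESCAPABLE:
            out.append(raw[i + 1])  # the escaped character is kept literally
            i += 2
            continue
        for marker, replacement in SPECIAL_NODES.items():
            if raw.startswith(marker, i):
                out.append(replacement)  # [empty contributes nothing
                i += len(marker)
                break
        else:
            out.append(raw[i])
            i += 1
    return "".join(out)


def code_content(text):
    match = NAIVE_CODE_NODE.search(text)
    return expand(match.group(1)) if match else None


print(repr(code_content(r"([c \\ ]),")))       # '\\ '  (backslash + space)
print(repr(code_content(r"([c \\[empty]),")))  # '\\'   (just the backslash)
```

Because escapes are consumed before special nodes are matched, an escaped \[empty in this sketch stays literal text instead of being deleted.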

Both of these are useful hacks for edge cases where the PML parser could face ambiguities like the one above. They would be the equivalents of Asciidoctor's predefined character-replacement attributes {empty}/{blank} and {wj}, which are extremely useful for handling all sorts of edge cases in AsciiDoc sources.

In Asciidoctor, {empty} and {blank} are identical, one being just an alias of the other. I personally prefer [empty to [blank, for I believe it's clearer, and I'd avoid having both, since it's redundant.

The [wj is also very useful where you need to prevent the browser from wrapping a table column during auto-adjustment (e.g. because one column contains words separated by boundaries like spaces, hyphens, brackets, etc.), or to prevent a line from wrapping between a word and its footnote marker: someword[1] could be broken into someword + newline + [1], whereas someword[wj[1] keeps the two together.
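
As a small illustration of what [wj would contribute to the output, this Python sketch glues a footnote marker to the preceding word with U+2060 WORD JOINER; the helper name join_with_marker is hypothetical. Under the Unicode line-breaking rules (UAX #14), a conforming renderer should not break the line next to this invisible character. The last line shows the HTML numeric character reference a renderer could emit instead of the raw character.

```python
import unicodedata

WORD_JOINER = "\u2060"


def join_with_marker(word, marker):
    """Glue a footnote marker to the preceding word (hypothetical helper)."""
    return word + WORD_JOINER + marker


glued = join_with_marker("someword", "[1]")
print(unicodedata.name(WORD_JOINER))                 # WORD JOINER
print(len("someword[1]"), len(glued))                # 11 12: one extra, invisible code point
print(glued.encode("utf-8"))                         # b'someword\xe2\x81\xa0[1]'
print(glued.encode("ascii", "xmlcharrefreplace"))    # b'someword&#8288;[1]'
```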

And sometimes they can simply improve source readability.

These would be consistent with the current [nl and [sp substitutions available in PML.

tajmone, Mar 25 '21 14:03

the line code for the escape character is forced to contain a trailing space

Well spotted!

The reason is that the current parser uses a regex that doesn't handle this edge case. The new pXML parser, which simply reads a sequence of characters (no regexes), will correctly parse [c \\] as a node c with content \.

However, it's a very good idea to add 'word_joiner' and 'empty' nodes. They can help to explicitly eliminate ambiguities like this, and they are useful in other cases as well, as you mentioned. Will be done. Easy to implement.
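
For comparison with the regex approach, here is a minimal character-by-character sketch in Python. It is not the pXML parser (whose code isn't shown here). It handles \\, \[ and \] escapes, [name ...] nodes, and the proposed bracket-less [empty and [wj nodes. Because each escape is consumed as one unit, the second backslash in [c \\] can never pair up with the closing bracket. The function name parse, the (name, children) tuple output, and the single-space name/content separator are assumptions.

```python
ESCAPABLE = "\\[]"
# Proposed bracket-less nodes and their assumed expansions.
SPECIAL_NODES = {"empty": "", "wj": "\u2060"}


def parse(text):
    """Parse text into a list of strings and (name, children) tuples."""
    pos = 0

    def parse_items(inside_node):
        nonlocal pos
        items, buf = [], []

        def flush():
            if buf:
                items.append("".join(buf))
                buf.clear()

        while pos < len(text):
            ch = text[pos]
            if ch == "\\" and pos + 1 < len(text) and text[pos + 1] in ESCAPABLE:
                buf.append(text[pos + 1])  # the escape is consumed as one unit
                pos += 2
            elif ch == "[":
                pos += 1
                start = pos
                while pos < len(text) and (text[pos].isalnum() or text[pos] == "_"):
                    pos += 1
                name = text[start:pos]
                if name in SPECIAL_NODES:
                    buf.append(SPECIAL_NODES[name])  # no closing bracket needed
                    continue
                flush()
                if pos < len(text) and text[pos] == " ":
                    pos += 1  # one space separates the name from the content
                items.append((name, parse_items(inside_node=True)))
            elif ch == "]" and inside_node:
                pos += 1
                flush()
                return items
            else:
                buf.append(ch)
                pos += 1
        if inside_node:
            raise ValueError("unclosed node")
        flush()
        return items

    return parse_items(inside_node=False)


print(parse(r"[c \\]"))         # [('c', ['\\'])]
print(parse(r"[c \\[empty]"))   # [('c', ['\\'])]
```

Both spellings yield the same tree here, which matches the reply above: with a character-level scanner the ambiguity disappears, and [empty remains an explicit way to separate tokens when needed.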

I personally prefer [empty to [blank, for I believe it's clearer, and I'd avoid having both, since it's redundant.

I agree.

pml-lang, Mar 26 '21 08:03

the line code for the escape character is forced to contain a trailing space (\ ) — in the source file

This bug has been fixed in version 2.0.0

pml-lang, Sep 09 '21 03:09