message-format-wg
[FEEDBACK] Improving bidi layout
In MF1.0, we allow RLM and LRM in any position between syntax tokens that is not part of the literal message; those characters are then ignorable in parsing the message. This matters because many people will be viewing the message in plain-text editors, where the default behavior of the bidi algorithm is designed for normal text and can really mess up "code-like" text (as in MF1.0 or MF2.0).
Here is an example of what happens with bidi text. It uses a tool (linked below) whose convention is that UPPERCASE letters stand in for right-to-left characters so that people can read them. ⁅...⁆ is used in place of {...} because of the tooling.
Example Source: * ⁅⁅YOU HAVE ⁅$count⁆ NOTIFICATION.⁆⁆
Resulting Display: ⁅⁅.NOITACIFITON ⁅count :number$⁆ EVAH UOY⁆⁆ *
Notice that "YOU HAVE" is reversed and "NOTIFICATION" is reversed (which is the correct order among the pieces), but the $ is jumbled.
But if the line started with 'one' instead of *, you'd get different results:
one ⁅⁅EVAH UOY ⁅$count :number⁆ NOITACIFITON.⁆⁆
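One way to see why the two lines differ: with automatic paragraph direction (the tool URL uses `p=Auto`), the first strong character sets the base direction. `*` is neutral, so the first right-to-left letter wins, while `one` starts with a strong left-to-right letter. The following is a minimal sketch of that check, not a full bidi implementation. The function name `first_strong_direction` is hypothetical, it ignores isolates and embeddings, and it uses real Hebrew letters because the UPPERCASE convention exists only in the tool.

```python
import unicodedata

def first_strong_direction(text: str) -> str:
    """Return 'ltr' or 'rtl' based on the first strong character, else 'neutral'.

    Simplification: characters inside isolates are not skipped.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return "neutral"

HEBREW = "\u05d0\u05d1\u05d2"  # stand-in for the UPPERCASE letters in the examples
LRM = "\u200e"                 # LEFT-TO-RIGHT MARK, bidi class L

print(first_strong_direction("* {{" + HEBREW + "}}"))        # rtl
print(first_strong_direction("one {{" + HEBREW + "}}"))      # ltr
print(first_strong_direction(LRM + "* {{" + HEBREW + "}}"))  # ltr
```

The last call suggests why a leading LRM helps: it is a strong left-to-right character, so it fixes the base direction without adding visible text.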
If the syntax allows the insertion of an LRM in locations that are not part of the literal message and ignores those LRM, the source can be consistently left-to-right (so that the syntax is in the right order).
<LRM>* ⁅⁅YOU HAVE ⁅<LRM>$count :number⁆ NOTIFICATION.⁆⁆
Resulting Display:
* ⁅⁅EVAH UOY ⁅$count :number⁆ NOITACIFITON.⁆⁆
That is still not good for BIDI languages: the ideal would be that the flow of the syntax was LTR, but the flow of the variant message would be consistently RTL, something like:
<RLM>* ⁅⁅YOU HAVE ⁅<RLM><LRM>$count :number⁆ NOTIFICATION.⁆⁆
Resulting Display: ⁅⁅.NOITACIFITON ⁅$count :number⁆ EVAH UOY⁆⁆ *
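To make the "ignorable in parsing" idea concrete, here is a rough sketch of dropping LRM and RLM everywhere except in the literal message text. It is a toy model rather than the MF2 grammar: it assumes `{{...}}` delimits a pattern, `{...}` inside it is a placeholder, and there are no escapes or quoted literals. The name `strip_ignorable_marks` is hypothetical.

```python
LRM = "\u200e"  # LEFT-TO-RIGHT MARK
RLM = "\u200f"  # RIGHT-TO-LEFT MARK

def strip_ignorable_marks(source: str) -> str:
    """Remove LRM/RLM that sit between syntax tokens.

    Toy model (an assumption, not the real grammar): brace depth 2 means
    "directly inside {{...}}", i.e. literal message text, where marks are
    kept. Every other depth is treated as syntax, where marks are dropped.
    """
    out = []
    depth = 0
    for ch in source:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch in (LRM, RLM) and depth != 2:
            continue  # ignorable: not part of the literal message
        out.append(ch)
    return "".join(out)

marked = RLM + "* {{YOU HAVE {" + RLM + LRM + "$count :number} NOTIFICATION.}}"
print(strip_ignorable_marks(marked) == "* {{YOU HAVE {$count :number} NOTIFICATION.}}")  # True
```

A mark typed inside the message text itself (depth 2 in this model) is left alone, since there it is part of what the translator wrote.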
A good "MF2" editor or localization tool would use something more sophisticated (we should call that to people's attention in the spec, pointing to https://www.unicode.org/reports/tr55, especially https://www.unicode.org/reports/tr55/#Ordering for applying HL4).
But a lot of people will be using plain-text editors, and allowing the entry of LRM and RLM can make the syntax more readable. (And tooling can insert those characters in the appropriate places automatically, so that plain text looks better.)
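As a rough illustration of what such tooling might do, the sketch below places an LRM at the start of the line and just after the opening brace of each placeholder, matching the positions used in the LRM example earlier in this post. It relies on the same toy assumptions (`{{...}}` is a pattern, `{...}` inside it is a placeholder, no escapes), and `add_lrm_marks` is a hypothetical name, not an existing API.

```python
LRM = "\u200e"  # LEFT-TO-RIGHT MARK

def add_lrm_marks(line: str) -> str:
    """Insert LRM at the line start and after each placeholder's opening brace.

    Toy model: a "{" that brings the brace depth to 3 opens a placeholder
    inside a {{...}} pattern. Not idempotent: running it twice adds marks twice.
    """
    out = [LRM]
    depth = 0
    for ch in line:
        out.append(ch)
        if ch == "{":
            depth += 1
            if depth == 3:
                out.append(LRM)  # keep the placeholder's syntax left-to-right
        elif ch == "}":
            depth -= 1
    return "".join(out)

result = add_lrm_marks("* {{YOU HAVE {$count :number} NOTIFICATION.}}")
print(result == LRM + "* {{YOU HAVE {" + LRM + "$count :number} NOTIFICATION.}}")  # True
```

Because the marks land only in positions the parser would ignore, the message itself is unchanged and only its plain-text display differs.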
The tool is: https://util.unicode.org/UnicodeJsps/bidi.jsp?a=one+%E2%81%85%E2%81%85YOU+HAVE+%E2%81%85%24count%E2%81%86+NOTIFICATION.%E2%81%86%E2%81%86&p=Auto&hack=on
Notes:
- What it does not show is that the brackets ⁅...⁆ are mirrored if displayed R to L, so I did that manually above.
- If you insert either x or X in the Source (and ignore them in the Reordered output) you can see what the ordering result would be.