refterm
What should happen for RTL lines?
Right now, the terminal just has LTR mode and lays out RTL subsections inside LTR lines.
It would be straightforward to add fully RTL lines, but the question is, what should the behavior be?
One way is to simply set the terminal to a mode, LTR or RTL, so that all lines use that direction, with opposite-direction sections laid out inside them.
Another way is for the terminal to try to "guess" what each individual line is based on how much LTR vs. RTL there is in the line, and pick the starting point that way.
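The per-line guess can be sketched as a majority count over strong characters. This is a minimal sketch under stated assumptions: the function name guess_line_direction is hypothetical, only Unicode's strong bidi classes are counted (L for left-to-right, R and AL for right-to-left), and ties fall back to a default direction.

```python
import unicodedata

# Strong right-to-left bidi classes: R (e.g. Hebrew) and AL (Arabic letters).
RTL_CLASSES = {"R", "AL"}


def guess_line_direction(line: str, default: str = "ltr") -> str:
    """Hypothetical heuristic: pick a line's direction by majority vote.

    Only strong characters vote; spaces, digits and punctuation are ignored.
    Ties, including lines with no strong characters, return `default`.
    """
    ltr = sum(1 for ch in line if unicodedata.bidirectional(ch) == "L")
    rtl = sum(1 for ch in line if unicodedata.bidirectional(ch) in RTL_CLASSES)
    if rtl > ltr:
        return "rtl"
    if ltr > rtl:
        return "ltr"
    return default
```

A line such as "مرحبا hi" comes out RTL (five Arabic letters against two Latin ones), while "123 ..." has no strong characters and gets the default. One drawback of a count is that a short English command containing a long Arabic argument could flip direction.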
A further wrinkle here is, does RTL mode switch how VT-codes are interpreted? E.g., does a VT-code goto with X position 1 go to the first cell on the right (instead of the left) if the line/mode is RTL?
I would welcome opinions on these issues. I don't think there is a "right" answer here, so I think the best thing for refterm, as an example, is to do whatever seems to be the consensus view.
- Casey
One thing I can do is show how other applications display mixed Arabic/English content:
- This is an example file: arabic.txt
- In Emacs, a line starting in Arabic is aligned to the right only if it has an empty line before it; otherwise it uses the previous line's alignment.
- When I cat the file in Konsole, it shows all lines aligned to the left.
- Firefox will align lines starting with Arabic characters to the right, even if they continue in English.
- Chrome will display all lines aligned to the left.
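The Emacs behavior described above can be modeled as paragraph-level direction detection. This is a minimal sketch that assumes, based only on the observation above, that a paragraph is a run of non-empty lines, that its direction comes from the first strong character anywhere in the paragraph, and that every line in the paragraph inherits it. The function names are hypothetical.

```python
import unicodedata


def first_strong_direction(text):
    """Return 'ltr' or 'rtl' for the first strong character in text, else None."""
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            return "ltr"
        if cls in ("R", "AL"):
            return "rtl"
    return None


def paragraph_alignments(lines, default="ltr"):
    """Hypothetical Emacs-like rule: one direction per blank-line-separated paragraph."""
    result = []
    paragraph = []

    def flush():
        # The whole paragraph shares the direction of its first strong character.
        direction = first_strong_direction("".join(paragraph)) or default
        result.extend([direction] * len(paragraph))
        paragraph.clear()

    for line in lines:
        if line.strip():
            paragraph.append(line)
        else:
            flush()
            result.append(default)  # empty lines carry no text to align
    flush()
    return result
```

With the lines "مرحبا world", "hello again", "", "hello", "مرحبا", the first two lines are right-aligned and the last Arabic line stays left-aligned, because it continues a paragraph that started in English.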
Side note:
- When I use terminal emulators, I use them with English/default settings. I don't switch them to RTL because all input and output is in English; when some of the output is in Arabic, I just expect it to be rendered readably (that matters more to me than alignment). This is also what all the Arabic-speaking developers I know do.
I think the current treatment is fine. No one expects terminal output to be right-aligned unless the whole terminal is.
There is this document that makes some sensible suggestions: https://terminal-wg.pages.freedesktop.org/bidi/
TLDR: there is a BIDI mode attached to the console, which can be set by the user and by applications, like a codepage. Depending on the mode, the terminal does some combination of:
- Guessing the direction of a line or part of a line (probably based on its first characters)
- Laying out text in LTR or RTL mode
- Ignoring or taking into account BIDI control characters
I didn't think too much about it, but it seems to cover most use cases reasonably well, either automatically or with little user input (setting the desired mode for software that isn't BiDi-aware).
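A per-console mode along those lines might look like the following. The BidiMode fields and function names are hypothetical, not taken from the terminal-wg document. Direction guessing uses the first strong character, which is also how UAX #9 picks a paragraph level, except that this sketch skips UAX #9's special handling of isolates.

```python
import unicodedata
from dataclasses import dataclass

# Unicode bidi formatting characters: LRM, RLM, ALM, the embeddings/overrides
# U+202A..U+202E, and the isolates U+2066..U+2069.
BIDI_CONTROLS = (
    set("\u200e\u200f\u061c")
    | {chr(c) for c in range(0x202A, 0x202F)}
    | {chr(c) for c in range(0x2066, 0x206A)}
)


@dataclass
class BidiMode:
    """Hypothetical per-console BiDi settings (field names are illustrative)."""
    auto_direction: bool = True     # guess each line's direction from its text
    default_direction: str = "ltr"  # used when not guessing or when guessing fails
    honor_controls: bool = True     # False: drop bidi control characters


def first_strong_direction(text):
    # Simplified: does not skip text inside isolates as UAX #9 rule P2 does.
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            return "ltr"
        if cls in ("R", "AL"):
            return "rtl"
    return None


def prepare_line(line, mode):
    """Return (base direction, text to lay out) for one line under `mode`."""
    if not mode.honor_controls:
        line = "".join(ch for ch in line if ch not in BIDI_CONTROLS)
    direction = mode.default_direction
    if mode.auto_direction:
        direction = first_strong_direction(line) or mode.default_direction
    return direction, line
```

Because RLM has the strong bidi class R, an application that honors controls can force an RTL line by emitting U+200F first, while a mode with honor_controls=False strips the mark and falls back to guessing from the remaining text.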
> A further wrinkle here is, does RTL mode switch how VT-codes are interpreted? E.g., does a VT-code goto with X position 1 go to the first cell on the right (instead of the left) if the line/mode is RTL?
My instinct would be yes, but then again the typical scenario would be "we are in RTL mode since the most recent newline", so maybe this is the reason for Emacs's heuristic.
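If RTL does flip cursor addressing, the mapping from a VT column to a screen cell is a small change. This is a sketch under that assumption (the function name is hypothetical): column 1, as used by cursor positioning such as CUP (ESC [ row ; col H), maps to the rightmost cell of an RTL line, and out-of-range columns are clamped to the screen edge.

```python
def vt_column_to_cell(column: int, width: int, rtl: bool) -> int:
    """Map a 1-based VT column to a 0-based screen cell index.

    Hypothetical rule: in an RTL line or mode, column 1 is the rightmost cell.
    """
    column = max(1, min(column, width))  # clamp to the screen edge
    return width - column if rtl else column - 1
```

On an 80-column screen, column 1 lands on cell 0 in LTR and on cell 79 in RTL. The cost of this choice is that a full-screen program that is not BiDi-aware would see its cursor moves mirrored whenever a line is treated as RTL, which is one argument for tying the flip to an explicit mode rather than to a per-line guess.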
In case it is useful, here is Knuth's take (at some point) https://tug.org/TUGboat/tb08-1/tb17knutmix.pdf
Another detail is line wrapping of RTL text. In the refterm v2 video, when the window is shrunk, RTL text is wrapped using the same approach as LTR: fit as many glyphs as possible on a line starting from the left, and push the rest to the next (lower) line. I assume the correct behavior is to keep the rightmost part of the text (its logical beginning) on the first line.
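The difference between the two wrapping behaviors can be shown with a small sketch. It assumes a line that is entirely RTL, one cell per character, and wrapping at the cell limit rather than at word boundaries. Both function names are hypothetical, and ASCII letters stand in for RTL characters so the order is easy to see.

```python
def wrap_rtl(text: str, width: int) -> list[str]:
    """Wrap logically ordered RTL text into right-aligned visual rows.

    Each row takes the next `width` characters in logical order, so the
    beginning of the text ends up on the first row, at its right edge.
    """
    rows = []
    for start in range(0, len(text), width):
        chunk = text[start:start + width]      # logical order
        rows.append(chunk[::-1].rjust(width))  # visual order, right-aligned
    return rows


def wrap_visual_naive(text: str, width: int) -> list[str]:
    """LTR-style wrapping applied to the already reversed (visual) string."""
    visual = text[::-1]
    return [visual[i:i + width] for i in range(0, len(visual), width)]
```

For "abcdefg" at width 3, wrap_rtl gives rows "cba", "fed" and "  g", so reading each row right to left yields the text in order. The naive version gives "gfe", "dcb" and "a", which puts the end of the text on the first row.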
Wondering if these would help: https://unicode.org/reports/tr9/ (the Unicode Bidirectional Algorithm) and http://site.icu-project.org/ (ICU).
Between them, they seem to take care of the heavy lifting: UAX #9 specifies how to order mixed-direction text on screen, and ICU implements it, along with converting Arabic letters to their contextual presentation forms where necessary.
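To give a sense of the work UAX #9 specifies, here is a sketch of only its final reordering rule, L2. That rule reverses any run of characters at or above each level, starting from the highest embedding level and going down to the lowest odd level. The sketch assumes the embedding levels have already been resolved, which is the bulk of the algorithm and what ICU's ubidi API computes. The function name is hypothetical, and shaping and glyph mirroring are not covered.

```python
def reorder_by_levels(chars, levels):
    """Apply UAX #9 rule L2 to one line, given resolved embedding levels."""
    order = list(range(len(chars)))  # visual position -> logical index
    lvls = list(levels)              # levels, kept in current visual order
    if not lvls:
        return ""
    highest = max(lvls)
    lowest_odd = min((lv for lv in lvls if lv % 2 == 1), default=highest + 1)
    for level in range(highest, lowest_odd - 1, -1):
        i = 0
        while i < len(order):
            if lvls[i] >= level:
                j = i
                while j < len(order) and lvls[j] >= level:
                    j += 1
                # Reverse this run; move its levels along with the characters.
                order[i:j] = order[i:j][::-1]
                lvls[i:j] = lvls[i:j][::-1]
                i = j
            else:
                i += 1
    return "".join(chars[k] for k in order)
```

With uppercase letters standing in for RTL characters, "abc DEF ghi" with DEF at level 1 becomes "abc FED ghi". In an RTL paragraph "AB12CD" with levels 1, 1, 2, 2, 1, 1 it becomes "DC12BA": the digits keep their left-to-right order inside the reversed line.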