Extend visualization of non-printables with clear and stable characters
Observation
macOS curiously renders the “Control Pictures” (U+2400) block characters differently depending on context.
- Preceded by 00-08 | 0E-1F | 7F, 09-0D will render horizontally adjacent like other control characters
- Preceded by any other character, they are rendered diagonally (NW to SE) – and much less legible to boot
ruby -e '((0..9).to_a + [127] + (11..31).to_a).each.with_index { |c, i| printf "%02X%s \x00%s ", c, c.chr, c.chr; puts if i & 7 == 7 }' | bat --color=never -A --tabs 1 | sed 's/·/ /g'
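For readers unfamiliar with the block, here is a minimal sketch of how a control byte corresponds to its Control Pictures glyph. The helper name `control_picture` is hypothetical, and this is not bat's actual implementation; it only illustrates the code point arithmetic.

```ruby
# Hypothetical helper: map a control byte to its glyph in the
# Control Pictures block. C0 controls 0x00-0x1F map to U+2400-U+241F,
# and DEL (0x7F) maps to U+2421.
def control_picture(byte)
  case byte
  when 0x00..0x1F then [0x2400 + byte].pack("U")
  when 0x7F       then [0x2421].pack("U")
  when 0x20..0x7E then byte.chr # printable ASCII is left untouched
  else raise ArgumentError, "not an ASCII byte: #{byte}"
  end
end

puts control_picture(0x09) # ␉
```

The characters 09-0D from the observation above therefore land on U+2409 through U+240D.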
Proposals
1. Replace unstable Unicode characters
Use characters that don't exhibit confusing contextual behavior, and are also more legible and descriptive.
09 HT ⇥ 21E5 "Rightwards Arrow to Bar"
0A LF ⏎ 23CE "Return Symbol"
0B VT ⤓ 2913 "Downwards Arrow to Bar"
0C FF ↡ 21A1 "Downwards Two Headed Arrow"
0D CR ← 2190 "Leftwards Arrow"
Note: HT already has some logic around ↹ and ├─┤ that would have to be optioned or reconsidered.
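The table in proposal 1 could be expressed as a simple lookup. This is only a sketch: `WHITESPACE_GLYPHS` and `show_whitespace` are hypothetical names, and the existing HT logic mentioned in the note is deliberately ignored here.

```ruby
# Hypothetical replacement table for proposal 1 (not bat's actual code).
WHITESPACE_GLYPHS = {
  0x09 => "\u21E5", # HT, Rightwards Arrow to Bar
  0x0A => "\u23CE", # LF, Return Symbol
  0x0B => "\u2913", # VT, Downwards Arrow to Bar
  0x0C => "\u21A1", # FF, Downwards Two Headed Arrow
  0x0D => "\u2190", # CR, Leftwards Arrow
}.freeze

# Replace each of the five characters with its proposed glyph and
# pass every other character through unchanged.
def show_whitespace(text)
  text.each_char.map { |ch| WHITESPACE_GLYPHS.fetch(ch.ord, ch) }.join
end

puts show_whitespace("a\tb\r\n")
```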
2. Display intuitive Unicode characters for other still-in-use characters
Although unaffected by the diagonal issue, these characters would also benefit from clearer iconography:
08 BS ⌫ 232B "Erase to the Left"
1B ESC ⎋ 238B "Broken Circle with Northwest Arrow" (the escape symbol)
7F DEL ⌦ 2326 "Erase to the Right" (the delete symbol)
3. Extend visualization options
Provided:
- Today only LF, CR, HT and ESC are ever really used as originally intended
- Alternative meanings may apply (e.g. \x1A terminates a Hayes AT SMS)
- Triple abbrev Unicode characters are very slim and hard to read
  - They might even overlap the next char (depending on font rendering)
- Binary viewers like xxd display all non-printables as a dot.
  - (Preventing clutter for a user most likely scanning for human-readable text)
Proposal (including a direly needed -c shorthand):
-c, --nonprintable-notation <notation>
p, period Show as period
c, caret Show as caret notation (^@, ^A ... ^_)
u, unicode Show as Unicode (block U+2400)
b, binary Show relevant as Unicode, legacy as period
d, default Show relevant as Unicode, legacy as default
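To make the notations concrete, here is a rough sketch of how four of them might render a string. Everything in it is an assumption for illustration: the names `RELEVANT`, `render_control` and `render` are hypothetical, "relevant" is taken to mean the eight characters from proposals 1 and 2, and the `default` notation is left out because its legacy fallback is not specified above.

```ruby
# Assumption: the "relevant" characters are those from proposals 1 and 2.
RELEVANT = {
  0x08 => "\u232B", 0x09 => "\u21E5", 0x0A => "\u23CE", 0x0B => "\u2913",
  0x0C => "\u21A1", 0x0D => "\u2190", 0x1B => "\u238B", 0x7F => "\u2326",
}.freeze

# True for C0 control characters and DEL.
def control?(code_point)
  code_point < 0x20 || code_point == 0x7F
end

# Render a single control character in the requested notation.
def render_control(byte, notation)
  case notation
  when :period  then "."
  when :caret   then "^" + (byte ^ 0x40).chr # 0x00 -> ^@, 0x7F -> ^?
  when :unicode then [byte == 0x7F ? 0x2421 : 0x2400 + byte].pack("U")
  when :binary  then RELEVANT.fetch(byte, ".")
  else raise ArgumentError, "unknown notation: #{notation}"
  end
end

# Render a whole string; non-control characters pass through unchanged.
def render(text, notation)
  text.each_char.map { |ch|
    control?(ch.ord) ? render_control(ch.ord, notation) : ch
  }.join
end

puts render("a\x00b\x7F", :caret) # a^@b^?
```

Caret notation falls out of flipping bit 6 of the byte, which is why DEL shows up as `^?`.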
Other aspects
Leave a comment if you know anything about ...
- Insights or theories on why macOS alters its font rendering
- Corresponding behavior in other Operating Systems
References
- https://en.wikipedia.org/wiki/ASCII
- https://en.wikipedia.org/wiki/C0_and_C1_control_codes
- https://www.compart.com/en/unicode/block/U+2400
@keith-hall: Hello! Would you comment on these observations and proposals?
> macOS curiously renders the “Control Pictures” (U+2400) block characters differently depending on context.
Is it really a Mac thing, or just something like font ligatures, and switching the terminal emulator to a different font would prevent the contextual behavior?
I have no problem with introducing a shorthand argument. I think presenting a choice of notation for non-printable characters could complicate things a bit due to how the syntax highlighting works, but if someone is willing to do the work, I'd be happy to review the PR to see how it is in practice and whether we want it in bat 🙂
Did some testing: the diagonal effect happens primarily with the default "Fixed Width" fonts in macOS (and, for instance, not with Helvetica). However, none of these fonts seem to have their own glyphs for control characters, so the glyphs probably come from a fallback font, and I'm not sure which one. Any idea?
Good comments otherwise. Will post here if I can muster up something useful.