
Extend visualization of non-printables with clear and stable characters

Open forthrin opened this issue 4 months ago • 3 comments

Observation

macOS curiously renders the "Control Pictures" (U+2400) block characters differently depending on context.

  • When preceded by a byte in 00-08, 0E-1F or 7F, the pictures for 09-0D render horizontally adjacent, like other control characters
  • When preceded by any other character, they render diagonally (NW to SE) and are much less legible
ruby -e '((0..9).to_a + [127] + (11..31).to_a).each.with_index { |c, i| printf "%02X%s \x00%s ", c, c.chr, c.chr; puts if i & 7 == 7 }' | bat --color=never -A --tabs 1 | sed 's/·/ /g'
[Image: Rendering of Unicode block U+2400]

Proposals

1. Replace unstable Unicode characters

Use characters that don't exhibit confusing contextual behavior, and are also more legible and descriptive.

09 HT  ⇥ 21E5 "Rightwards Arrow to Bar"
0A LF  ⏎ 23CE "Return Symbol"
0B VT  ⤓ 2913 "Downwards Arrow to Bar"
0C FF  ↡ 21A1 "Downwards Two Headed Arrow"
0D CR  ← 2190 "Leftwards Arrow"

Note: HT already has some rendering logic of its own (including ├─┤) that would have to be made optional or reconsidered.
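
A minimal Ruby sketch of the substitution proposed above, for experimenting with the glyphs in a terminal. The `REPLACEMENTS` table and the `visualize` helper are hypothetical names used only for illustration and are not part of bat. All other bytes pass through unchanged.

```ruby
# Hypothetical lookup table: control character => proposed replacement glyph.
REPLACEMENTS = {
  "\x09" => "\u21E5", # HT  Rightwards Arrow to Bar
  "\x0A" => "\u23CE", # LF  Return Symbol
  "\x0B" => "\u2913", # VT  Downwards Arrow to Bar
  "\x0C" => "\u21A1", # FF  Downwards Two Headed Arrow
  "\x0D" => "\u2190", # CR  Leftwards Arrow
}.freeze

# Replace each of the five characters with its glyph; leave everything else as-is.
def visualize(text)
  text.gsub(/[\x09-\x0D]/, REPLACEMENTS)
end

puts visualize("a\tb\r\n") # prints: a⇥b←⏎
```

Printing the result next to ordinary text is a quick way to check whether these glyphs stay on the baseline in a given font.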

2. Display intuitive Unicode characters for other still-in-use characters

Although unaffected by the diagonal issue, these characters would also benefit from clear iconography:

08 BS  ⌫ 232B "Erase to the Left"
1B ESC ⎋ 238B "Broken Circle with Northwest Arrow" (escape)
7F DEL ⌦ 2326 "Erase to the Right"

3. Extend visualization options

Given that:

  • Today only LF, CR, HT and ESC are ever really used as originally intended
  • Alternative meanings may apply (e.g. \x1A terminates a Hayes AT SMS)
  • The Unicode glyphs built from three-letter abbreviations are very narrow and hard to read
    • They might even overlap the next char (depending on font rendering)
  • Binary viewers like xxd display all non-printables as a dot .
    • (Preventing clutter for a user most likely scanning for human-readable text)
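
The period convention mentioned in the last bullet can be sketched in a few lines of Ruby. The helper name `dotted` is hypothetical. It treats the ASCII range 0x20-0x7E as printable and is only meant to show how little clutter this notation produces.

```ruby
# Render bytes the way the text column of a hex dump does:
# printable ASCII (0x20-0x7E) as-is, every other byte as a period.
def dotted(bytes)
  bytes.each_byte.map { |b| (0x20..0x7E).cover?(b) ? b.chr : "." }.join
end

puts dotted("OK\x00\x1B[0m\x7F\n") # prints: OK..[0m..
```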

Proposal (including a direly needed -c shorthand):

      -c, --nonprintable-notation <notation>
              p, period   Show as period
              c, caret    Show as caret notation (^@, ^A ... ^_)
              u, unicode  Show as Unicode (block U+2400)
              b, binary   Show relevant as Unicode, legacy as period
              d, default  Show relevant as Unicode, legacy as default
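
To make the proposed notations concrete, here is a rough Ruby sketch of how a single control byte might be rendered under each one. Everything in it is an assumption made for illustration: the `render` helper and the `RELEVANT` table are hypothetical names, "relevant" is read as the eight characters from proposals 1 and 2, and the `default` notation is left out because it depends on bat's existing rendering.

```ruby
# Hypothetical set of "relevant" characters, taken from proposals 1 and 2.
RELEVANT = {
  0x08 => "\u232B", 0x09 => "\u21E5", 0x0A => "\u23CE", 0x0B => "\u2913",
  0x0C => "\u21A1", 0x0D => "\u2190", 0x1B => "\u238B", 0x7F => "\u2326",
}.freeze

# Render one control byte (0x00-0x1F or 0x7F) in the given notation.
def render(byte, notation)
  case notation
  when :period
    "."
  when :caret
    # Flipping bit 6 gives the caret letter: 0x00 -> ^@, 0x1F -> ^_, 0x7F -> ^?
    "^" + (byte ^ 0x40).chr
  when :unicode
    # Control Pictures: U+2400 + byte for C0 codes, U+2421 for DEL.
    [byte == 0x7F ? 0x2421 : 0x2400 + byte].pack("U")
  when :binary
    # Assumed reading: relevant characters as glyphs, legacy ones as a period.
    RELEVANT.fetch(byte, ".")
  else
    raise ArgumentError, "unknown notation: #{notation}"
  end
end

%i[period caret unicode binary].each do |notation|
  puts "#{notation}: #{render(0x1B, notation)} #{render(0x01, notation)}"
end
```

The loop prints ESC and SOH side by side in each notation, which makes it easy to compare legibility across fonts.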

Other aspects

Leave a comment if you know anything about ...

  • Insights or theories on why macOS alters its font rendering
  • Corresponding behavior in other Operating Systems

References

  • https://en.wikipedia.org/wiki/ASCII
  • https://en.wikipedia.org/wiki/C0_and_C1_control_codes
  • https://www.compart.com/en/unicode/block/U+2400

forthrin • Oct 07 '25 11:10

@keith-hall: Hello! Would you comment on these observations and proposals?

forthrin • Nov 01 '25 18:11

macOS curiously renders the "Control Pictures" (U+2400) block characters differently depending on context.

Is it really a macOS thing, or just something like font ligatures, in which case switching the terminal emulator to a different font would prevent the contextual behavior?

I have no problem with introducing a shorthand argument. I think offering a choice of notation for non-printable characters could complicate things a bit because of how the syntax highlighting works, but if someone is willing to do the work, I'd be happy to review the PR to see how it works in practice and whether we want it in bat 🙂

keith-hall • Nov 01 '25 19:11

I did some testing: the diagonal effect happens primarily with the default "Fixed Width" fonts in macOS (and not, for instance, with Helvetica). However, none of these fonts seem to have their own glyphs for control characters, so this is probably a fallback font, and I'm not sure which one. Any idea?

Good comments otherwise. Will post here if I can muster up something useful.

forthrin • Nov 01 '25 20:11