lc-shell
Smart/Curly quotes in Working with free text (Option 1 - Challenge)
The challenge in Option 1 of the Working with free text episode states that ‘smart’ or ‘curly’ quotes have not been removed from the text file in the previous steps. However, when I worked through these steps, these punctuation marks had indeed been removed. Perhaps this challenge should be updated?
I tried this with both bash-3.2 and zsh-5.8.1 on macOS Ventura 13.1, with the same results.
This is a tr thing, not directly a bash or zsh thing.
Curly quotes are three-byte characters in UTF-8 (U+2018 and U+2019 are encoded as E2 80 98 and E2 80 99). Most implementations of tr, including the GNU coreutils version used on Linux and in the various Bash options for Windows, work one byte at a time, so they can only handle single-byte characters (e.g. ASCII, which is UTF-8 up to U+007F) and don't recognise the curly quote characters.
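If you want to check the encoded length of the curly quotes yourself, here is a minimal sketch. It assumes a POSIX-style printf and wc; the variable names are just for illustration, and the octal escapes spell out the UTF-8 bytes E2 80 98 and E2 80 99 so the snippet does not depend on how your terminal pastes the characters.

```shell
# Emit the UTF-8 bytes for U+2018 and U+2019 and count them with wc -c.
# \342\200\230 is E2 80 98 (left quote); \342\200\231 is E2 80 99 (right quote).
# The trailing tr only strips the padding spaces some wc versions print.
left_quote_bytes=$(printf '\342\200\230' | wc -c | tr -d ' ')
right_quote_bytes=$(printf '\342\200\231' | wc -c | tr -d ' ')

echo "U+2018 is $left_quote_bytes bytes in UTF-8"
echo "U+2019 is $right_quote_bytes bytes in UTF-8"
```

Each count comes out greater than one, which is why a tr that works byte by byte cannot match either quote as a single character.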
Macs use the BSD version of tr. I cannot find any documentation of this anywhere, but it looks like recent versions of BSD tr can handle multi-byte characters and therefore recognise curly quotes as belonging to the [:punct:] class. (Best I can find at the moment is a presentation from 2016 that implies the change happened after that time.)
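To see which behaviour your own tr has, something like the following sketch should work. It assumes a UTF-8 locale and a POSIX-style printf; the sample text and variable names are made up for the test. I have not included expected output because the result depends on the tr implementation you are running.

```shell
# Sample text: a word wrapped in curly quotes, followed by ASCII punctuation.
# The octal escapes are the UTF-8 bytes of U+2018 and U+2019.
sample=$(printf '\342\200\230quoted\342\200\231, plain.')

# Delete everything this tr treats as punctuation in the current locale.
cleaned=$(printf '%s' "$sample" | tr -d '[:punct:]')
printf '%s\n' "$cleaned"

# If only the ASCII letters and the space survive, the curly quotes were removed too.
if [ "$cleaned" = "quoted plain" ]; then
    echo "this tr removed the curly quotes"
else
    echo "this tr left the curly quotes in place"
fi
```

The comma and full stop should disappear either way; only the treatment of the curly quotes differs between implementations.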
It sounds like the OpenSolaris/Heirloom Toolchest version of tr has had multi-byte support for much longer.