
[Advisory] Windows: Git Bash (`mintty`): Colorama chokes on Unicode box drawing characters used by Tyro for `--help` displays.

Open emcd opened this issue 1 year ago • 2 comments

(I did not see any Windows-specific caveats or advisories in the existing documentation, so I am posting something I found here in case you want to incorporate it.)

For people not using Windows Subsystem for Linux (WSL), Git Bash remains a popular alternative for a Unix-like experience on Windows. It is also the shell that GitHub Actions workflows use on Windows runners when `shell: bash` is set for a workflow step. The underlying terminal (`mintty`) expects output characters in the Windows Code Page 1252 encoding rather than UTF-8. Because Tyro uses Rich to render the `--help` displays, the output contains Unicode box drawing characters, and their code points have no equivalents in CP 1252. This results in tracebacks like the following:

$ .auxiliary/artifacts/pyinstaller/mimeogram.exe --help
Program terminated from uncaught exception. Please file a bug report.
Traceback (most recent call last):
  File "mimeogram\cli.py", line 101, in execute
  File "tyro\_cli.py", line 166, in cli
  File "tyro\_cli.py", line 446, in _cli_impl
  File "tyro\_argparse.py", line 1903, in parse_args
  File "tyro\_argparse.py", line 1936, in parse_known_args
  File "tyro\_argparse_formatter.py", line 532, in _parse_known_args
  File "tyro\_argparse_formatter.py", line 474, in consume_optional
  File "tyro\_argparse_formatter.py", line 383, in take_action
  File "tyro\_argparse.py", line 1151, in __call__
  File "tyro\_argparse.py", line 2643, in print_help
  File "tyro\_argparse_formatter.py", line 301, in _print_message
  File "colorama\ansitowin32.py", line 47, in write
  File "colorama\ansitowin32.py", line 177, in write
  File "colorama\ansitowin32.py", line 205, in write_and_convert
  File "colorama\ansitowin32.py", line 210, in write_plain_text
  File "encodings\cp1252.py", line 19, in encode
UnicodeEncodeError: 'charmap' codec can't encode characters in position 697-770: character maps to <undefined>

(Local reproduction via Git Bash above. The issue is also seen in GitHub Actions workflows, where it was first identified.)
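The root cause can be shown without Tyro or Colorama. The following is a minimal sketch; `BOX_CHARS` and `can_encode` are illustrative names, not part of any library. It only checks whether a few box drawing characters of the kind seen in the help panels can be encoded with a given codec.

```python
# A handful of box drawing characters like those in the --help panels.
BOX_CHARS = "╭─╮│╰╯"


def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character in `text` is representable in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


print(can_encode(BOX_CHARS, "utf-8"))   # True
print(can_encode(BOX_CHARS, "cp1252"))  # False
```

The second call fails for the same reason as the traceback above: CP 1252 has no slots for these code points.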

This Colorama issue goes into more detail about what is happening and why. It also suggests a mitigation: setting the `PYTHONIOENCODING` environment variable to `utf-8`. (This did not help in my case, but I am also working with a PyInstaller-created standalone executable, which could be interfering with the transmission of environment variables to the Python program that it wraps.)
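One way to check whether `PYTHONIOENCODING` actually reaches an interpreter is to ask a child process what encoding it chose for standard output. This is only a sketch for a regular Python interpreter; `child_stdout_encoding` is a hypothetical helper, and it says nothing about how a PyInstaller-wrapped executable behaves.

```python
import os
import subprocess
import sys


def child_stdout_encoding(extra_env):
    """Start a child interpreter and return the name of its sys.stdout encoding."""
    env = {**os.environ, **extra_env}
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# With the variable set, the child should report a UTF-8 stdout encoding
# even though its stdout is a pipe rather than a terminal.
print(child_stdout_encoding({"PYTHONIOENCODING": "utf-8"}))
```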


On a terminal which properly supports UTF-8 and advertises such support, there is no traceback. I.e., one will see the following on Linux, macOS, and other modern Unix systems:

$ mimeogram --help
usage: mimeogram [-h] [OPTIONS] {create,apply,provide-prompt,version}

Mimeogram: hierarchical data exchange between humans and LLMs.

╭─ options ────────────────────────────────────────────────────────────────────╮
│ -h, --help              show this help message and exit                      │
│ --configfile {None}|STR                                                      │
│                         (default: None)                                      │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ application options ────────────────────────────────────────────────────────╮
│ Information about an application.                                            │
│ ──────────────────────────────────────────────────────────────────────────── │
│ --application.name STR  (default: mimeogram)                                 │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ inscription options ────────────────────────────────────────────────────────╮
│ Logging and debug printing behavior.                                         │
│ ──────────────────────────────────────────────────────────────────────────── │
│ --inscription.mode {null,pass,rich}                                          │
│                         (default: rich)                                      │
│ --inscription.level {None,debug,info,warn,error,critical}                    │
│                         (default: None)                                      │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ subcommands ────────────────────────────────────────────────────────────────╮
│ {create,apply,provide-prompt,version}                                        │
│     create              Creates mimeogram from filesystem locations or URLs. │
│     apply               Applies mimeogram to filesystem locations.           │
│     provide-prompt      Provides LLM prompt text for mimeogram format.       │
│     version             Prints version information.                          │
╰──────────────────────────────────────────────────────────────────────────────╯

Anyway, I am not claiming that this is a Tyro bug or that Tyro should do anything to mitigate it. Just providing the information for documentation purposes.

emcd avatar Feb 20 '25 21:02 emcd

Thanks for documenting! I'm not a Windows user so wouldn't have run into this myself but it's good to know.

Seems we might as well add a workaround for it if possible. I should have much more time in a month or two and could try then.

brentyi avatar Feb 20 '25 22:02 brentyi

Sure thing. I'm not big into Windows either, but I try to support the platform since there are a lot of people who use it. Ideally, the issue would be fixed upstream. If we have to work around it in Tyro, then that would involve detecting the terminal type and telling Rich to use ASCII mode to render boxes.
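To illustrate the fallback idea in isolation, here is a standard-library sketch that picks ASCII box characters whenever the target encoding cannot represent the Unicode ones. It is not Rich's or Tyro's API; `UNICODE_BOX`, `ASCII_BOX`, `pick_box`, and `render_panel` are hypothetical names used only for this example.

```python
import sys
from typing import Dict, List, Optional

UNICODE_BOX = {"tl": "╭", "tr": "╮", "bl": "╰", "br": "╯", "h": "─", "v": "│"}
ASCII_BOX = {"tl": "+", "tr": "+", "bl": "+", "br": "+", "h": "-", "v": "|"}


def pick_box(encoding: Optional[str]) -> Dict[str, str]:
    """Return the Unicode box set only if `encoding` can represent it."""
    try:
        "".join(UNICODE_BOX.values()).encode(encoding or "ascii")
    except (UnicodeEncodeError, LookupError):
        # Unencodable characters or an unknown codec name: play it safe.
        return ASCII_BOX
    return UNICODE_BOX


def render_panel(title: str, lines: List[str], encoding: Optional[str]) -> str:
    """Draw a titled panel using whichever box set fits the encoding."""
    box = pick_box(encoding)
    # Inner width must fit the title decoration and the padded body lines.
    width = max([len(title) + 3, *(len(line) + 2 for line in lines)])
    top = (box["tl"] + box["h"] + " " + title + " "
           + box["h"] * (width - len(title) - 3) + box["tr"])
    body = [box["v"] + " " + line.ljust(width - 2) + " " + box["v"] for line in lines]
    bottom = box["bl"] + box["h"] * width + box["br"]
    return "\n".join([top, *body, bottom])


print(render_panel("options", ["-h, --help  show this help message and exit"],
                   getattr(sys.stdout, "encoding", None)))
```

Passing `"cp1252"` as the encoding yields a panel drawn with `+`, `-`, and `|`, which encodes without error.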

emcd avatar Feb 25 '25 03:02 emcd

terminal-detection.py

@brentyi : Claude made me a little terminal / charset encoding detection script (attached). When I run it under Git Bash on Windows, I see:

$ python terminal-detection.py
=== Terminal Detection Report ===

Platform: win32
OS: Windows 10

Environment Variables:
  TERM: xterm
  SHELL: C:\Program Files\Git\usr\bin\bash.exe
  MSYSTEM: MINGW64
  MSYS: <not set>
  MINTTY: <not set>
  SESSIONNAME: Console
  TERM_PROGRAM: mintty
  TERM_PROGRAM_VERSION: 3.6.3
  COLORTERM: <not set>
  SSH_TTY: <not set>

Encoding Information:
  Default encoding: utf-8
  Filesystem encoding: utf-8
  Locale encoding: cp1252
  Stdout encoding: cp1252
  Stderr encoding: cp1252

TTY Information:
  stdout.isatty(): False
  stderr.isatty(): False
  stdin.isatty(): False

Unicode Test:
Traceback (most recent call last):
  File "D:\src\terminal-detection.py", line 67, in detect_terminal_environment
    print(f"  Box characters: {test_chars}")
    ~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Programs\Python\Python313\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode characters in position 18-24: character maps to <undefined>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\src\terminal-detection.py", line 102, in <module>
    detect_terminal_environment()
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "D:\src\terminal-detection.py", line 70, in detect_terminal_environment
    print(f"  Unicode test: \u2717 Failed - {e}")
    ~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Programs\Python\Python313\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\u2717' in position 16: character maps to <undefined>

(The script isn't perfect since it reports Unicode errors using a character which causes Unicode errors. 🤦‍♂️ But, Claude still did good overall.)

What this shows is that it is very easy to detect whether Tyro will be able to draw boxes (for `--help`) with the Unicode box drawing characters. Simply look at `getattr(sys.stdout, 'encoding', 'unknown')` and check whether it starts with something other than `utf-`, such as `cp1252`. If that is `unknown`, then you can also look at `locale.getpreferredencoding()` and at whether the `TERM_PROGRAM` environment variable is `mintty` while `sys.platform` is `win32`.
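A rough sketch of that check, under some assumptions: `supports_unicode_boxes` is a hypothetical helper (not Tyro's actual fix), its parameters exist only to make it easy to try with fake inputs, and it matches on the prefix `utf` rather than `utf-` so that spellings like `utf8` are also accepted.

```python
import locale
import os
import sys


def supports_unicode_boxes(stream=None, environ=None, platform=None) -> bool:
    """Guess whether Unicode box drawing characters can be written to `stream`."""
    stream = sys.stdout if stream is None else stream
    environ = os.environ if environ is None else environ
    platform = sys.platform if platform is None else platform

    # First choice: the encoding the stream itself reports.
    encoding = getattr(stream, "encoding", None) or "unknown"
    if encoding != "unknown":
        return encoding.lower().startswith("utf")

    # Fallbacks when the stream does not report an encoding.
    if platform == "win32" and environ.get("TERM_PROGRAM") == "mintty":
        return False
    return locale.getpreferredencoding(False).lower().startswith("utf")


print(supports_unicode_boxes())
```

With the values from the report above (stdout encoding `cp1252`), the function returns `False`.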

emcd avatar Sep 22 '25 01:09 emcd

This is fixed in the pre-release version: pip install --pre tyro.

brentyi avatar Nov 07 '25 10:11 brentyi

Awesome; thank you! I left a comment in the PR about how you could extend the matrix strategy in the pytest workflow to include windows-latest runners. In my projects, I always test on ubuntu-latest, macos-latest, and windows-latest. Basically, you can add a second dimension to the matrix strategy, which enumerates these runners, and then set the runs-on field to the appropriate matrix dimension.

emcd avatar Nov 07 '25 15:11 emcd