Multiple emoji encoding issues on Windows
Using rich to print files that contain emoji on Windows runs into encoding issues.
Below are 2 cases I encountered.
Garbled text instead of emoji
When running rich broken-emoji.md (where broken-emoji.md is a text file containing nothing but the 😊 emoji) on Windows (in Windows Terminal), I get the following:
😊
If I run Get-Content broken-emoji.md or run rich inside WSL, I get the emoji printed as expected.
Rich fails to print entirely
When running rich cannot-print.md (where cannot-print.md contains only the 🤝 emoji) on Windows, I get:
unable to read .\cannot-print.md: 'charmap' codec can't decode byte 0x9d in position 3: character maps to <undefined>
Running it in WSL or using Get-Content cannot-print.md in the same terminal window gives me the emoji as expected.
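For reference, both symptoms can be reproduced in plain Python without rich. This is only a sketch under two assumptions that the report does not state outright: that the files are saved as UTF-8 without a BOM, and that Python falls back to the cp1252 code page when reading them (the 'charmap' wording in the error is consistent with that, but I have not confirmed which code page was active).

```python
# UTF-8 bytes as they would be stored on disk for each test file.
smiley = "\U0001F60A".encode("utf-8")     # 😊 -> f0 9f 98 8a
handshake = "\U0001F91D".encode("utf-8")  # 🤝 -> f0 9f a4 9d

# Case 1: every byte happens to be defined in cp1252 (assumed code page),
# so decoding succeeds but yields four unrelated characters.
garbled = smiley.decode("cp1252")

# Case 2: byte 0x9d (index 3) has no mapping in cp1252, so decoding fails.
error = None
try:
    handshake.decode("cp1252")
except UnicodeDecodeError as exc:
    error = exc

print(garbled.encode("unicode_escape").decode("ascii"))
print(error)
```

The only difference between the two cases is whether the last UTF-8 byte of the emoji happens to be defined in the legacy code page, which would explain why one file prints garbage and the other cannot be read at all.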
Expected Results
As this works in the same terminal both with PowerShell's Get-Content and when using WSL to run rich-cli, I'd expect it to work in Windows as well.
Environment
- OS: Windows 10 (build 19044.1889)
- Terminal: Windows Terminal (version 1.14.2281.0) running PowerShell
- Rich CLI: 1.8.0
- Python: 3.10.1
Update: This seems to be caused by the current codepage not being 65001 (UTF-8).
Setting $env:PYTHONUTF8=1 solves this.
I'm leaving this issue open for 2 reasons:
- Maybe there's a way around it, to make it work by default
- I assume more people encounter the same issue, and having it documented / printing a suggestion when the error occurs could be useful.
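One possible way to make it work by default would be to pass an explicit encoding when reading the file, instead of relying on the locale. The sketch below is hypothetical: read_text_utf8 is a name made up for illustration, and I have not checked how rich-cli actually reads its input.

```python
import tempfile
from pathlib import Path


def read_text_utf8(path):
    """Hypothetical helper: read a file as UTF-8 regardless of the code page."""
    # "utf-8-sig" decodes plain UTF-8 and also drops a leading BOM if present.
    return Path(path).read_text(encoding="utf-8-sig")


# Write the handshake emoji as raw UTF-8 bytes, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "cannot-print.md"
    sample.write_bytes("\U0001F91D".encode("utf-8"))
    text = read_text_utf8(sample)
```

The trade-off is that a file genuinely saved in a legacy code page would then fail to decode (or decode incorrectly), so a real fix might want a fallback or an option to override the encoding.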
Ran into this as well. The $env:PYTHONUTF8=1 trick worked for me, although it would be awesome if it could "just work" ™️. It might not be useful, but it looks like you can invoke Python with python -X utf8 ... and it does the same thing as the trick @tmr232 showed above, without modifying the environment.
Just verified now: setting the system locale to use UTF-8 fixes it as well (as expected), but I'd still prefer rich to "just work" when using Windows Terminal.
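To check which of these workarounds is actually in effect, a small probe like the following may help. It is a sketch, not part of rich: it starts a child interpreter once with -X utf8 and once with PYTHONUTF8=1, and reports sys.flags.utf8_mode together with the encoding that open() would use when none is given. The names PROBE and probe are made up for this example.

```python
import os
import subprocess
import sys

# Run in a child interpreter: report UTF-8 mode and the default text encoding.
PROBE = (
    "import sys, locale; "
    "print(sys.flags.utf8_mode, locale.getpreferredencoding(False))"
)


def probe(extra_args=(), extra_env=None):
    """Run PROBE in a fresh interpreter and return its stripped output."""
    env = dict(os.environ)
    env.pop("PYTHONUTF8", None)  # start from a known state
    env.update(extra_env or {})
    result = subprocess.run(
        [sys.executable, *extra_args, "-c", PROBE],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


via_flag = probe(extra_args=["-X", "utf8"])
via_env = probe(extra_env={"PYTHONUTF8": "1"})
baseline = probe()

print("-X utf8      :", via_flag)
print("PYTHONUTF8=1 :", via_env)
print("neither      :", baseline)
```

If the first two lines start with 1 and name UTF-8 while the baseline names something like cp1252, the machine is in the situation described in this issue.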