[Bug] UnicodeDecodeError from v.pack or potentially other scripts that call read_command()
Describe the bug
I think we need a long-term solution, but for now, I'll try to explain my issues with Korean characters.
Not surprisingly, Microsoft chose to use its own proprietary charset "CP949" (a variant of EUC-KR) for Korean by default. OK, that's fine. GRASS's default charset for Korean is euc-kr (line 1415 in grass79.py). BUT SQLite only supports UTF-8, so I have to choose between translated Korean messages and correct output from v.db.select, etc. To get correct output, I switch the codepage to "CP65001" (UTF-8) and set OUTPUT_CHARSET=CP65001 for gettext encoding in etc/env.bat. In CP949, aligned printing by G_*aprintf() doesn't work well, but I can read translated messages.
But this is a small inconvenience compared to the output of v.db.select because v.db.select prints UTF-8 characters into an EUC-KR terminal.
Yes, the underlined characters are broken (UTF-8 characters treated as EUC-KR). Other than alignment and SQLite outputs, everything else seems fine including the v.pack issue that I explained below.
For the above reason, I chose to use CP65001 as my default charset for GRASS (giving up text file compatibility with Windows). Now, v.db.select works great and v.info output is very clean.

However, Python still uses CP949 as default (I believe?) and many GRASS Python scripts that invoke read_command() do not provide a means for passing encoding='utf-8' to this function. In other words, many scripts try to print EUC-KR characters into a UTF-8 console, causing my reported issue. An easy but annoying and short-sighted fix is to add encoding='utf-8' to every single read_command() call (e.g., v.pack in my screenshot below).
Anyway, my request is to add the ability to pass a desired encoding to any functions that use read_command() or other functions that may output translated messages.
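A minimal sketch of such a pass-through, assuming nothing about GRASS internals: `read_text()` is a hypothetical stand-in for read_command(), and the child process only emits the UTF-8 bytes of "한글" the way a module printing UTF-8 would. The point is that the caller, not the locale, decides how the bytes are decoded.

```python
import locale
import subprocess
import sys


def read_text(args, encoding=None, errors="strict"):
    """Run a command and return its stdout decoded as text.

    Hypothetical helper for illustration; not part of GRASS.
    With encoding=None the locale's preferred encoding is used
    (CP949 on a Korean Windows setup), which is where a
    UnicodeDecodeError can come from if the command emits UTF-8.
    """
    result = subprocess.run(args, stdout=subprocess.PIPE, check=True)
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    return result.stdout.decode(encoding, errors)


# Child process that writes the UTF-8 bytes of U+D55C U+AE00 to stdout.
# The characters are written as escapes so the command line stays ASCII.
CHILD = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write('\\ud55c\\uae00'.encode('utf-8'))",
]

# The caller states the encoding explicitly instead of relying on the locale.
text = read_text(CHILD, encoding="utf-8")
```

A script such as v.pack would then only need to forward an `encoding` argument to the function that reads the command output, rather than hard-coding one.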
To Reproduce
Steps to reproduce the behavior:
- Change the locale setting of MS Windows to Korean
- Start GRASS
- set OUTPUT_CHARSET=CP65001
- chcp 65001
- v.pack any_vector
Expected behavior
No errors.
Screenshots

I added encoding='utf-8' to line 199 in etc/python/grass/script/vector.py to fix this issue.
System description (please complete the following information):
- Operating System: Windows
- GRASS GIS version: master
- Codepage: CP65001
- Environment: set OUTPUT_CHARSET=CP65001
Additional context
For the record, I have tried set PYTHONIOENCODING=utf8 and/or set PYTHONLEGACYWINDOWSSTDIO=yes (https://docs.python.org/3/using/cmdline.html#envvar-PYTHONIOENCODING), to no avail.
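One possible explanation, stated as an assumption rather than a diagnosis: PYTHONIOENCODING only changes the encoding of Python's own stdin/stdout/stderr, while text read back from a child process is typically decoded with the locale's preferred encoding. The small report below (the function name is hypothetical) prints both values so they can be compared on the affected machine.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings Python uses for different purposes."""
    return {
        # Affected by PYTHONIOENCODING; may be None if stdout is redirected
        # to an object without an encoding.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Derived from the locale; commonly used to decode piped output.
        "preferred": locale.getpreferredencoding(False),
        # Used for file names, independent of the two above.
        "filesystem": sys.getfilesystemencoding(),
    }


for name, value in encoding_report().items():
    print(f"{name}: {value}")
```

If "stdout" reports utf-8 while "preferred" still reports cp949, the environment variables took effect but do not reach the code path that decodes command output.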
> GRASS's default charset for Korean is euc-kr (line 1415 in grass79.py).
Why is that? Why not UTF-8?
> An easy but annoying and short-sighted fix is to add encoding='utf-8' to every single read_command() call
If we are looking for a quick fix anyway, isn't putting encoding='utf-8' somewhere deep into read_command() implementation a better route?
> > GRASS's default charset for Korean is euc-kr (line 1415 in grass79.py).
> Why is that? Why not UTF-8?
For file contents compatibility with other programs.
> > An easy but annoying and short-sighted fix is to add encoding='utf-8' to every single read_command() call
> If we are looking for a quick fix anyway, isn't putting encoding='utf-8' somewhere deep into read_command() implementation a better route?
Yes, that would be easier if users of other locales are OK with UTF-8. Maybe I should forget about EUC-KR (CP949) and move to UTF-8 (CP65001) for Korean, if that's possible and supported by GRASS (e.g., read_command()).
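As a rough sketch of what a single, central default could look like: the helper below is hypothetical and not GRASS's implementation. It honors an explicit encoding when given, otherwise tries UTF-8 first and falls back to the locale encoding, so users of other locales are not forced onto UTF-8.

```python
import locale


def decode_output(data, encoding=None):
    """Decode command output, trying UTF-8 before the locale encoding.

    Hypothetical helper for illustration only.
    """
    if encoding is not None:
        # The caller knows best; decode strictly with the given encoding.
        return data.decode(encoding)
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: fall back to the locale's encoding and replace
        # any bytes that still cannot be decoded instead of raising.
        return data.decode(locale.getpreferredencoding(False), errors="replace")
```

The trade-off is that a fallback can silently produce wrong characters for legacy encodings other than the locale's, which is why the explicit `encoding` argument is kept.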
See also: https://trac.osgeo.org/grass/ticket/3220