Help messages are not displayed correctly
Hello.
This is my first time using gettext.
I encountered what appears to be a bug in the Windows Terminal environment on Japanese Windows 11.
I'm currently using a version that doesn't exhibit the bug, so I'm not in a rush for a fix.
Execution environment:
- OS: Japanese Windows 11 Pro for Workstations (the native Japanese edition, not a multilingual version of Windows)
- The default encoding for Japanese Windows is Shift_JIS (code page 932).
- Version: 25H2 (build 26220.5770)
Execution terminal:
- Windows Terminal
  - PowerShell 7.5.2
  - cmd.exe, version 10.0.26220.5770
Command executed:
- msgfmt --help
- The same applies to other commands run with "--help".
Issue:
- The help message is garbled.
Versions that do not exhibit the issue:
- gettext 0.21 + iconv 1.16 (released on August 12, 2020)
Versions that exhibit garbled characters:
- gettext0.26-iconv1.17-static-64
- gettext0.25.1-iconv1.17-static-64
- gettext0.25-iconv1.17-static-64
- gettext0.23-iconv1.17-static-64
- gettext0.22.5a-iconv1.17-static-64
- Packages other than those listed above have not been tested.
Other things I've tried:
- In PowerShell, set $OutputEncoding to each encoding listed in *2 and display the help message.
- In cmd.exe, use the chcp command to set the code page to each encoding listed in *2 and display the help message.
- *2 Encodings:
  - Shift_JIS (932) - the default for Japanese Windows
  - UTF-8 (65001)
  - EUC-JP (51932)
  - Latin-1 (850)
  - ASCII (20127)
- Environment variables LANG=utf8.ja_JP, LC_ALL=utf8.ja_JP
- Note that the file output by "gettext --help > help.txt" displays correctly in Shift-JIS.
I'm not the right person to ask for this kind of technical support: I simply compile gettext for Windows following the authors' instructions.
I think @bhaible would be the right person, or you can file a bug at https://savannah.gnu.org/bugs/?group=gettext or write to the bug-gettext mailing list.
Let's continue to discuss it here, for simplicity.
To understand what's going on, please attach three things:
- The file help.txt produced by gettext --help > help.txt,
- A screenshot of running gettext --help in a cmd.exe window,
- A screenshot of running gettext --help in a PowerShell window.
Thanks for the reply bhaible. The requested results are as follows:
- The file generated by "gettext --help > help.txt": help.txt
- The above help.txt displays correctly if opened with the Shift_JIS encoding.
Screenshot opened in Visual Studio Code:
- A screenshot of running gettext --help in a cmd.exe window:
- A screenshot of running gettext --help in a PowerShell window:
Your cmd.exe screenshot looks like CP932 output, displayed in a console that assumes an 8-bit encoding consisting of ISO646-US (= US-ASCII) and Katakana.
Your PowerShell screenshot looks like CP932 output, displayed in a console that assumes the JISX0201-1976 encoding (consisting of ISO646-JP and Katakana).
These two encodings are not useful for Japanese, since they can display neither Hiragana nor Kanji characters.
Therefore this is what you need to change: the encoding used by cmd.exe and the encoding used by PowerShell. Set both to CP932, and you're done.
In other words, you need to set the code page to 932, not 201.
As I mentioned at the beginning, I am not using a multilingual version of Windows, but the native Japanese version of Windows 11 distributed for Japan. In this edition, the default encoding is Japanese (shift_jis) and the code page is 932, but I have not changed it at all.
- Continuing from the previous cmd.exe window, here's the result of checking the code page:
- Below is a screenshot of the code page check and "gettext --help" in a PowerShell window:
Below is a different approach. I have an older version in a different folder, and it displays Japanese correctly. The code page has not been changed at all during this time. In other words, even though the code page is the same, the older version displays correctly, while the latest version displays garbled text.
When I asked Google Gemini, it said, "Perhaps the libraries or something have been changed in the new version, causing it to fail to process Japanese."
I'm Japanese and have only checked the Japanese version, but China and Korea also use multi-byte encodings, so the same problem may be occurring in those countries. However, people in China and Korea take pride in being able to read and write English, so it's possible that they don't use a version of Windows in their native language. Japanese people don't have such preferences, so most Japanese people probably use the version of Windows in their native language.
If possible, I would like to be able to use gettext just by obtaining the binary code, without having to prepare the recompilation environment myself.
So for now I'm going to stick with an older version that works fine and hope someone will fix it eventually.
I admit that your last round of screenshots is puzzling: One program's output (CP932) is displayed correctly, another program's output (also CP932) is displayed as if it were JISX0201.
@mlocati: Can you please look for differences between gettext.exe in one package and gettext.exe in the other package? Possible differences I can think of:
- Imported libraries (dumpbin /imports or equivalent),
- The meta information in the Properties window of the Windows file manager.
I'd like to report on my current progress.
The source structure of each package (gettext-0.21 or gettext-0.22.5) is as follows:
| directory | contents |
|---|---|
| 1. src/gettext-runtime/po | gettext po/gmo resources |
| 2. src/gettext-runtime/gnu-lib | gettext libraries |
| 3. src/gettext-tools/runtime/po | po/gmo resources for other commands such as msgfmt |
| 4. src/gettext-tools/gnu-lib | libraries for other commands such as msgfmt |
The multibyte handling in the printf() functions called by commands such as gettext and msgfmt is implemented with separate code for gettext and for other commands such as msgfmt; i.e., gettext and the other commands are independent of each other.
Of the above, regarding the po files: while all other po files are written in UTF-8, only the Japanese resource file, ja.po, is written in EUC-JP. This may be the cause. However, when running "msgfmt --help", the display is correct in gettext-0.21-iconv-1.16 but garbled in gettext-0.22.5a-iconv-1.17, so at this point I can only say that these differences appear unrelated to the ja.po file. Still, the displayed content is probably the content of these ja.po files. It may be that the older package reads Japanese only in EUC-JP, or that this processing has been eliminated in the newer package.
That's all for now; I'll report back as I learn more.
Ah, I forgot to add some additional information.
gettext-0.21-iconv-1.16, which displays the help message correctly, does not include the gettext.exe binary. This may be because it was simply not included in the build.
On the other hand, the package where the help message is garbled (gettext-0.22.5a-iconv-1.17 and later) does include a gettext.exe binary image.
These two packages (gettext-0.21-iconv-1.16 and gettext-0.22.5a-iconv-1.17) do not contain source code, but rather configuration information and patches required for building.
gettext-0.21-iconv-1.16 included numerous patches and .vbs and .sh files. On the other hand, gettext-0.22.5a-iconv-1.17 and later do not include any patches, and instead include .ps1 (PowerShell) scripts. Although I haven't examined them in detail, there are significant differences in the build environment to begin with.
Are these .vbs, .sh, and .ps1 tools created by mlocati?
That's all for now; I'll report back as I learn more.
Thank you.
This "gettext-iconv-windows" repository doesn't contain the gettext source code: here we have "just" some scripts that I wrote to build gettext and iconv for Windows.
Those scripts are invoked by the build.yml GitHub Action - See here for the executions of that action.
So it's possible to build in the cloud on GitHub? That's amazing! I didn't know that.
It's been eight years since I retired from the IT industry. The world has changed so fundamentally without me noticing! It's hard for an old man like me to understand.
Thank you for your help. After some trial and error, this problem was solved.
I will explain what happened. First, navigate to the bin directory of the problematic package from pwsh.
cd $env:USERPROFILE\Downloads\gettext0.26-iconv1.17-static-64\bin
Next, run the following command:
.\gettext.exe --help | Out-File -FilePath help.txt
The output to help.txt is in utf-8.
In other words, we can see that gettext --help is displaying the help message in utf-8.
Therefore, if utf-8 could be displayed on Windows, there would be no problem, but the default system settings for Japanese Windows only allow the shift_jis and euc-jp code pages, and do not support utf-8. This means that the following command is not available by default:
chcp 65001
Therefore, in order to make utf-8 usable in the Windows shell (powershell or cmd.exe), we will use a Windows beta feature.
First, go to "Time and Language" > "Language and Region" > "Administrative Language Settings" in Windows Settings.
Next, click the "Change system locale" button on the "Administrative" tab.
"Japanese (Japan)" will be displayed under "Current system locale."
Under that item there is an option labeled "Beta: Use Unicode UTF-8 for worldwide language support"; check this option and restart the system.
With this setting, as long as UTF-8 is output to standard output, kanji will be displayed correctly without having to switch code pages with the chcp command.
Below is the result.
Thank you for all the advice above. Thank you everyone.
Thanks @maznobu for your investigations. It's good to hear that for you, turning on the "Beta" UTF-8 mode of your Windows installation is a workaround. But I want to understand the cause and find a fix.
- I reproduce the issue, simply by switching my Windows 10 installation to Japanese (and rebooting, of course). To understand the dialogs, I let translate.google.com translate screenshots for me.
I'm focusing on cmd.exe (since it's less complex than PowerShell).
With the gettext0.21-iconv1.16-static-64 binaries, the output of msgfmt --help looks like this (OK):
With the gettext0.26-iconv1.17-static-64 binaries, the output of msgfmt --help looks like this (BUG):
Note in particular the "write error" message in the last line. This comes from the program. Therefore it proves that the cause lies in the program, not in the console.
- I compared the dumpbin /imports output of the two programs:
Both use msvcrt.dll. This proves that Microsoft's ucrt is not the cause.
- I compared the meta information of the two msgfmt.exe programs, in Windows Explorer. Aside from a signature from "SignPath Foundation", I could not see a relevant difference.
- I asked ChatGPT: "A C program running on Windows (that uses printf) produces correct output when compiled with an older version of MSVCRT, but with a newer version of MSVCRT it somehow converts the output, with the effects that 1) the cmd.exe does not display it correctly, 2) the error indicator on the stdout stream gets set. What can be the cause?"
The answer sounds plausible, but — as usual when one asks ChatGPT a very special question — it's a hallucination: it doesn't hold up to factual verification.
- I compiled gettext-0.26/gettext-runtime in my usual Cygwin environment, once with mingw 5.0, once with MSVC 14, both with options --enable-relocatable --disable-shared. Then, in the cmd.exe console window, I set LANG=Japanese_Japan.932 and ran .\gettext.exe --help:
- With the mingw 5.0 binary, I see the BUG.
- With the MSVC binary, it's OK.
So, I'm now focusing on differences between the runtime libraries (mingw + msvcrt vs. UCRT). More to come...
when running "msgfmt --help", the display is correct in gettext-0.21-iconv-1.16, but garbled in gettext-0.22.5a-iconv-1.17.
I confirm:
- gettext-0.21-iconv-1.16 is OK,
- gettext-0.22.5a-iconv-1.17 (r1) and subsequent releases all show the BUG.
@mlocati : What were the differences regarding the use of mingw between these two releases that you built?
- version of mingw?
- use of __USE_MINGW_ANSI_STDIO?
I'm very sorry. I haven't done any builds and I don't have a build environment.
What I did was:
- Download the binary packages for each version provided by mlocati.
- Extract the files to the Downloads folder in my Windows personal profile.
- Launch PowerShell by selecting "Open Windows Terminal" from the Explorer context menu.
- From the launched PowerShell, launch the command interpreter with "cmd".
- Navigate to the extracted bin directory.
- Run "msgfmt --help".
- Repeat steps 3-6 for each version and check the results.
P.S. While searching for information on the UTF-8 extensions in Windows, I analyzed the source code and found that the printf() function called by gettext and msgfmt calls functions defined in gnulib within the same project. (Since gettext and msgfmt are independent, they each appear to have their own gnulib libraries.) Internally, through multiple call levels, these functions use the C runtime function _wsetlocale to obtain locale information. If locale information cannot be obtained, they read the LC_ALL or LANG environment variable and split it at the period into a locale name and a character set.
I suspect that somewhere, if the character set does not exist or the format returned by _wsetlocale is misinterpreted, the codeset is internally assumed to be "utf-8".
Is the relevant part somewhere around ctype_codeset()?
@maznobu No need to be sorry. I am very grateful that you reported this issue, since it likely has a large impact. No one expects you to investigate the issue; that is what I (as the GNU gettext maintainer) and @mlocati need to do.
No, I understand that very well.
The first answer was:
This means you need to set the code page to 932 instead of 201.
Despite initially explaining the situation in detail, this response made me feel slighted. So I decided to check the source code myself.
It's not the best analogy, but there are many cases, like Windows KB5063878, where many people are experiencing bugs but the manufacturer refuses to acknowledge them. In the case of KB5063878, the environment tested by the manufacturer is running the latest firmware, and the issue does not occur. However, the issues users face appear to occur on HDDs/SSDs that still have older, non-latest firmware written on them. It appears that manufacturers are not paying any attention to this issue.
So I'm truly grateful to everyone who has taken my concerns seriously in forums like this.
Returning to the topic at hand: once you've used the UTF-8 extensions option in Windows, it seems that turning the setting off doesn't completely revert to the original state.
Even with UTF-8 extensions turned off, the 65001 code page setting remains. Previously, using "chcp 65001" would result in an error, but it can now be switched to.
Currently, UTF-8 extensions are turned off, and the results are clearly different from when I started this report, as shown below.
- Execution of "msgfmt --help" after "chcp 932" in PowerShell: Japanese characters are displayed correctly, with no garbled characters. Previously, garbled characters would appear as in 3.
- Execution of "msgfmt --help" after "chcp 65001" in PowerShell: the character garbling is different from before.
- Execution of "msgfmt --help" after "chcp 932" in cmd.exe: the same garbled characters occur as before.
- Execution of "msgfmt --help" after "chcp 65001" in cmd.exe: the English help message is displayed.
That concludes my report on the current situation. Thank you for your continued support.
Thank you.
The English help message will be displayed.
The LANG (or LC_ALL) variable matters for whether the translation can be found. To be on the safe side:
- Use set LANG=Japanese_Japan.932 in environments with code page 932,
- Use set LANG=Japanese_Japan.65001 in environments with code page 65001.
@mlocati : What were the differences regarding the use of mingw between these two releases that you built?
The Windows binaries for gettext up to 0.21 were built using a Docker image with:
- ubuntu:18.04 base image
- the default build tools (g++, ...) provided by Ubuntu
- cross-compilation with mxe
- -D__USE_MINGW_ANSI_STDIO=0
(see the "setup" script and the "compile" script)
For the newer gettext versions I switched to the official cygwin approach, without specifying -D__USE_MINGW_ANSI_STDIO=0 because gettext should already do that (see commit 45500ab1765581d6a3b7d2e6a6c2595466de70af).
For the newer gettext versions I switched to the official cygwin approach, without specifying -D__USE_MINGW_ANSI_STDIO=0 because gettext should already do that (see commit 45500ab1765581d6a3b7d2e6a6c2595466de70af).
This commit disables the mingw stdio functions only in the three libraries. The rest of the binaries are built with __USE_MINGW_ANSI_STDIO being 1, due to gnulib/m4/stdio_h.m4.
Let me see whether this flag is relevant for the issue...
Here is a small reproducer, independent of Gnulib.
I constructed this program by starting with gettext-0.26/gettext-runtime/src/gettext.c,
replacing the gettext invocations with the string literals from the Japanese localization (in CP932 encoding),
reducing the use of Gnulib step by step,
and finally replacing two printf invocations with fputs invocations.
This program, when compiled with mingw 13 / msvcrt, and run in a Windows 10 system set to Japanese, in a cmd.exe window with chcp 932 and set LANG=Japanese_Japan.932, exhibits the following behaviour:
- When compiled with -D__USE_MINGW_ANSI_STDIO=0, the output is correct: it uses double-width characters consistently.
- When compiled with -D__USE_MINGW_ANSI_STDIO=1, the output of the fputs invocations (lines 1, 2, 6) is correct, whereas the output of the printf invocations is buggy (it looks as if interpreted in the JISX0201 encoding):
Here's an explanation of the bug:
Windows consoles come with two encodings: GetACP() and GetOEMCP(). For Japanese, both have the same value (932). However, for English, German, French Windows installations, GetACP() = 1252 and GetOEMCP() = 850. For many years, output of non-ASCII characters to consoles was a PITA: While a program had to produce output in the GetACP() encoding when writing to files, it had to produce output in the GetOEMCP() encoding when writing to a console. The majority of programs did not do this: they always produced output in the GetACP() encoding, and thus non-ASCII characters got garbled in consoles.
After many many years, Microsoft finally added a workaround in the C runtime library (msvcrt and ucrt). When a program writes a string to a console, the runtime library tests whether the output goes to a console, and if yes, it does a conversion from GetACP() encoding to GetOEMCP() encoding on the fly, in two steps: from GetACP() to UTF-16 via MultiByteToWideChar, then to GetOEMCP() via WideCharToMultiByte.
In the ucrt library, this conversion can be found in ucrt-10.0.22621.0/lowio/write.cpp. Look at the functions
_write_nolock
write_requires_double_translation_nolock
write_double_translated_ansi_nolock
In the msvcrt library, a similar conversion takes place. This library is closed-source, but I spotted similar calls to MultiByteToWideChar and WideCharToMultiByte while single-stepping in the debugger. The BUG is here, in the msvcrt library, when the encoding is a double-byte encoding and the program produces output one byte at a time.
Now, all reasonable implementations of fputs, fprintf, etc. pass the output to the lower-level layers via a reasonably small number of fwrite calls. Only the mingw *printf functions don't do this. For instance, the __mingw_vfprintf function invokes __mingw_pformat, and this function calls fputc once for each byte. Aside from being inefficient (of course; why does fwrite exist?!), it triggers the aforementioned bug in msvcrt. And this hasn't changed between mingw 5.0 (released in 2016) and mingw 13.0 (released in 2025).
The mingw *printf functions become active by defining __USE_MINGW_ANSI_STDIO to 1.
How do I arrive at this explanation?
Recent MSYS2 comes with a fully working gdb, that can display stack traces, and where step and stepi are working.
In such an MSYS2 environment, I built gettext-0.26/gettext-runtime with --enable-relocatable (so that the .mo files get found without filename hassles) and --disable-shared (to eliminate DLL hassles). I did so in three configurations:
A. mingw 13 / msvcrt
B. mingw 13 with __USE_MINGW_ANSI_STDIO=0 / msvcrt
C. mingw 13 / ucrt
The bug is visible in configuration A, and things are OK in configurations B and C.
In configuration A, I saw a call stack
main
usage
rpl_printf
rpl_vfprintf
__mingw_vfprintf
__mingw_pformat
__pformat_putc
fputc
putc
and, from there, the following functions get invoked:
msvcrt!_flsbuf
msvcrt!_isatty
msvcrt!_write
msvcrt!_setmode
KERNELBASE!GetConsoleMode
KERNELBASE!GetConsoleCP
KERNELBASE!GetConsoleScreenBufferInfoEx
WriteConsoleW
msvcrt!isleadbyte
msvcrt!_errno
msvcrt!__doserrno
strerror_s
In configuration B, I saw a call stack
main
usage
rpl_printf
rpl_vfprintf
vfprintf
and, from there, the following functions get invoked:
ungetwc
msvcrt!_isatty
msvcrt!_isleadbyte_l
strerror_s
msvcrt!_flsbuf
msvcrt!_write
msvcrt!_setmode
KERNELBASE!GetConsoleMode
KERNELBASE!GetConsoleCP
KERNELBASE!GetConsoleScreenBufferInfoEx
WriteConsoleW
msvcrt!mbtowc
msvcrt!_mbtowc_l
MultiByteToWideChar
KERNELBASE!GetCPHashNode
WideCharToMultiByte
WriteFile
In configuration C, I saw a call stack
main
usage
rpl_printf
rpl_vfprintf
__mingw_vfprintf
__mingw_pformat
__pformat_putc
fputc
ucrtbase!fputc
and, from there, the following functions get invoked:
ucrtbase!_get_wpgmptr
ucrtbase!_fputc_nolock
ucrtbase!_write
ucrtbase!_wfsopen
ucrtbase!_isatty
ucrtbase!___lc_locale_name_func
KERNELBASE!GetConsoleMode
KERNELBASE!GetConsoleOutputCP
It's the WriteConsoleW function, when invoked on individual bytes, that produces the effect of JISX0201 characters (configuration A). In configuration B, WriteConsoleW is used as well, but on strings composed of entire characters.
Wow, what an in-depth analysis, @bhaible!
PS: I tried to build gettext with -D__USE_MINGW_ANSI_STDIO=0, but tests are failing, exactly like 1 year ago.
Wow, what an in-depth analysis
Well, before doing this change in Gnulib, where it affects all programs built with Gnulib, I figured I better be really sure of what I'm saying. And it was confusing to see that the same __mingw_vfprintf function works perfectly fine with ucrt, but not with msvcrt.
I've added two commits to Gnulib: https://lists.gnu.org/archive/html/bug-gnulib/2025-09/msg00213.html https://lists.gnu.org/archive/html/bug-gnulib/2025-09/msg00214.html
and verified that with them, .\gettext --help in a Japanese Windows environment, in a cmd.exe with code page 932, and with set LANG=Japanese_Japan.932 shows correctly.
This is a smaller-impact fix.
Setting __USE_MINGW_ANSI_STDIO to 0 is also a fix, but it has a larger impact, as it affects also functions like sprintf and thus needs more workarounds in Gnulib's vasnprintf module.
Great!
I've added two commits to Gnulib: https://lists.gnu.org/archive/html/bug-gnulib/2025-09/msg00213.html https://lists.gnu.org/archive/html/bug-gnulib/2025-09/msg00214.html
Is the second link right? Or should it be https://lists.gnu.org/archive/html/bug-gnulib/2025-09/msg00217.html instead?
Is the second link right? Or should it be https://lists.gnu.org/archive/html/bug-gnulib/2025-09/msg00217.html instead?
I guessed the URL before the message went into the archive. Apparently I had no luck :)