msys2-runtime

Multi-byte characters not rendered correctly when output from a C or C++ program

tfpf opened this issue.

If a C or C++ program writes multi-byte characters to the console, they are not rendered correctly. The following shell script demonstrates the problem.

#! /usr/bin/env sh

pacman -S --needed mingw-w64-ucrt-x86_64-gcc mingw-w64-ucrt-x86_64-python

printf '#include <stdio.h>\nint main(void) { puts("∈√≈≡⊥"); }\n' >msys2.c
gcc msys2.c
./a
echo $(./a)

printf '#include <cstdio>\nint main(void) { std::puts("∈√≈≡⊥"); }\n' >msys2.cc
g++ msys2.cc
./a
echo $(./a)

printf 'import sys\n\nprint("∈√≈≡⊥")' >msys2.py
python msys2.py

echo "∈√≈≡⊥"

Here's the output.

warning: mingw-w64-ucrt-x86_64-gcc-14.1.0-3 is up to date -- skipping
warning: mingw-w64-ucrt-x86_64-python-3.11.9-1 is up to date -- skipping
 there is nothing to do
∈√≈≡⊥
∈√≈≡⊥
∈√≈≡⊥
∈√≈≡⊥
∈√≈≡⊥
∈√≈≡⊥

Key Observations

  • When multi-byte characters are written by a C or C++ program, the characters that appear on screen don't seem to be related to the ones written, and can themselves be multi-byte.
  • If the output is saved to a variable and then echoed (see echo $(./a) above), the characters are displayed correctly.
  • When multi-byte characters are written from Python or sh, they are displayed correctly.

I am using 64-bit MSYS2 20230526. I didn't try the latest version, because even after searching for a while I found no existing bug report for this issue.

tfpf, Jul 23 '24 16:07