C++ wide char output doesn't work

Open longnguyen2004 opened this issue 5 years ago • 9 comments

The following code works on MS STL but not libc++:

#include <iostream>
#include <io.h>
#include <fcntl.h>

int wmain(int argc, const wchar_t* argv[])
{
    _setmode(_fileno(stdout), _O_WTEXT);
    std::wcout << L"Thử nghiệm\n";
    return 0;
}

clang: compiled with -std=c++17 -municode
MSVC: compiled with /utf-8 /std:c++17

MS STL outputs the string correctly, while libc++ outputs nothing. I've tested the (hopefully) equivalent C code, and it works with both compilers:

#include <stdio.h>
#include <io.h>
#include <fcntl.h>

int wmain(int argc, const wchar_t* argv[])
{
    _setmode(_fileno(stdout), _O_WTEXT);
    _putws(L"Thử nghiệm\n");
    return 0;
}

longnguyen2004, Aug 07 '20 02:08

I can reproduce this. (Note to other readers and future self: the source files are saved as utf8, and cl.exe is called with the /utf-8 flag.) I'll see if I can find time to dig into the libc++ internals to figure out what's going on later...

mstorsjo, Aug 07 '20 10:08

FWIW, I did dig deeper into this, but it looks like this is fairly nontrivial to fix. The streams std::cout and std::wcout in libc++ don't seem to notice the change and reconfigure themselves as they should when the mode of stdout is changed by _setmode(_fileno(stdout), _O_WTEXT);.

The behaviour can be noticed in more detail with this testcase:

#include <cstdio>   // fflush, stdout
#include <iostream>
#include <io.h>
#include <fcntl.h>

int main(int argc, const char* argv[]) {
    std::cout << "cout, before\n";
    std::wcout << L"wcout, before\n";
    fflush(stdout);
    _setmode(_fileno(stdout), _O_WTEXT);
    std::cout << "cout, after\n";
    std::wcout << L"wcout, after\n";
    return 0;
}

When built with MSVC, this outputs the following:

cout, before
wcout, before
cout, after
wcout, after

With libc++, it only prints the first two lines.

I'd suggest filing this as a bug to upstream libc++, as I don't believe I'll have time to dig into it further at the moment.

EDIT: No, MSVC doesn't actually output that either; when built with MSVC, the line "cout, after" isn't printed correctly but ends up garbled. The output looks like this:

cout, before
wcout, before
潣瑵‬晡整ੲwcout, after
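
One plausible reading of the garbled line (a sketch, not a confirmed diagnosis of the MSVC runtime) is that the narrow bytes of "cout, after\n" are consumed two at a time as little-endian UTF-16 code units once stdout is in _O_WTEXT mode. The helper name bytes_as_utf16le below is hypothetical and exists only for this illustration:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of any library): pairs up the bytes of a
// narrow string as little-endian UTF-16 code units, the way a consumer
// expecting UTF-16LE would read them. A trailing odd byte is dropped.
std::u16string bytes_as_utf16le(const std::string& bytes) {
    std::u16string units;
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        const unsigned lo = static_cast<unsigned char>(bytes[i]);
        const unsigned hi = static_cast<unsigned char>(bytes[i + 1]);
        units.push_back(static_cast<char16_t>(lo | (hi << 8)));
    }
    return units;
}
```

Calling bytes_as_utf16le("cout, after\n") yields the six code units U+6F63, U+7475, U+202C, U+6661, U+6574 and U+0A72. Those render as 潣, 瑵, an invisible formatting character, 晡, 整 and ੲ, which matches the garbled text in front of "wcout, after" above.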

mstorsjo, Aug 20 '20 11:08

Thanks for your help, I'll report this upstream then

longnguyen2004, Aug 20 '20 11:08

To be fair, Windows 10 is getting better UTF-8 support now, and even the old ANSI functions support UTF-8 if 65001 is set as the code page, which would unbreak fstream and others. I expect the use of wide char functions to decrease further over time. But I'd still report this bug, since the UTF-8 code page isn't available on Windows 7, for example, and it's still not the default code page on Windows 10.
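
As a rough illustration of the UTF-8 route (one possible workaround under stated assumptions, not something proposed or tested in this thread), a program could convert its UTF-16 text to UTF-8 itself and write it through the narrow std::cout instead of std::wcout. The helper name utf16_to_utf8 is hypothetical; std::u16string stands in for the 16-bit wchar_t strings used on Windows so that the sketch stays portable:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: encodes a UTF-16 string as UTF-8.
// Unpaired surrogates are replaced with U+FFFD.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Combine a valid surrogate pair into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // lone surrogate
        }
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

On Windows, the result could then be written with std::cout after switching the console to code page 65001, for instance via SetConsoleOutputCP(CP_UTF8) from <windows.h>. That last step is an assumption about how one would wire this up and is subject to the OS-version limitations described above.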

longnguyen2004, Aug 25 '20 04:08

@longnguyen2004 You can make UTF-8 the default codepage in Windows 10, although you probably won't want to, since it'll break all non-Unicode apps, as you'd expect.

driver1998, Aug 28 '20 10:08

I do know about that option, but before Windows 10 there was no UTF-8 codepage. Also, argv is still not UTF-8 unless Microsoft decides to set UTF-8 as the default codepage.

longnguyen2004, Aug 28 '20 11:08

I've posted an initial patch that tries to fix this issue at https://reviews.llvm.org/D146398.

mstorsjo, Mar 19 '23 22:03

The fix for this landed in https://github.com/llvm/llvm-project/commit/fcbbd9649ac165aaf7fc7d60b8fef3b23755179a, and the latest nightly build at https://github.com/mstorsjo/llvm-mingw/releases/tag/nightly contains a version with this fix.

mstorsjo, Jun 04 '23 20:06

There's now a prerelease with LLVM 17.0.0 RC1, at https://github.com/mstorsjo/llvm-mingw/releases/tag/20230730, with this fix too.

mstorsjo, Jul 30 '23 20:07