`<print>`: `std::print` may incorrectly display multibyte characters
Describe the bug
std::print incorrectly displays characters that take two or more bytes when they fall on the boundary of a 256-byte chunk. This happens only when stdout is a console and nonlocking formatting is used.
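To illustrate the failure mode described above, here is a minimal sketch of how cutting a UTF-8 string at a fixed byte count can split a multibyte character, and how backing up to a code point boundary avoids it. This is not the STL's actual implementation; `is_continuation`, `split_naive`, and `split_on_code_points` are hypothetical names made up for this example, and the 256-byte chunk size is taken from the report.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: true if `b` is a UTF-8 continuation byte (10xxxxxx).
constexpr bool is_continuation(unsigned char b) {
    return (b & 0xC0) == 0x80;
}

// Naive split: cuts every `chunk` bytes, possibly in the middle of a code point.
std::vector<std::string> split_naive(std::string_view s, std::size_t chunk) {
    assert(chunk > 0);
    std::vector<std::string> out;
    for (std::size_t i = 0; i < s.size(); i += chunk) {
        out.emplace_back(s.substr(i, chunk));
    }
    return out;
}

// Boundary-aware split: shortens a chunk so that the next chunk never
// starts with a continuation byte of a character begun in this one.
std::vector<std::string> split_on_code_points(std::string_view s, std::size_t chunk) {
    assert(chunk > 0);
    std::vector<std::string> out;
    std::size_t i = 0;
    while (i < s.size()) {
        std::size_t end = std::min(i + chunk, s.size());
        // Back up while the first byte after the cut is a continuation byte.
        while (end < s.size() && end > i &&
               is_continuation(static_cast<unsigned char>(s[end]))) {
            --end;
        }
        if (end == i) {
            // Malformed input (a run of continuation bytes longer than the
            // chunk): fall back to a plain cut so the loop still advances.
            end = std::min(i + chunk, s.size());
        }
        out.emplace_back(s.substr(i, end - i));
        i = end;
    }
    return out;
}
```

With the string from the test case (255 `a` bytes followed by the two bytes `0xC4 0x85` of U+0105) and a chunk size of 256, `split_naive` puts `0xC4` at the end of the first chunk and `0x85` alone in the second, while `split_on_code_points` keeps both bytes together in the second chunk. If each chunk is converted and written to the console separately, only the second split keeps the character intact.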
Command-line test case
#include <print>
#include <string>

int main() {
    std::string str(255, 'a'); // 255 single-byte characters
    str.append("ą"); // U+0105, two bytes in UTF-8, so it straddles byte 256
    std::println("{}", str);
}
PS E:\stl> .\out\x64\set_environment.ps1
PS E:\stl> cl /std:c++latest /utf-8 .\print-bug.cpp
Microsoft (R) C/C++ Optimizing Compiler Version 19.43.34618 for x64
Copyright (C) Microsoft Corporation. All rights reserved.
/std:c++latest is provided as a preview of language features from the latest C++
working draft, and we're eager to hear about bugs and suggestions for improvements.
However, note that these features are provided as-is without support, and subject
to changes or removal as the working draft evolves. See
https://go.microsoft.com/fwlink/?linkid=2045807 for details.
print-bug.cpp
Microsoft (R) Incremental Linker Version 14.43.34618.0
Copyright (C) Microsoft Corporation. All rights reserved.
/out:print-bug.exe
print-bug.obj
PS E:\stl> .\print-bug.exe
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa��
Expected output
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaą
STL version
This bug was introduced in #4821.
Been a while since I read the spec for this in detail for implementation, but at a glance is this not the expected outcome?
Whether print uses vprint_unicode or vprint_nonunicode depends on whether the ordinary literal encoding is UTF-8. For MSVC, that is determined by _MSVC_EXECUTION_CHARACTER_SET being set to 65001 (UTF-8), which in turn depends on whether you compile with /utf-8.
So if you're not compiling with a UTF-8 execution character set, we call vprint_nonunicode, which does not do any Unicode handling at all, as it effectively just does fwrite.
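As a rough sketch of the selection described above: the choice between the two paths can be made at compile time from the ordinary literal encoding. The code below does not use _MSVC_EXECUTION_CHARACTER_SET; it swaps in a portable probe that checks how a literal is encoded. `ordinary_literal_is_utf8` and `selected_print_path` are hypothetical names for illustration only, and the function just reports which path would be taken rather than printing anything.

```cpp
#include <string_view>

// Hypothetical probe: U+00B5 is encoded as the two bytes C2 B5 in UTF-8,
// so a UTF-8 ordinary literal encoding yields a 3-byte array (with the
// terminating null). Other encodings give different sizes or bytes.
constexpr bool ordinary_literal_is_utf8() {
    constexpr unsigned char probe[] = "\u00B5";
    return sizeof(probe) == 3 && probe[0] == 0xC2 && probe[1] == 0xB5;
}

// Reports which path a print-like function would take. The template
// parameter defaults to the probe but can be set explicitly to see
// both branches.
template <bool Utf8 = ordinary_literal_is_utf8()>
constexpr std::string_view selected_print_path() {
    if constexpr (Utf8) {
        return "vprint_unicode";
    } else {
        return "vprint_nonunicode";
    }
}
```

Under the assumptions above, calling `selected_print_path()` with no template argument should report `vprint_unicode` when the file is compiled with /utf-8, and `vprint_nonunicode` when the execution character set is a non-UTF-8 code page.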
When I try this locally with /utf-8 I get the expected outcome of it printing the ą, and without /utf-8 it prints the replacement characters.
#5894 gives a better explanation.