
`<print>`: `std::print` may incorrectly display multibyte characters

Status: Open. JMazurkiewicz opened this issue 11 months ago (2 comments).

Describe the bug

std::print incorrectly displays characters that take two or more bytes when they straddle the boundary of a 256-byte chunk. This happens only when stdout is a console and nonlocking formatting is used.
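To see why the chunk boundary matters: ą (U+0105) is the two bytes C4 85 in UTF-8, so in the test case that follows, the lead byte C4 is the 256th byte of the string and its continuation byte 85 is the 257th. The sketch below is not the STL's code; `split_naive`, `split_utf8`, `safe_chunk_len`, and `is_continuation` are hypothetical helpers. It assumes each chunk is transcoded for the console independently. Under that assumption, a naive 256-byte split leaves a lone lead byte at the end of one chunk and a lone continuation byte at the start of the next, and each would decode to U+FFFD, which would be consistent with the two replacement characters in the observed output. Backing the split point up past continuation bytes keeps every code point inside a single chunk.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// True if c is a UTF-8 continuation byte (bit pattern 10xxxxxx).
constexpr bool is_continuation(unsigned char c) { return (c & 0xC0) == 0x80; }

// Hypothetical: cut s into fixed-size pieces, ignoring code point boundaries.
std::vector<std::string> split_naive(std::string_view s, std::size_t max_bytes) {
    assert(max_bytes > 0);
    std::vector<std::string> chunks;
    for (std::size_t i = 0; i < s.size(); i += max_bytes)
        chunks.emplace_back(s.substr(i, max_bytes));
    return chunks;
}

// Hypothetical: length of the longest prefix of s that is at most max_bytes
// long and does not end in the middle of a multibyte sequence.
std::size_t safe_chunk_len(std::string_view s, std::size_t max_bytes) {
    assert(max_bytes > 0);
    if (s.size() <= max_bytes) return s.size();
    std::size_t len = max_bytes;
    // s[len] would start the next chunk; back up while it is a continuation byte.
    while (len > 0 && is_continuation(static_cast<unsigned char>(s[len]))) --len;
    return len > 0 ? len : max_bytes;  // malformed input: fall back to a hard cut
}

// Hypothetical: split s into chunks of at most max_bytes, keeping code points whole.
std::vector<std::string> split_utf8(std::string_view s, std::size_t max_bytes) {
    std::vector<std::string> chunks;
    while (!s.empty()) {
        const std::size_t n = safe_chunk_len(s, max_bytes);
        chunks.emplace_back(s.substr(0, n));
        s.remove_prefix(n);
    }
    return chunks;
}
```

With `max_bytes = 256` on the 257-byte string from the test case, `split_naive` yields a 256-byte chunk ending in C4 followed by a chunk holding only 85, while `split_utf8` yields the 255 a's followed by C4 85.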

Command-line test case

#include <print>
#include <string>

int main() {
  std::string str(255, 'a');
  str.append("ą"); // U+0105, the two bytes C4 85 in UTF-8
  std::println("{}", str);
}
PS E:\stl> .\out\x64\set_environment.ps1
PS E:\stl> cl /std:c++latest /utf-8 .\print-bug.cpp
Microsoft (R) C/C++ Optimizing Compiler Version 19.43.34618 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

/std:c++latest is provided as a preview of language features from the latest C++
working draft, and we're eager to hear about bugs and suggestions for improvements.
However, note that these features are provided as-is without support, and subject
to changes or removal as the working draft evolves. See
https://go.microsoft.com/fwlink/?linkid=2045807 for details.

print-bug.cpp
Microsoft (R) Incremental Linker Version 14.43.34618.0
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:print-bug.exe
print-bug.obj
PS E:\stl> .\print-bug.exe
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa��

Expected output

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaą

STL version

This bug was introduced in #4821.

JMazurkiewicz, Jan 15 '25 22:01

It's been a while since I read the spec for this in detail for the implementation, but at a glance, isn't this the expected outcome?

Whether print uses vprint_unicode or vprint_nonunicode is decided by whether the ordinary literal encoding is UTF-8. For MSVC, that is determined by _MSVC_EXECUTION_CHARACTER_SET being set to 65001 (the UTF-8 code page), which is controlled by whether or not you compile with /utf-8.
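The selection rule described above can be sketched as a compile-time check. `ordinary_literal_encoding_is_utf8` and `print_uses_unicode_path` are hypothetical names, and this is an approximation of the rule, not the STL's own code. On MSVC it reads `_MSVC_EXECUTION_CHARACTER_SET`; elsewhere it assumes that inspecting how U+00B5 MICRO SIGN is encoded in an ordinary string literal is a good enough proxy for the ordinary literal encoding.

```cpp
// Hypothetical helper: is the ordinary literal encoding of this translation unit UTF-8?
constexpr bool ordinary_literal_encoding_is_utf8() {
#ifdef _MSVC_EXECUTION_CHARACTER_SET
    // MSVC reports the execution character set as a Windows code page; 65001 is UTF-8.
    return _MSVC_EXECUTION_CHARACTER_SET == 65001;
#else
    // Fallback: U+00B5 MICRO SIGN is encoded as the bytes C2 B5 in UTF-8.
    constexpr unsigned char micro[] = "\u00B5";
    return sizeof(micro) == 3 && micro[0] == 0xC2 && micro[1] == 0xB5;
#endif
}

// Per the rule above: true means std::print forwards to vprint_unicode,
// false means it forwards to vprint_nonunicode.
constexpr bool print_uses_unicode_path = ordinary_literal_encoding_is_utf8();
```

With GCC's or Clang's default settings this evaluates to true; on MSVC it is true only when the execution character set is UTF-8, for example when compiling with /utf-8.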

So if you're not compiling with a UTF-8 execution character set, we call vprint_nonunicode, which doesn't handle printing a Unicode string at all; it effectively just does an fwrite.

When I try this locally with /utf-8, I get the expected outcome (the ą is printed); without /utf-8, it prints the replacement characters.

blackninja9939, May 11 '25 00:05

#5894 gives a better explanation.

JMazurkiewicz, Nov 21 '25 20:11