C#: Always decode `dotnet` output as UTF-8
Even tho this seems like a straight forward fix and testing shows that this appears correct, but console handling on Windows is complicated and as far as I can tell this shouldn't work as is, so I am not sure what I am missing here and if this can cause issues in some cases.
SPOILER: One quick run through the Windows console complications
- A console is supposed to represent a virtual VGA text mode display, most oddities of the system are because of this.
- Every console is associated with 1 or more processes and every process with 0 or 1 console, e.g. multiple process can share a console
- Every console has a input and an output encoding that can only be get and set by processes that use that console
- Sub processes use their parent console (there are ways to change that, but they are not used here)
- If the parent doesn't have a console, and the child uses the console subsystem, a new console is allocated.
- Now for the relevant .NET bits:
- Writing to stdout uses the console output encoding by default.
- Reading from the redirected stdout of a child process uses the console output encoding by default.
- Thus as long as the parent & child process use the same console / didn't change their encoding from the system default everything should be passed correctly
- The failure mode I would expect with this is that the localized string contains characters that cannot be encoded and thus get replaced with the fallback char (generally question marks)
- But in reality it looks like
dotnetalways writes UTF8 output, and thus Godot decodes things incorrectly (unless the system default encoding also is UTF8)
If anyone has an idea what causes the dotnet output to be UTF-8 encoded, that would be great to know, so that we can ensure that this doesn't cause any regressions anywhere.
The upstream change is not relevant for the issues, as it it only include in the .NET 8 previews.
If not, changing the windows default code page to utf-8 also fixes the issues, both without the fix and any potential regressions with this fix. (also it's probably a good things to do regardless)
- Supersedes: https://github.com/godotengine/godot/pull/73861 / https://github.com/godotengine/godot/pull/73865
- Related upstream change: https://github.com/dotnet/sdk/pull/29755
- Fixes #71326
- Workaround: https://superuser.com/a/1435645
Since the issues seem to be Windows only, I made the encoding changes only when running on windows.
I did a bunch more source diving and found that the output of --list-sdks is encoded vfwprintf as multibyte characters of the current locale, which means ???.
For the other commands the output is generated by MSBuild which is a massive system and it is not super easy to trace which parts are relevant, but most likely the output is generated by Console.WriteLine which should use the encoding as elaborated above, but the whole https://github.com/dotnet/dotnet repo also contains a whole bunch of places where the encoding is unconditionally set to UTF8 and it is not exactly clear if / when any of them are relevant.
After having spent way too much time on this and still not understanding the whole system:
- This appears to fix the issue
- Any possible regressions are very limited in scope, and not very important
- I'd say merge this fix
- IF we get any reports of regressions, either recommend to them to configure their windows to use utf8 globally or if there is a significant number of users that experience regressions, revert it and recommend to the users that had issues before to configure their windows to use utf8 globally
- ¯\_(ツ)_/¯
If anyone has an idea what causes the dotnet output to be UTF-8 encoded, that would be great to know.
It looks like you found the biggest source: https://github.com/dotnet/sdk/blob/af67718a9fce6ba01796ed06df44299fe17da67b/src/Cli/Microsoft.DotNet.Cli.Utils/UILanguageOverride.cs#L27 but yes, dotnet new and dotnet watch and others can also do the same and it is a fool's errand to try to tell from our codebases. Besides that, MSBuild will also have a similar change upcoming which makes it use utf8.
--list-sdks output comes from multiple repositories and executables on the system, one being the runtime host repo, the other the .NET SDK, which is why it was different.
Thanks!
Cherry-picked for 4.0.4.