Opt-in to native UTF-8 support for OS interaction on Windows
From discussion with @williamjcm and @sthalik on Gitter. To avoid UTF-16 conversion in every `Utility::Path` API that deals with the filesystem on Windows (and in other areas, like environment access), we could pass UTF-8 strings directly to the `*A` APIs. It's an opt-in feature, and there are three ways to achieve this:
- Changing the global code-page in Windows settings. Requires user interaction, so not a viable option.
- Linking to UCRT instead of MSVCRT and calling `setlocale(LC_ALL, ".utf-8")`. Available since Windows 10 SDK 1803.
- (Likely also linking to UCRT) and adding an entry to the app manifest. Requires Windows 10 SDK 1903+.
The second option could be done inside CorradeMain, the third variant documented alongside HiDPI support, for example. Then, all Path APIs would check the prerequisites (Windows version, UCRT vs MSVCRT, and if the codepage is set to UTF-8) and pick a more optimal path in that case.
TODOs left:
- [ ] Figure out how to robustly check that we can use UTF-8. Is UTF-8 codepage presence enough (checked with `setlocale(LC_ALL, nullptr)`)? Or do we also need to check for Windows version and/or UCRT presence?
  - Code snippet to use: https://gist.github.com/sthalik/17613fc4eefcae39d4e871d2f2fb7ecd (a runtime-linker variant is the first revision, if needed)
- [ ] Figure out how to check just once and store the result in some global variable instead of doing the check again in every `Path` API, without running into thread safety, thread-local variables, duplicated globals and other nasty issues in yet another place.
  - Though some rough 3rd party code could call `setlocale()` on its own and break it, so there's probably no way around checking every time :/
- [ ] The `*A` APIs still have the `MAX_PATH` limitation, and it's apparently impossible to work around that:

  > In the ANSI version of this function, the name is limited to `MAX_PATH` characters. To extend this limit to 32,767 wide characters, call the Unicode version of the function and prepend `\\?\` to the path. For more information, see Naming Files, Paths, and Namespaces.

  Which makes this whole effort rather useless. But maybe there are other ways to circumvent this?
  - Maybe it could fall back to the `*W` APIs if the input UTF-8 path is longer than `MAX_PATH`? That could make it work for 90% of use cases, OTOH it means we have to explicitly test each and every `Path` API to handle this well. Though since we have to have that fallback for when the locale changes again (as noted above) anyway, it shouldn't mean that much extra code.
- [ ] Setting the code page to UTF-8 may be considered "not nice" to 3rd party libraries that still rely on `*A` APIs. Consider if a compile-time opt-out for this feature is enough or if it should be opt-in (for example to be enabled by the users if they know it won't break 3rd party stuff).
  - Or, possibly, don't set anything but use UTF-8 if the codepage is discovered to be UTF-8? Seems like the least intrusive option, but still without falling back to UTF-16 conversion.