
Unicode support on Windows?

Status: Open. windowsair opened this issue 2 years ago • 9 comments

Working with character encoding on Windows is really annoying.

Passing UTF-8 characters on the command line using CreateProcessA seems to be impossible. While local code pages seem to be able to handle special characters such as Chinese and Japanese, they don't seem to be able to do that for emoji.

In my case, I convert your commandLineCombined to UTF-16LE and then call CreateProcessW. My original input is UTF-8, and with this modification UTF-8 characters seem to be handled properly.
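To make the conversion step concrete, here is a minimal, portable sketch of what "UTF-8 to UTF-16" involves. It is only an illustration: `utf8_to_utf16` is a hypothetical helper, not part of subprocess.h, and on Windows one would normally call MultiByteToWideChar with CP_UTF8 rather than hand-roll a decoder. It shows why emoji are the awkward case: they lie outside the Basic Multilingual Plane and need a surrogate pair in UTF-16.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of subprocess.h).
   Converts a NUL-terminated UTF-8 string into UTF-16 code units.
   Returns the number of units written (excluding the terminator),
   or -1 on malformed input or if dst is too small.
   Simplification: overlong encodings are not rejected. */
static int utf8_to_utf16(const char *src, uint16_t *dst, size_t dst_cap) {
  const unsigned char *s = (const unsigned char *)src;
  size_t n = 0;
  assert(src != NULL && dst != NULL);
  while (*s) {
    uint32_t cp;
    int extra; /* number of continuation bytes expected */
    if (*s < 0x80) { cp = *s; extra = 0; }
    else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; }
    else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; }
    else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; }
    else return -1; /* stray continuation byte or invalid lead byte */
    s++;
    while (extra-- > 0) {
      /* Also catches a string that ends in the middle of a sequence. */
      if ((*s & 0xC0) != 0x80) return -1;
      cp = (cp << 6) | (uint32_t)(*s & 0x3F);
      s++;
    }
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return -1;
    if (cp >= 0x10000) {
      /* Outside the BMP (e.g. emoji): encode as a surrogate pair. */
      if (n + 2 >= dst_cap) return -1;
      cp -= 0x10000;
      dst[n++] = (uint16_t)(0xD800 | (cp >> 10));
      dst[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
    } else {
      if (n + 1 >= dst_cap) return -1;
      dst[n++] = (uint16_t)cp;
    }
  }
  if (n >= dst_cap) return -1; /* no room for the terminator */
  dst[n] = 0;
  return (int)n;
}
```

For example, the four UTF-8 bytes of 🍌 (U+1F34C) become the two UTF-16 units 0xD83C 0xDF4C, while a CJK character such as 中 (U+4E2D) fits in a single unit.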

What do you think of this? Thanks.

windowsair avatar Sep 22 '22 16:09 windowsair

CreateProcessA seems to internally convert the command line using the system's active (ANSI) code page, which is kind of broken for characters like emoji that the code page cannot represent.

windowsair avatar Sep 22 '22 16:09 windowsair

Which version of Windows were you running? I ask because I thought CreateProcessA supported UTF-8 on later Windows versions!

sheredom avatar Oct 14 '22 15:10 sheredom

Oh, and I don't seem to be receiving notifications from GitHub.

Windows 10 19044, I think that's new enough.

windowsair avatar Oct 24 '22 14:10 windowsair

Apparently you need to add a setting to the application manifest to enable that: https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page

Did you happen to try it? I'd be curious to know if it works.

jlaumon avatar Jan 03 '24 17:01 jlaumon

~~Ok, reading that page again I'm not really sure what that manifest is for. It seems to say the default code page is UTF8 without manifest for recent enough Windows (and that's the case for me, but of course I first tested with OutputDebugStringA which doesn't support UTF8).~~

The manifest is indeed needed, not sure how I managed to fumble my previous test. GetACP() returns CP_UTF8 when it works.
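For reference, the manifest fragment in question looks roughly like the following. This is reproduced from memory of the Microsoft page linked above, so treat it as a sketch and check it against that page; the assembly name is a placeholder, and the setting only takes effect on Windows 10 version 1903 or later.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<assembly manifestVersion="1.0" xmlns="urn:schemas-microsoft-com:asm.v1">
  <!-- Placeholder identity; use your application's own name and version. -->
  <assemblyIdentity type="win32" name="Example.App" version="1.0.0.0"/>
  <application>
    <windowsSettings>
      <!-- Makes the process's ANSI code page UTF-8, so GetACP() returns CP_UTF8. -->
      <activeCodePage xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">UTF-8</activeCodePage>
    </windowsSettings>
  </application>
</assembly>
```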

I'll give subprocess with UTF8 command lines a try in the following days and let you know how it went.

jlaumon avatar Jan 03 '24 18:01 jlaumon

Hi, @jlaumon

Thank you for following up on this. This issue has been open for a while now. I have since replaced CreateProcessA with CreateProcessW and it works fine. I always use CP_UTF8 for the parent process, but I'm not sure whether the child process inherits this.

windowsair avatar Jan 04 '24 01:01 windowsair

If someone wants to put together a PR that passes CI, I'd happily accept it.

sheredom avatar Jan 05 '24 14:01 sheredom

I just tried subprocess with xcopy.exe to copy files and directories whose names contain non-ASCII characters in UTF-8 and, as long as that magic manifest is there, it just works!

I am now the proud owner of 🍌.txt and 🍌_copy.txt.

jlaumon avatar Jan 05 '24 21:01 jlaumon

UTF-8 support on Windows is still in beta and does not work by default (at least in localized environments for some Asian languages). You need a manifest file to use CreateProcessA with UTF-8, as jlaumon said.

But for a cross-platform single-header library, I don't think it's desirable for behavior to change depending on external factors such as manifest files.

As far as I know, common cross-platform libraries either call the wide-character (W) versions of the APIs after converting with MultiByteToWideChar, as windowsair did, or accept UTF-16 strings directly.
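A minimal sketch of that wide-character approach might look like this. It is not the code from subprocess.h or from windowsair's change; `spawn_utf8` is a hypothetical helper that takes a UTF-8 command line, converts it with MultiByteToWideChar, and passes it to CreateProcessW. The non-Windows branch is only a stub so the snippet compiles everywhere.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
#include <stdlib.h>
#include <windows.h>

/* Hypothetical helper (not part of subprocess.h).
   Runs a UTF-8 command line via CreateProcessW and waits for it to exit.
   Returns 0 if the process was started, -1 otherwise. */
static int spawn_utf8(const char *cmdline_utf8) {
  STARTUPINFOW si;
  PROCESS_INFORMATION pi;
  wchar_t *wide;
  BOOL ok;
  int len;
  assert(cmdline_utf8 != NULL);

  /* First call: query the required size in wchar_t units.
     Passing -1 as the input length includes the NUL terminator. */
  len = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                            cmdline_utf8, -1, NULL, 0);
  if (len <= 0) return -1;

  wide = (wchar_t *)malloc((size_t)len * sizeof(wchar_t));
  if (wide == NULL) return -1;

  /* Second call: perform the actual conversion. */
  if (MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                          cmdline_utf8, -1, wide, len) != len) {
    free(wide);
    return -1;
  }

  ZeroMemory(&si, sizeof(si));
  si.cb = sizeof(si);
  ZeroMemory(&pi, sizeof(pi));

  /* CreateProcessW may modify the command-line buffer,
     so it must be writable memory, not a string literal. */
  ok = CreateProcessW(NULL, wide, NULL, NULL, FALSE, 0, NULL, NULL, &si, &pi);
  free(wide);
  if (!ok) return -1;

  WaitForSingleObject(pi.hProcess, INFINITE);
  CloseHandle(pi.hThread);
  CloseHandle(pi.hProcess);
  return 0;
}
#else
/* Stub for non-Windows platforms: nothing to convert, always fails. */
static int spawn_utf8(const char *cmdline_utf8) {
  assert(cmdline_utf8 != NULL);
  return -1;
}
#endif
```

Because the conversion names CP_UTF8 explicitly, the result does not depend on the process's active code page or on whether a manifest is present.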

matyalatte avatar Jun 22 '24 04:06 matyalatte