binaryen icon indicating copy to clipboard operation
binaryen copied to clipboard

Processing modules at paths that contain extended unicode characters on windows fails

Open brson opened this issue 3 years ago • 2 comments

As far as I can determine, binaryen always fails to open files that contain non-ascii (or perhaps non-latin-1 or some other windowsy encoding) in their path on windows. This is probably because there is no explicit "wide char" support in the code, which is how windows represents unicode, as utf-16.

I have tried to process files via unicode paths with a self-built wasm-opt.exe within the developer command prompt, inside cygwin's mintty, and programmatically (from rust bindings):

cygwin:

$ bin/wasm-opt.exe ../..//hello_world❤️.wasm -o ../..//hello_world_opt.wasm
Failed opening '../..//hello_world??.wasm'

command prompt:

c:\cygwin64\home\Brian\binaryen\build>bin\wasm-opt.exe ..\..\hello_world❤️.wasm -o ..\..\hello_world_opt.wasm
Failed opening '..\..\hello_world??.wasm'

Fixing this seems like it would require all paths to be hidden behind some platform-specific typedefs that resolve to WCHAR on windows, but I don't have much experience here.

cc https://github.com/brson/wasm-opt-rs/issues/40

brson avatar Aug 30 '22 21:08 brson

Looks like we could use std::filesystem::path since we're on C++17.

tlively avatar Aug 31 '22 14:08 tlively

filesystem::path looks like the right way to go and should just work with the standard file i/o methods.

From googling I don't see a standardized abstraction for getting wide char arguments from main. Seems like dealing with that and then sorting out which arguments are strings and which are paths could be messy.

brson avatar Sep 01 '22 16:09 brson

Is this something the maintainers would be open to accepting patches for? I could see it being fairly disruptive to the codebase. It might be done in two steps: adjusting the internals to be unicode-correct on windows, then adapting all the main functions.

brson avatar Oct 31 '22 19:10 brson

I think this would be worth accepting patches for. I can imagine the changes would be disruptive in the parts of the code base that deal with file paths, but I think those parts are fairly small overall, so it should be ok.

tlively avatar Oct 31 '22 22:10 tlively

I have made progress on this here: https://github.com/brson/binaryen/tree/unicode

The wasm-opt in that branch can input and output to unicode paths on windows. I still need to convert all the other bins, do some cleanup, write some tests. It'll be a while yet.

The changes necessary are not all that invasive. As structured now the bins have to do some unicode conversions for path types, though it might be possible to hide those conversions inside the command-line Options type.

brson avatar Jan 25 '23 22:01 brson

There are two PRs open related to this issue:

  • https://github.com/WebAssembly/binaryen/pull/5514 - this is my PR. It adopts filesystem::path, and changes the main function to reencode the main arguments on windows
  • https://github.com/WebAssembly/binaryen/pull/5627 - this one just does the incremental step of using filesystem::path in the I/O APIs

brson avatar Apr 06 '23 19:04 brson