
Windows installation does not handle Unicode filenames

Open Skyb0rg007 opened this issue 11 months ago • 3 comments

Version

110.99.4 (Latest)

Operating System

  • [ ] Any
  • [ ] Linux
  • [ ] macOS
  • [X] Windows
  • [ ] Other Unix

OS Version

Windows 11 Pro

Processor

  • [X] Any
  • [ ] Arm (using Rosetta)
  • [ ] PowerPC
  • [ ] Sparc
  • [ ] x86 (32-bit)
  • [ ] x86-64 (64-bit)
  • [ ] Other

System Component

Core system

Severity

Minor

Description

On Windows, SML/NJ cannot open files whose names contain non-ASCII (Unicode) characters, and OS.FileSys.readDir returns mangled names for them:

# PowerShell
> "hello" | Out-File -LiteralPath "foo`u{d83d}`u{de4f}.txt";
> "hello" | Out-File -LiteralPath "bar`u{03BB}.txt";

(* SML *)
- val dir = OS.FileSys.openDir ".";
- OS.FileSys.readDir dir;
val it = SOME "foo??.txt" : string option
- OS.FileSys.readDir dir;
val it = SOME "bar?.txt" : string option
- TextIO.openIn ("foo" ^ UTF8.encode 0wx1F64F ^ ".txt");
uncaught exception Io [Io: openIn failed on "foo🙏.txt", Win32TextPrimIO.openRd: failed]
raised at: Basis/Implementation/IO/text-io-fn.sml:792.25-792.71
- TextIO.openIn ("foo" ^ UTF8.encode 0wxD83D ^ UTF8.encode 0wxDE4F ^ ".txt");
uncaught exception Io [Io: openIn failed on "foo🙏.txt", Win32TextPrimIO.openRd: failed]
raised at: Basis/Implementation/IO/text-io-fn.sml:792.25-792.71
- TextIO.openIn ("bar" ^ UTF8.encode 0wx03BB ^ ".txt");
uncaught exception Io [Io: openIn failed on "bar╬╗.txt", Win32TextPrimIO.openRd: failed]
raised at: Basis/Implementation/IO/text-io-fn.sml:792.25-792.71

Transcript

See above

Expected Behavior

OS.FileSys.readDir should return the actual names of the files in the directory (e.g. UTF-8 encoded), not mangled names such as "foo??.txt" that do not refer to any existing file. TextIO.openIn should be able to open any file that exists on the system, whatever characters its name contains.

Steps to Reproduce

  1. Install SML/NJ on Windows using the .msi installer.
  2. Create a file with a non-ASCII filename, either by running the PowerShell lines above or by any other means (another programming language, copy and paste, etc.).
  3. Try to access that file through any of the Basis Library file APIs, such as OS.FileSys.readDir or TextIO.openIn.

Additional Information

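I believe the issue is that the runtime's Win32 calls are compiled without the UNICODE macro defined. OS.FileSys.readDir is implemented using FindFirstFile, which the Windows headers expand to either FindFirstFileA (ANSI code page) or FindFirstFileW (UTF-16) depending on whether UNICODE is defined. The ANSI variant cannot represent characters outside the current code page and substitutes "?" for them, which is consistent with the "foo??.txt" and "bar?.txt" names above (the emoji occupies two UTF-16 code units, hence two "?").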

The Microsoft documentation for WIN32_FIND_DATA states: "The minwinbase.h header defines WIN32_FIND_DATA as an alias which automatically selects the ANSI or Unicode version of this function based on the definition of the UNICODE preprocessor constant. Mixing usage of the encoding-neutral alias with code that is not encoding-neutral can lead to mismatches that result in compilation or runtime errors. For more information, see Conventions for Function Prototypes."

Email address

skyler DOT soss AT gmail.com

Edit: Added an example that does not use UTF-16 surrogates, since this is not a UTF-16-specific issue.

Skyb0rg007, Mar 14 '24 19:03