legacy
Windows installation does not handle Unicode filenames
Version
110.99.4 (Latest)
Operating System
- [ ] Any
- [ ] Linux
- [ ] macOS
- [X] Windows
- [ ] Other Unix
OS Version
Windows 11 Pro
Processor
- [X] Any
- [ ] Arm (using Rosetta)
- [ ] PowerPC
- [ ] Sparc
- [ ] x86 (32-bit)
- [ ] x86-64 (64-bit)
- [ ] Other
System Component
Core system
Severity
Minor
Description
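On Windows, SML cannot open files whose names contain non-ASCII Unicode characters, and OS.FileSys.readDir reports those names with "?" in place of the non-ASCII characters: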
# PowerShell 6 or later (Windows PowerShell 5.1 does not support the `u{...} escape)
> "hello" | Out-File -LiteralPath "foo`u{d83d}`u{de4f}.txt";
> "hello" | Out-File -LiteralPath "bar`u{03BB}.txt";
(* SML *)
- val dir = OS.FileSys.openDir ".";
- OS.FileSys.readDir dir;
val it = SOME "foo??.txt" : string option
- OS.FileSys.readDir dir;
val it = SOME "bar?.txt" : string option
- TextIO.openIn ("foo" ^ UTF8.encode 0wx1F64F ^ ".txt");
uncaught exception Io [Io: openIn failed on "foo🙏.txt", Win32TextPrimIO.openRd: failed]
raised at: Basis/Implementation/IO/text-io-fn.sml:792.25-792.71
- TextIO.openIn ("foo" ^ UTF8.encode 0wxD83D ^ UTF8.encode 0wxDE4F ^ ".txt");
uncaught exception Io [Io: openIn failed on "foo🙏.txt", Win32TextPrimIO.openRd: failed]
raised at: Basis/Implementation/IO/text-io-fn.sml:792.25-792.71
- TextIO.openIn ("bar" ^ UTF8.encode 0wx03BB ^ ".txt");
uncaught exception Io [Io: openIn failed on "bar╬╗.txt", Win32TextPrimIO.openRd: failed]
raised at: Basis/Implementation/IO/text-io-fn.sml:792.25-792.71
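The garbled names above line up with the byte-level encodings of the two characters. The sketch below is in Python purely for illustration (it is not part of SML/NJ), and the helper names `utf16_units` and `is_valid_utf8` are made up for it. U+1F64F needs a UTF-16 surrogate pair (D83D DE4F), which is consistent with readDir reporting two "?" for the emoji and one for λ. The UTF-8 bytes of λ (CE BB) are exactly what code page 437 displays as "╬╗", which suggests the error message was rendered through the console's OEM code page. Encoding the two surrogate halves separately, as the second openIn attempt does, gives ED A0 BD ED B9 8F (assuming UTF8.encode applies the ordinary 3-byte pattern to surrogate values), which is not valid UTF-8, unlike the 4-byte F0 9F 99 8F.

```python
# Illustrative only: byte-level encodings of the characters in the transcript.
EMOJI = "\U0001F64F"  # U+1F64F (the emoji in foo*.txt)
LAMBDA = "\u03bb"     # U+03BB (the λ in bar*.txt)


def utf16_units(s: str) -> list[int]:
    """Return the UTF-16 code units of s (the form the wide "W" Win32 APIs take)."""
    data = s.encode("utf-16-le", "surrogatepass")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def is_valid_utf8(b: bytes) -> bool:
    """True if b decodes as strict UTF-8 (which rejects encoded surrogates)."""
    try:
        b.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# One code point, but two UTF-16 code units (a surrogate pair).
emoji_units = utf16_units(EMOJI)
# λ is one UTF-16 code unit but two UTF-8 bytes.
lambda_utf8 = LAMBDA.encode("utf-8")
# Those two bytes read back as code page 437 give the text in the error message.
lambda_as_cp437 = lambda_utf8.decode("cp437")
# Proper UTF-8 for the emoji versus encoding each surrogate half on its own.
emoji_utf8 = EMOJI.encode("utf-8")
halves_utf8 = "".join(chr(u) for u in emoji_units).encode("utf-8", "surrogatepass")

print([hex(u) for u in emoji_units])                # ['0xd83d', '0xde4f']
print(lambda_utf8.hex(" "), lambda_as_cp437)        # ce bb ╬╗
print(emoji_utf8.hex(" "), is_valid_utf8(emoji_utf8))    # f0 9f 99 8f True
print(halves_utf8.hex(" "), is_valid_utf8(halves_utf8))  # ed a0 bd ed b9 8f False
```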
Transcript
See above
Expected Behavior
OS.FileSys.readDir should return the actual name of each directory entry, not a name such as "foo??.txt" that does not correspond to any existing file.
TextIO.openIn should be able to open every file that exists on the system.
Steps to Reproduce
- Install on Windows using the .msi
- Run the PowerShell lines above, or create a file with a non-ASCII name any other way (another programming language, or copying and pasting a name in Explorer).
- From SML, try to access that file with any of the Basis file-system or I/O functions, e.g. OS.FileSys.readDir or TextIO.openIn.
Additional Information
I believe the issue is that the runtime's Win32 code is not compiled with the UNICODE macro defined. For example, OS.FileSys.readDir is implemented using FindFirstFile, which the Windows headers expand to either FindFirstFileA (ANSI code page) or FindFirstFileW (UTF-16) depending on whether UNICODE is defined.
The Microsoft documentation for the WIN32_FIND_DATA structure notes: "The minwinbase.h header defines WIN32_FIND_DATA as an alias which automatically selects the ANSI or Unicode version of this function based on the definition of the UNICODE preprocessor constant. Mixing usage of the encoding-neutral alias with code that [is] not encoding-neutral can lead to mismatches that result in compilation or runtime errors. For more information, see Conventions for Function Prototypes."
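To make the suspected mechanism concrete, here is a rough Python model of the two paths. It assumes the process's ANSI code page is 1252 (typical for English-language Windows, but not stated in this report) and treats every UTF-16 code unit with no mapping in that code page as "?"; real Windows conversions can also apply best-fit mappings, so this is an approximation rather than a reimplementation. `ansi_name`, `ansi_reading_of_utf8` and `wide_name` are hypothetical names. The ANSI path reproduces the "foo??.txt" and "bar?.txt" results from readDir. If, as an unconfirmed assumption, openIn hands the UTF-8 bytes of the SML string to an A function such as CreateFileA, code page 1252 reads CE BB as "Î»", so the lookup targets a file that does not exist. The UTF-16 buffer that FindFirstFileW or CreateFileW would receive round-trips exactly; on the C side, one common way to produce it from a UTF-8 string is MultiByteToWideChar with CP_UTF8.

```python
# Rough model of the ANSI ("A") vs wide ("W") Win32 paths.
# Assumption: the ANSI code page is 1252; best-fit mappings are ignored.
CODEPAGE = "cp1252"


def ansi_name(name: str, codepage: str = CODEPAGE) -> str:
    """Hypothetical: how an A function such as FindFirstFileA reports a name.

    Each UTF-16 code unit with no mapping in the code page becomes "?".
    """
    data = name.encode("utf-16-le", "surrogatepass")
    out = []
    for i in range(0, len(data), 2):
        unit = chr(int.from_bytes(data[i:i + 2], "little"))
        out.append(unit.encode(codepage, errors="replace").decode(codepage))
    return "".join(out)


def ansi_reading_of_utf8(name: str, codepage: str = CODEPAGE) -> str:
    """Hypothetical: the name an A function would look up if handed the
    UTF-8 bytes of `name` and interpreted them in the ANSI code page."""
    return name.encode("utf-8").decode(codepage, errors="replace")


def wide_name(name: str) -> bytes:
    """Hypothetical: the UTF-16LE buffer a W function would receive."""
    return name.encode("utf-16-le")


for original in ["foo\U0001F64F.txt", "bar\u03bb.txt"]:
    lossless = wide_name(original).decode("utf-16-le") == original
    print(ascii(original), "->", ansi_name(original), "| W path round-trips:", lossless)
print(ansi_reading_of_utf8("bar\u03bb.txt"))
```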
Email address
skyler DOT soss AT gmail.com