GHC-supplied hsc2hs executable fails if non-ISO/IEC 8859-1 (Latin-1) code points are in the path
This issue applies to the executable provided with, at least, GHC 9.4.8, 9.6.6, 9.8.4 and 9.10.1.
On Windows 11 in Windows Terminal, with Hebrew characters (a right-to-left script) in the path:
❯ D:\שזדס\sr-test\programs\x86_64-windows\ghc-9.8.4\bin\hsc2hs.exe --verbose --cc=D:\שזדס\sr-test\programs\x86_64-windows\ghc-9.8.4\mingw\bin\clang.exe --ld=D:\שזדס\sr-test\programs\x86_64-windows\ghc-9.8.4\mingw\bin\clang.exe -o Dummy.hs Dummy.hsc
Executing: (@./\hsc7C24.rsp) D:\\שזדס\\sr-test\\programs\\x86_64-windows\\ghc-9.8.4\\mingw\\bin\\clang.exe -c ./Dummy_hsc_make.c -o ./Dummy_hsc_make.o
hsc2hs-ghc-9.8.4.exe: fd:3: hGetContents: invalid argument (cannot decode byte sequence starting from 233)
Dummy_hsc_make.c is created and starts:
#include "D:\����\sr-test\programs\x86_64-windows\ghc-9.8.4\lib\template-hsc.h"
With Cyrillic characters (a left-to-right script) in the path:
❯ D:\Майк\sr-test\programs\x86_64-windows\ghc-9.8.4\bin\hsc2hs.exe --verbose --cc=D:\Майк\sr-test\programs\x86_64-windows\ghc-9.8.4\mingw\bin\clang.exe --ld=D:\Майк\sr-test\programs\x86_64-windows\ghc-9.8.4\mingw\bin\clang.exe -o Dummy.hs Dummy.hsc
Executing: (@./\hsc2E30.rsp) D:\\Майк\\sr-test\\programs\\x86_64-windows\\ghc-9.8.4\\mingw\\bin\\clang.exe -c ./Dummy_hsc_make.c -o ./Dummy_hsc_make.o
compiling ./Dummy_hsc_make.c failed (exit code 1)
rsp file was: "./\\hsc2E30.rsp"
command was: D:\\Майк\\sr-test\\programs\\x86_64-windows\\ghc-9.8.4\\mingw\\bin\\clang.exe -c ./Dummy_hsc_make.c -o ./Dummy_hsc_make.o
error: ./Dummy_hsc_make.c:1:10: fatal error: 'D:\09:\sr-test\programs\x86_64-windows\ghc-9.8.4\lib\template-hsc.h' file not found
#include "D:\<U+001C>09:\sr-test\programs\x86_64-windows\ghc-9.8.4\lib\template-hsc.h"
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
Dummy.hsc is simply:
module Dummy where
dummy :: IO ()
dummy = pure ()
The expected behaviour is that hsc2hs handles all valid paths on platforms supported by GHC.
(The context is that a Stack user reported this as an issue:
- https://github.com/commercialhaskell/stack/issues/6670 )
I think the general issue is not Windows-specific. With Ubuntu 24.04.1 LTS (via WSL2) in Windows Terminal:
$ /home/mpilgrem/.stack/שזדס/programs/x86_64-linux/ghc-tinfo6-9.8.4/bin/hsc2hs --verbose --cc=/usr/bin/gcc --ld=/usr/bin/gcc -o Dummy.hs Dummy.hsc
Executing: (@./hsc2hscall56604-0.rsp) /usr/bin/gcc -c ./Dummy_hsc_make.c -o ./Dummy_hsc_make.o -I/home/mpilgrem/.stack/שזדס/programs/x86_64-linux/ghc-tinfo6-9.8.4/include/include/
hsc2hs-ghc-9.8.4: fd:6: hGetContents: invalid argument (cannot decode byte sequence starting from 233)
Dummy_hsc_make.c is created and starts:
#include "/home/mpilgrem/.stack/����/programs/x86_64-linux/ghc-tinfo6-9.8.4/lib/ghc-9.8.4/lib/template-hsc.h"
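I think part of the problem could be as simple as this: DirectCodegen.outputDirect uses Common.writeBinaryFile which, in turn, uses System.IO.withBinaryFile. A binary-mode handle writes each code point modulo 256, so only ASCII and the ISO/IEC 8859-1 (Latin-1) extension survive. That is consistent with the output above: ש (U+05E9) is written as byte 0xE9 (233), and Майк (U+041C, U+0430, U+0439, U+043A) is written as byte 0x1C followed by the ASCII characters 09:.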
The original choice of char8 encoding seems to date from this commit:
- https://github.com/haskell/hsc2hs/commit/a0baf89fb765518b9045c9b32f26f86028193879
which seems to cross-reference this GHC issue:
- https://gitlab.haskell.org/ghc/ghc/-/issues/3837
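The suspected truncation can be reproduced outside hsc2hs with base alone. The following is a minimal sketch, not code from hsc2hs: the names expectedBytes and bytesWrittenInBinaryMode and the scratch file path are hypothetical, and it assumes base >= 4.15 (GHC 9.0 or later) for hGetContents'.

```haskell
import Data.Char (ord)
import System.IO
  ( IOMode (ReadMode, WriteMode)
  , hGetContents'
  , hPutStr
  , withBinaryFile
  )

-- | Hypothetical helper: the byte values expected if each code point is
-- written modulo 256, which is how the char8 encoding is documented to behave.
expectedBytes :: String -> [Int]
expectedBytes = map ((`mod` 256) . ord)

-- | Hypothetical helper: write a String through a binary-mode handle (as
-- System.IO.withBinaryFile provides) and read the raw bytes back.
-- The first argument is a scratch file that is overwritten.
bytesWrittenInBinaryMode :: FilePath -> String -> IO [Int]
bytesWrittenInBinaryMode tmp s = do
  withBinaryFile tmp WriteMode (\h -> hPutStr h s)
  -- Reading in binary mode maps each byte to the Char with that code point.
  -- The strict hGetContents' is used so the file is fully read before the
  -- handle is closed.
  map ord <$> withBinaryFile tmp ReadMode hGetContents'
```

Loaded in GHCi, bytesWrittenInBinaryMode "char8-demo.tmp" "\x041C\x0430\x0439\x043A" (that is, Майк) should agree with expectedBytes applied to the same string, giving 0x1C followed by the bytes for the ASCII characters 09:, as seen in the clang error above.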