GHC-supplied hsc2hs executable fails if non-ISO/IEC 8859-1 (Latin-1) code points are in the path

Open mpilgrem opened this issue 1 year ago • 3 comments

This issue applies to the executable provided with, at least, GHC 9.4.8, 9.6.6, 9.8.4 and 9.10.1.

On Windows 11 in Windows Terminal, with Hebrew characters (a right-to-left script) in the path:

❯ D:\שזדס\sr-test\programs\x86_64-windows\ghc-9.8.4\bin\hsc2hs.exe --verbose --cc=D:\שזדס\sr-test\programs\x86_64-windows\ghc-9.8.4\mingw\bin\clang.exe --ld=D:\שזדס\sr-test\programs\x86_64-windows\ghc-9.8.4\mingw\bin\clang.exe -o Dummy.hs Dummy.hsc
Executing: (@./\hsc7C24.rsp) D:\\שזדס\\sr-test\\programs\\x86_64-windows\\ghc-9.8.4\\mingw\\bin\\clang.exe -c ./Dummy_hsc_make.c -o ./Dummy_hsc_make.o
hsc2hs-ghc-9.8.4.exe: fd:3: hGetContents: invalid argument (cannot decode byte sequence starting from 233)

Dummy_hsc_make.c is created and starts:

#include "D:\����\sr-test\programs\x86_64-windows\ghc-9.8.4\lib\template-hsc.h"

With Cyrillic characters (a left-to-right script) in the path:

❯ D:\Майк\sr-test\programs\x86_64-windows\ghc-9.8.4\bin\hsc2hs.exe --verbose --cc=D:\Майк\sr-test\programs\x86_64-windows\ghc-9.8.4\mingw\bin\clang.exe --ld=D:\Майк\sr-test\programs\x86_64-windows\ghc-9.8.4\mingw\bin\clang.exe -o Dummy.hs Dummy.hsc
Executing: (@./\hsc2E30.rsp) D:\\Майк\\sr-test\\programs\\x86_64-windows\\ghc-9.8.4\\mingw\\bin\\clang.exe -c ./Dummy_hsc_make.c -o ./Dummy_hsc_make.o
compiling ./Dummy_hsc_make.c failed (exit code 1)
rsp file was: "./\\hsc2E30.rsp"
command was: D:\\Майк\\sr-test\\programs\\x86_64-windows\\ghc-9.8.4\\mingw\\bin\\clang.exe -c ./Dummy_hsc_make.c -o ./Dummy_hsc_make.o
error: ./Dummy_hsc_make.c:1:10: fatal error: 'D:\09:\sr-test\programs\x86_64-windows\ghc-9.8.4\lib\template-hsc.h' file not found
#include "D:\<U+001C>09:\sr-test\programs\x86_64-windows\ghc-9.8.4\lib\template-hsc.h"
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.

Dummy.hsc is simply:

module Dummy where

dummy :: IO ()
dummy = pure ()

The expected behaviour is that hsc2hs handles all valid paths on platforms supported by GHC.

(The context is that a Stack user reported this as an issue:

  • https://github.com/commercialhaskell/stack/issues/6670 )

mpilgrem avatar Dec 13 '24 22:12 mpilgrem

I think the general issue is not Windows-specific. With Ubuntu 24.04.1 LTS (via WSL2) in Windows Terminal:

$ /home/mpilgrem/.stack/שזדס/programs/x86_64-linux/ghc-tinfo6-9.8.4/bin/hsc2hs --verbose --cc=/usr/bin/gcc --ld=/usr/bin/gcc -o Dummy.hs Dummy.hsc
Executing: (@./hsc2hscall56604-0.rsp) /usr/bin/gcc -c ./Dummy_hsc_make.c -o ./Dummy_hsc_make.o -I/home/mpilgrem/.stack/שזדס/programs/x86_64-linux/ghc-tinfo6-9.8.4/include/include/
hsc2hs-ghc-9.8.4: fd:6: hGetContents: invalid argument (cannot decode byte sequence starting from 233)

Dummy_hsc_make.c is created and starts:

#include "/home/mpilgrem/.stack/����/programs/x86_64-linux/ghc-tinfo6-9.8.4/lib/ghc-9.8.4/lib/template-hsc.h"

mpilgrem avatar Dec 14 '24 15:12 mpilgrem

I think part of the problem could be as simple as this: DirectCodegen.outputDirect uses Common.writeBinaryFile which, in turn, uses System.IO.withBinaryFile. A handle in binary mode uses the char8 encoding, and that effectively restricts the String being written to ASCII and the ISO/IEC 8859-1 (Latin-1) extension.
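To illustrate why that would produce the output seen above, here is a minimal sketch. It assumes the documented behaviour of GHC's char8 encoding (each code point is written as a single byte, the code point modulo 256). The name truncateChar8 is hypothetical and only models that behaviour; it is not code from hsc2hs.

```haskell
import Data.Bits ((.&.))
import Data.Char (chr, ord)

-- | Hypothetical model of writing a String through a char8
-- (binary-mode) handle: keep only the low 8 bits of each code point.
truncateChar8 :: String -> String
truncateChar8 = map (chr . (.&. 0xFF) . ord)

main :: IO ()
main = do
  -- Cyrillic path component from the Windows report.
  -- U+041C U+0430 U+0439 U+043A become 0x1C, '0', '9', ':'.
  print (map ord (truncateChar8 "Майк"))
  -- Hebrew path component from the report.
  -- The first character, U+05E9, becomes byte 233 (0xE9).
  print (map ord (truncateChar8 "שזדס"))
```

Under that assumption, the Cyrillic component becomes the control character U+001C followed by "09:", which matches the path in the clang error, and the Hebrew component starts with byte 233, which matches the hGetContents decoding error.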

mpilgrem avatar Dec 14 '24 17:12 mpilgrem

The original choice of char8 encoding seems to date from this commit:

  • https://github.com/haskell/hsc2hs/commit/a0baf89fb765518b9045c9b32f26f86028193879

which seems to cross-reference this GHC issue:

  • https://gitlab.haskell.org/ghc/ghc/-/issues/3837

mpilgrem avatar Dec 15 '24 15:12 mpilgrem