Encoding-specific variants?
Using file-embed with non-ASCII files is rather fragile, as it depends on the build system's locale, and it doesn't work at all when building e.g. in a plain docker container (https://github.com/simonmichael/hledger/issues/420#issuecomment-406971517). In hledger, we've solved this by setting the handle encoding to UTF-8 prior to reading from it, which solved our problem.
I'm actually surprised that I haven't any issues here regarding this - maybe building in a pure Nix shell or a plain Docker container is not such a common scenario? Nevertheless, I think that extending the API with variants of the existing functions like embedFile' :: TextEncoding -> FilePath -> Q Exp and others would be a useful change not only for the hledger project.
What do you think?
embedFile is working with ByteStrings, and therefore shouldn't care about the character encoding. Do you have some sample reproducing case them demonstrates the problem? If so, we should just fix embedFile.
The hledger code is using embedStringFile which in turn uses Prelude.readFile (beware!). An alternative would be to try something like:
decodeUtf8 $(makeRelativeToProject "embeddedfiles/hledger.1" >>= embedFile)
To cut down on the duplication, hledger could defined it's own helper in a separate module:
module TH where
...
embedFileText :: FilePath -> Q Exp
embedFileText filePath =
[| decodeUtf8 $(makeRelativeToProject filePath >>= embedFile) |]
so you can use it like:
$(embedFileText "embeddedfiles/hledger.1")
I'd be in favor of deprecating embedStringFile and providing an embedFileUtf8 instead, which is a naming convention I've been moving towards with libraries like rio anyway. But yes, that approach should work.
Thanks for the response. Yes, I didn't think using embedFile would work better (I thought it'd do sth like BC.pack . readFile, don't know why), but it seems that would work just as well. Deprecating embedStringFile sounds sensible, to prevent people from shooting themselves in the foot.
I just ran into this issue. Perhaps it would be sufficient to just add a note to the embedStringFile docs saying it doesn't work with binary data?
It's not entirely obvious why embedStringFile can fail with binary data since, in the end, I can specify e.g. Data.ByteString.Lazy.ByteString as the output type of embedStringFile (which seems like it shouldn't care about character encoding). Except it does, because the IsString interface requires decoding from a String (which isn't obvious at all when glancing at embedStringFile).
Full error:
• Exception when trying to run compile-time code:
/Users/runesvendsen/code/crypto-depth-db/test/data/test.json.zip: hGetContents: invalid argument (invalid byte sequence)
Code: makeRelativeToProject "test/data/test.json.zip"
>>= embedStringFile
• In the untyped splice:
$(makeRelativeToProject "test/data/test.json.zip"
>>= embedStringFile)
|
49 | [ $(makeRelativeToProject "test/data/test.json.zip" >>= embedStringFile)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^