hakyll icon indicating copy to clipboard operation
hakyll copied to clipboard

Web.Pandoc: refactor reader selection

Open mmirate opened this issue 7 years ago • 3 comments

With this change, input file formats are no longer restricted to the variants of a union on Hakyll's side (Web.Pandoc.FileType) which must be updated whenever Pandoc adds a new input format (and requires tightening the lower-bound on the dependency version). Instead, the getReader function from Pandoc is now used, in conjunction with a file-extension-synonym mapping similar to the one used by Pandoc's command-line application.

(Of course, someone should probably raise an issue on Pandoc to have them factor-out their filename-to-reader/writer translation into a public API that Hakyll could use; that will make things even simpler on Hakyll's side.) ~~Update: Pandoc has indeed made just such a change, after this PR was first written~~ https://github.com/jgm/pandoc/blob/2.7.3/src/Text/Pandoc/App/FormatHeuristics.hs is close, but not publicly accessible.

With the recent addition of readers of non-textual formats (namely epub, docx and odt), there was a split among Pandoc's readers+writers, between String and Lazy ByteString input; pandocCompiler handles this under-the-hood, and for other usecases there is a new function readPandocLBSWith, whose input must be from Compiler.getResourceLBS instead of Compiler.getResourceBody.

Since the only lossless way to read binary data is to read it immediately to a ByteString, it would have been ideal for readPandoc and readPandocWith to have accepted Item ByteString from the start; changing this now would be even more breaking than removing FileType.

Yes, this change breaks anyone who depended on FileType.

mmirate avatar Jul 10 '17 23:07 mmirate

Is there any intention on this being merged? I can contribute if these changes are getting stale.

theNerd247 avatar Aug 19 '19 16:08 theNerd247

@theNerd247 Yes, I'm still interested in getting this merged. I'd like to keep Hakyll's FileType module there, even though it's no longer being used, with a deprecation warning on the module that it will be removed in a future release. I'm fine with killing the tests for it; but I would like to have at least one test for the ByteString-based pandoc reading.

jaspervdj avatar Aug 20 '19 09:08 jaspervdj

I'd like to keep Hakyll's FileType module there, even though it's no longer being used, with a deprecation warning on the module that it will be removed in a future release. I'm fine with killing the tests for it; but I would like to have at least one test for the ByteString-based pandoc reading.

This is big news to me. Thank you, regardless; better late than never.

Given that feedback, I am still willing to respond to it and to make this PR applicable to current versions of hakyll and pandoc (atm it is highly stale wrt both sides). Though, I won't have cycles available until the weekend.

mmirate avatar Aug 21 '19 13:08 mmirate