hakyll
Web.Pandoc: refactor reader selection
With this change, input file formats are no longer restricted to the variants of a union type on Hakyll's side (`Web.Pandoc.FileType`), which must be updated whenever Pandoc adds a new input format (and which requires tightening the lower bound on the dependency version). Instead, the `getReader` function from Pandoc is now used, in conjunction with a file-extension-synonym mapping similar to the one used by Pandoc's command-line application.
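To illustrate the idea, here is a minimal sketch of an extension-synonym lookup, not the code from this PR. The names `extensionSynonyms` and `readerNameForPath` and the table entries are hypothetical. The resulting string is the kind of format name one would hand to Pandoc's `getReader`, whose exact signature differs between Pandoc versions and is therefore not called here.

```haskell
import Data.Char (toLower)
import Data.Maybe (fromMaybe)
import System.FilePath (takeExtension)

-- | Hypothetical synonym table: lowercased file extension (without the dot)
-- mapped to a Pandoc reader name. Illustrative entries only; this is not the
-- table used by the PR or by Pandoc's command-line application.
extensionSynonyms :: [(String, String)]
extensionSynonyms =
  [ ("md",   "markdown")
  , ("htm",  "html")
  , ("tex",  "latex")
  , ("text", "markdown")
  ]

-- | Guess a reader name from a file path. Extensions missing from the table
-- are passed through unchanged, so a format whose extension equals its
-- reader name needs no entry here.
readerNameForPath :: FilePath -> Maybe String
readerNameForPath path =
  case map toLower (drop 1 (takeExtension path)) of
    ""  -> Nothing  -- no extension, nothing to guess from
    ext -> Just (fromMaybe ext (lookup ext extensionSynonyms))
```

With this table, `readerNameForPath "posts/intro.md"` gives `Just "markdown"`, while an unlisted extension such as `.org` falls through as `Just "org"`. Because unknown extensions pass through as names, the decision about whether a format is supported is left to Pandoc rather than to a fixed type on Hakyll's side.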
(Of course, someone should probably raise an issue on Pandoc to have them factor out their filename-to-reader/writer translation into a public API that Hakyll could use; that would make things even simpler on Hakyll's side.) ~~Update: Pandoc has indeed made just such a change, after this PR was first written.~~ https://github.com/jgm/pandoc/blob/2.7.3/src/Text/Pandoc/App/FormatHeuristics.hs is close, but not publicly accessible.
With the recent addition of readers for non-textual formats (namely epub, docx and odt), Pandoc's readers and writers are now split between String and lazy ByteString input. `pandocCompiler` handles this under the hood; for other use cases there is a new function, `readPandocLBSWith`, whose input must come from `Compiler.getResourceLBS` instead of `Compiler.getResourceBody`.
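The split between text and byte input can be modelled with a small sum type. The sketch below is self-contained and uses only hypothetical names (`Doc`, `SketchReader`, `runSketchReader`); it is not Pandoc's or Hakyll's actual API, just an illustration of why starting from a lazy ByteString serves both kinds of reader.

```haskell
import qualified Data.ByteString.Lazy.Char8 as BL

-- | Stand-in for a parsed document; a real compiler would build a Pandoc value.
newtype Doc = Doc String
  deriving (Eq, Show)

-- | Hypothetical model of a reader that consumes either text or raw bytes.
data SketchReader
  = FromText  (String -> Doc)
  | FromBytes (BL.ByteString -> Doc)

-- | Run a reader on raw file contents. The bytes are read first and decoded
-- only when a text reader asks for it, so binary input is never mangled.
-- Char8 unpacking is a placeholder; real code would decode UTF-8.
runSketchReader :: SketchReader -> BL.ByteString -> Doc
runSketchReader (FromText f)  bytes = f (BL.unpack bytes)
runSketchReader (FromBytes f) bytes = f bytes
```

Going from bytes to text is always possible, while the reverse is not guaranteed to be lossless, which is why the byte-oriented entry point is the more general one.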
Since the only lossless way to read binary data is to read it directly into a ByteString, it would have been ideal for `readPandoc` and `readPandocWith` to have accepted `Item ByteString` from the start; changing this now would be even more breaking than removing `FileType`.
Yes, this change breaks anyone who depended on FileType.
Is there any intention of merging this? I can contribute if these changes are getting stale.
@theNerd247 Yes, I'm still interested in getting this merged. I'd like to keep Hakyll's `FileType` module there, even though it's no longer being used, with a deprecation warning on the module that it will be removed in a future release. I'm fine with killing the tests for it; but I would like to have at least one test for the `ByteString`-based pandoc reading.
> I'd like to keep Hakyll's `FileType` module there, even though it's no longer being used, with a deprecation warning on the module that it will be removed in a future release. I'm fine with killing the tests for it; but I would like to have at least one test for the `ByteString`-based pandoc reading.
This is big news to me. Thank you, regardless; better late than never.
Given that feedback, I am still willing to respond to it and to make this PR applicable to current versions of hakyll and pandoc (atm it is highly stale wrt both sides). Though, I won't have cycles available until the weekend.