pandoc
Use ADT to represent input formats
A new module Text.Pandoc.Format is added and exposed to library users.
Formats supported as input and/or output can be described via values of the KnownFormat type. The submodule `Text.Pandoc.Format.KnownFormat` is hidden, but re-exported through its parent module.
Text.Pandoc.Extensions is made a submodule of Text.Pandoc.Format. The Extensions module is now hidden, but re-exported through Text.Pandoc.Format.
Experimental approach to formalizing formats.
Pros:
- stricter types and specific errors (e.g., for the `-D` command line option);
- implicit knowledge made explicit;
- formats are enumerable.
Cons:
- more code and (explicit) complexity;
- naming conflicts (e.g., constructors of EPUBVersion and HTMLSlidesVariant).
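To make the trade-off above concrete, here is a minimal sketch of what an enumerable format type could look like. It is not the PR's code: the reduced constructor list and the helpers `allFormats`, `formatName`, and `parseFormat` are hypothetical names chosen for illustration.

```haskell
import Data.Char (toLower)

-- Hypothetical, heavily reduced format type; the constructor list is
-- illustrative and not the one defined in the PR.
data KnownFormat
  = CommonMark
  | Docx
  | HTML
  | LaTeX
  | Markdown
  | Org
  deriving (Show, Eq, Ord, Enum, Bounded)

-- | All formats, available for free because the type is Enum and Bounded.
allFormats :: [KnownFormat]
allFormats = [minBound .. maxBound]

-- | Lower-case name as it might appear on the command line (assumption).
formatName :: KnownFormat -> String
formatName = map toLower . show

-- | Parse a format name case-insensitively. On failure the error lists
-- every valid name, which is only possible because formats are enumerable.
parseFormat :: String -> Either String KnownFormat
parseFormat s =
  case lookup (map toLower s) [(formatName f, f) | f <- allFormats] of
    Just f  -> Right f
    Nothing -> Left ("Unknown format: " ++ s ++ "; expected one of: "
                     ++ unwords (map formatName allFormats))
```

Loaded into GHCi, `parseFormat "LaTeX"` evaluates to `Right LaTeX`, while an unknown name yields a `Left` whose message enumerates the known formats.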
Wow, this is a lot of code! I'm not sure yet what I think -- I haven't had time to look at it all.
Can you say more about what problem this solves, or what problem motivated it?
One small comment: since `markdown_mmd` etc. are implemented as sets of extensions on top of Markdown, it seems to me that it makes more sense to have `Markdown` be the format and to implement the others as flavors. But maybe there's a reason for doing it this way?
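A rough sketch of the "one format, several flavors" idea, assuming a flavor is just a base format paired with a set of extensions. The `Flavor` type and `flavorFromName` are hypothetical, and the extension sets below are illustrative, not pandoc's real defaults.

```haskell
import qualified Data.Set as Set
import Data.Set (Set)

-- Reduced extension type for illustration; pandoc's real Extension type
-- has many more constructors.
data Extension
  = Ext_footnotes
  | Ext_pipe_tables
  | Ext_mmd_title_block
  | Ext_mmd_header_identifiers
  deriving (Show, Eq, Ord, Enum, Bounded)

-- A single constructor covers every Markdown variant.
data Format = Markdown | HTML
  deriving (Show, Eq, Ord, Enum, Bounded)

-- | Hypothetical: a flavor is a base format plus the extensions enabled for it.
data Flavor = Flavor
  { flavorFormat     :: Format
  , flavorExtensions :: Set Extension
  } deriving (Show, Eq)

-- | Map a user-facing name to a flavor. The extension sets are made up
-- for this example and do not reflect pandoc's actual defaults.
flavorFromName :: String -> Maybe Flavor
flavorFromName "markdown" =
  Just (Flavor Markdown (Set.fromList [Ext_footnotes, Ext_pipe_tables]))
flavorFromName "markdown_mmd" =
  Just (Flavor Markdown (Set.fromList
    [Ext_pipe_tables, Ext_mmd_title_block, Ext_mmd_header_identifiers]))
flavorFromName "html" = Just (Flavor HTML Set.empty)
flavorFromName _      = Nothing
```

In this shape, `markdown` and `markdown_mmd` share the `Markdown` format and differ only in their extension sets, so code that dispatches on the format does not need a case per variant.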
Also: there's an issue somewhere for making `Format` an enumerated type (in RawBlock and RawInline) (#547). It would be good to think about that, too, in this context. We wouldn't want to end up with two distinct Format enumerations.
Sorry for dumping such a large chunk of code. I wanted to make a small edit, but then kept making changes to make sure the approach is sensible in a bigger context (my first two approaches turned out to be bad).
The main driver for this change is the idea of exposing functions like `read_file` or `write_file` to Lua users. I'm approaching this by trying to define the respective `PandocMonad` Haskell functions in a way that would be pleasant and useful for consumers of the Haskell library. We could then define a `PandocLua` type as an instance of `PandocMonad` and just use these functions. This is also where the idea for a T.P.IO module came from.
So what I'm looking for is a function `writeOutput :: (PandocMonad m, MonadIO m) => OutputOptions -> FilePath -> m ()`, where OutputOptions contains the target format, writerOptions, etc. The problem I'm facing is that I'd have to encode the format and extensions as a string so they can be passed to `getWriter`. But that also means that either the extensions encoded in the string, or those stored in WriterOptions, would be ignored. That seemed non-optimal.
So this PR started as an edit to `getWriter` and `getReader`, but then got out of hand when I tried to get it "right".
I added separate constructors for all markdown types because the format is passed to filters; with a single constructor, filters would no longer be able to distinguish between `markdown` and `markdown_mmd` output. I feel that a single Markdown type would be the right thing to do, but still didn't want to break backwards compatibility.
I forgot an important piece: I'd like `OutputOptions` to be easy to create as a Lua value, so it should not contain types like `Writer m`.
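As a sketch of the two constraints described so far (extensions stored in exactly one place, and only plain data in the record), an options type might look roughly like this. All field and helper names are hypothetical and none are taken from the PR.

```haskell
-- Hypothetical, reduced types for illustration only.
data OutputFormat = HTML | LaTeX | Markdown
  deriving (Show, Eq, Enum, Bounded)

data Extension = Ext_smart | Ext_footnotes
  deriving (Show, Eq, Enum, Bounded)

-- | Plain data only: no functions and no monadic values, so the record
-- could be built field by field from a simple table of values.
data OutputOptions = OutputOptions
  { outputFormat     :: OutputFormat
  , outputExtensions :: [Extension]  -- the single place extensions are stored
  , outputStandalone :: Bool
  } deriving (Show, Eq)

-- | Options for a format with no extensions enabled (assumed default).
defaultOutputOptions :: OutputFormat -> OutputOptions
defaultOutputOptions fmt = OutputOptions
  { outputFormat     = fmt
  , outputExtensions = []
  , outputStandalone = False
  }

-- | Enable an extension; enabling it twice leaves the options unchanged.
enableExtension :: Extension -> OutputOptions -> OutputOptions
enableExtension ext opts
  | ext `elem` outputExtensions opts = opts
  | otherwise = opts { outputExtensions = ext : outputExtensions opts }
```

Because the format is a value rather than a string such as one carrying `+ext` modifiers, there is no second copy of the extension list that could disagree with the one in the record.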
I've rebased the PR, downsized its scope, and made it backwards compatible. It now just shows the general idea, but applies it only in selected parts of the code base.
My plan would be to iterate on this after (if) this PR gets merged:
- implement the same for output formats;
- use these formats where possible, including template selection (#8137);
- unify input/output formats -- possibly use a format algebra like that in jgm/pandoc-types#78.
The downside of this approach is that steps 1 and 2 would lead to moderate code duplication. But that would then disappear again with step 3.
This is outdated, and some useful parts of this approach have already made it into the code. Closing.