
Use ADT to represent input formats

Open tarleb opened this issue 5 years ago • 6 comments

A new module Text.Pandoc.Format is added and exposed to library users.

Formats supported as input and/or output can be described by values of the KnownFormat type. The submodule Text.Pandoc.Format.KnownFormat is hidden, but re-exported through its parent module.

Text.Pandoc.Extensions is made a submodule of Text.Pandoc.Format. The Extensions module is now hidden, but re-exported through Text.Pandoc.Format.
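
For illustration only, a minimal sketch of what an enumerable format type could look like. This is not the code from the PR: the constructor list is a small hypothetical subset, and allFormats, formatName, and formatFromName are names invented for this example.

```haskell
import Data.Char (toLower)
import Data.List (find)

-- | Hypothetical subset of formats; the type in the PR covers many more.
data KnownFormat
  = CommonMark
  | DocBook
  | HTML
  | LaTeX
  | Markdown
  deriving (Bounded, Enum, Eq, Ord, Show)

-- | Every format, derived from the Enum and Bounded instances.
allFormats :: [KnownFormat]
allFormats = [minBound .. maxBound]

-- | Textual name of a format (assumed here to be the lower-cased
-- constructor name).
formatName :: KnownFormat -> String
formatName = map toLower . show

-- | Look up a format by name, case-insensitively. Unknown names yield
-- Nothing, so the caller can report a specific error right away.
formatFromName :: String -> Maybe KnownFormat
formatFromName name =
  find ((== map toLower name) . formatName) allFormats
```

With a type like this, a typo in a format name is caught at the lookup, and listing the supported formats is just a matter of mapping formatName over allFormats.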

tarleb avatar Dec 02 '18 13:12 tarleb

Experimental approach to formalizing formats.

Pros:

  • stricter types and specific errors (e.g., for the -D command line option);
  • implicit knowledge made explicit;
  • formats are enumerable.

Cons:

  • more code and (explicit) complexity;
  • naming conflicts (e.g., constructors of EPUBVersion and HTMLSlidesVariant).

tarleb avatar Dec 02 '18 13:12 tarleb

Wow, this is a lot of code! I'm not sure yet what I think -- I haven't had time to look at it all.

Can you say more about what problem this solves, or what problem motivated it?

One small comment: since markdown_mmd etc. are implemented as sets of extensions on top of Markdown, it makes more sense to have Markdown be the format, and to implement the others as flavors, it seems to me. But maybe there's a reason for doing it this way?
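
A rough sketch of the "flavor" idea, under stated assumptions: the types, the function flavorFromName, and the extension lists below are all made up for illustration and do not reflect pandoc's actual defaults or API.

```haskell
-- | Illustrative extensions only; not pandoc's Extension type.
data Extension
  = Footnotes
  | PipeTables
  | Smart
  | MmdTitleBlock
  deriving (Eq, Show)

-- | The underlying format, independent of any flavor.
data BaseFormat = Markdown | HTML
  deriving (Eq, Show)

-- | A flavor is a base format plus the extensions enabled for it.
data Flavored = Flavored
  { flavorBase :: BaseFormat
  , flavorExtensions :: [Extension]
  }
  deriving (Eq, Show)

-- | Map a format name to a flavor. All Markdown variants share the
-- Markdown base and differ only in their (hypothetical) extension lists.
flavorFromName :: String -> Maybe Flavored
flavorFromName "markdown" = Just (Flavored Markdown [Footnotes, PipeTables, Smart])
flavorFromName "markdown_mmd" = Just (Flavored Markdown [Footnotes, PipeTables, MmdTitleBlock])
flavorFromName "markdown_strict" = Just (Flavored Markdown [])
flavorFromName "html" = Just (Flavored HTML [])
flavorFromName _ = Nothing
```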

Also: there's an issue somewhere for making Format an enumerated type (in RawBlock and RawInline) (#547). It would be good to think about that, too, in this context. We wouldn't want to end up with two distinct Format enumerations.

jgm avatar Dec 04 '18 17:12 jgm

Sorry for dumping such a large chunk of code. I wanted to do a small edit, but then kept making changes to ensure that the approach is sensible in a bigger context (my first two approaches turned out to be bad).

The main driver for this change is the idea to expose functions like read_file or write_file to Lua users. I'm approaching this by trying to define the respective PandocMonad Haskell functions in a way that would be pleasant and useful for consumers of the Haskell library. We could then define a PandocLua type as an instance of PandocMonad and just use these functions. This is also where the idea for a T.P.IO module came from.

So what I'm looking for is a function writeOutput :: (PandocMonad m, MonadIO m) => OutputOptions -> FilePath -> m (), where OutputOptions contains the target format, writerOptions, etc. The problem I'm facing is that I'd have to pass formats and extensions as a string so it can be passed to getWriter. But that also means that either the extensions encoded in the string, or those stored in WriterOptions, would be ignored. That seemed non-optimal.
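
To make the string problem concrete, here is a small sketch of how a spec such as markdown+smart-footnotes carries extension changes next to the format name. It is an illustration, not pandoc's parser: ExtChange, parseSpec, and applyChanges are hypothetical names, extensions are plain strings, and the code assumes format names never contain + or -.

```haskell
-- | An extension switched on or off in a format spec.
data ExtChange = Enable String | Disable String
  deriving (Eq, Show)

-- | Split a spec like "markdown+smart-footnotes" into the format name
-- and the list of extension changes, in the order they appear.
parseSpec :: String -> (String, [ExtChange])
parseSpec spec = (name, go rest)
  where
    isSign = (`elem` "+-")
    (name, rest) = break isSign spec
    go ('+' : xs) = let (ext, more) = break isSign xs in Enable ext : go more
    go ('-' : xs) = let (ext, more) = break isSign xs in Disable ext : go more
    go _ = []

-- | Apply the changes to a starting extension list. If the options record
-- holds its own extension list as well, one of the two sources has to win.
applyChanges :: [String] -> [ExtChange] -> [String]
applyChanges = foldl step
  where
    step exts (Enable e)
      | e `elem` exts = exts
      | otherwise = exts ++ [e]
    step exts (Disable e) = filter (/= e) exts
```

Once the spec is parsed up front into a typed format plus one extension set, there is a single source of truth instead of two competing ones.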

So this PR started as an edit to getWriter and getReader, but then got out of hand when I tried to get it "right".


I added separate constructors for all markdown types because the format is passed to filters; with a single constructor, filters would no longer be able to distinguish between markdown and markdown_mmd output. I feel that a single Markdown type would be the right thing to do, but still didn't want to break backwards compatibility.

tarleb avatar Dec 04 '18 21:12 tarleb

I forgot an important piece: I'd like OutputOptions to be easy to create as a Lua value, so it should not contain types like Writer m.

tarleb avatar Dec 04 '18 21:12 tarleb

I've rebased the PR, downsized its scope, and made it backwards compatible. It now just shows the general idea, but applies it only in selected parts of the code base.

My plan would be to iterate on this after (if) this PR gets merged:

  1. implement the same for output formats;
  2. use these formats where possible, including template selection (#8137);
  3. unify input/output formats -- possibly use a format algebra like that in jgm/pandoc-types#78.

tarleb avatar Jun 25 '22 07:06 tarleb

The downside of this approach is that steps 1 and 2 would lead to moderate code duplication. But that would then disappear again with step 3.

tarleb avatar Jun 25 '22 09:06 tarleb

This is outdated, and some useful parts of this approach have already made it into the code. Closing.

tarleb avatar Dec 16 '22 08:12 tarleb