haddock icon indicating copy to clipboard operation
haddock copied to clipboard

Support Markdown syntax via `commonmark-hs`

Open harpocrates opened this issue 7 years ago • 26 comments

@jgm @alexbiehl @gbaz This is following up the discussion in https://github.com/haskell/haddock/pull/729.

I'm opening a new issue despite https://github.com/haskell/haddock/issues/244 and https://github.com/haskell/haddock/issues/570 because I'd like to focus on the particular issues around supporting Markdown syntax via commonmark-hs.

As per https://github.com/haskell/haddock/pull/729#issuecomment-377353414, I've gone ahead and

  • implemented instances of IsBlock and IsInline for Haddock's Doc (see https://github.com/harpocrates/haddock/commit/b12bef2016fb8ef54051c7bc473867db3b64ebb8)
  • implemented a custom inline parser for identifier linking (see https://github.com/harpocrates/haddock/commit/4d2566c3d1cca37a248e4b761529c5190a9bdf48).

All of this is very quick and dirty (and in a separate branch) - I'm just trying to get a feel for the library. Still it is nice to see end to end results:

module Mark where

-- | An emphasized _foo_ and a linked 'foo'.
-- Some math I like: $\int_{\partial \Omega} \omega = \int_\Omega \partial\omega$
--
-- A list of things:
--
--   1. First item
--
--      ```
--      some code
--        block
--      ```
--
--   2. Some sub-items
--
--       * sub-item1
--       * __sub-item2__
--
foo :: Int -> Int
foo = undefined

Output:

screen shot 2018-04-04 at 2 49 33 pm

@jgm My first impression of the API is very positive: I very much like the idea of slicing the problem along the dimensions of SyntaxSpec. I definitely think this is going to be extensible enough for Haddock. Some more basic questions:

  1. What is Rangeable for? Is this something Haddock might want (or is range _ x = x enough)
  2. What is the motivation for distinguishing lineBreak and paragraph? When are both used?
  3. What is referenceLinkDefinition?

@gbaz @alexbiehl How can Haddock start using commonmark-hs? I am completely unfamiliar with what needs to happen from a vendoring/dependency perspective. More generally, can I start making some PR's to prepare for eventual integration (i.e. small changes to Documentation.Haddock.Types)?

harpocrates avatar Apr 04 '18 22:04 harpocrates

How can Haddock start using commonmark-hs?

Given commonmark-hs's dependency footprint (which unfortunately includes some packages which wouldn't meet the expected quality bar) I'm rather sceptical (it'd be too much to vendor; and we also need to take into account GHC's build-system and the associated maintenance cost there). We could provide this as an experimental feature enabled by a manual cabal flag. This would also help alleviate the problem of committing to what's essentially an extension of Haddock's specification (which we'd have to support "for eternity" on Hackage, which in future will require the ability to rebuild any docs), as we'd declare this an experimental extension.

hvr avatar Apr 05 '18 06:04 hvr

what's the planned syntax for the module level flag btw?

As for deps, I'm looking now and I see the non-already-ghc-bootlibs as: "case-insensitive, lucid, tagsoup, network-uri". I haven't followed the transitive chain. I wonder how many we could strip-out/vendor if we decide to just vendor the whole parser.

gbaz avatar Apr 05 '18 06:04 gbaz

lucid in particular brings in a bunch of other deps, so seems to be the sticky wicket. i imagine it could be factored out once we past prototype and replaced with something less elegant but still sufficient that's in the bootlibs, like xhtml.

gbaz avatar Apr 05 '18 06:04 gbaz

I went through the dependencies and I think this is workable if the commonmark package could be split into commonmark-core and commonmark (with the former supplying only the bare bones and having minimal dependencies and the latter providing all of the Html and RangedHtml instances). Then, we could depend only on commonmark-core

  • network-uri is needed only for the RangedHtml/Html instances (so would not be in commonmark-core)
  • tagsoup is needed for entity parsing - which I think we don't want (could this be a typeclass in commonmark-core with only the implementation in commonmark relying on `tagsoup)
  • lucid is needed for producing HTML (so would not be in commonmark-core)
  • case-insensitive can have functionality inlined: it just converts to lowercase.
  • uses of mtl can trivially be absorbed into transformers (literally by changing imports)

If we apply these suggestions, would it then be ok to vendor/depend on commonmark-core (which would be left depending only on things already in GHC's dependencies)?

harpocrates avatar Apr 05 '18 14:04 harpocrates

In such a case it seems to me that vendoring would certainly be ok. I'd want to check with the Powers That Be on adding it to ghc's bootlibs, but I lean towards there being a case for it (and it doesn't seem to me a big deal either way in terms of vendor vs. bootlib, tbh).

gbaz avatar Apr 05 '18 15:04 gbaz

In such a case it seems to me that vendoring would certainly be ok.

Maybe so; but my request to keep this an experimental opt-in feature (gated via a cabal flag) until we're ready to commit to the new grammar (and have also had some time to design a haddock feature version gating/advertising protocol) still stands. So IMO it's too early to have the vendoring discussion just yet (and there's upcoming structural changes due to which I'd really like to avoid having to complicate by having to deal w/ an additional lib-vendoring change on top).

hvr avatar Apr 05 '18 16:04 hvr

what's the planned syntax for the module level flag btw?

I can add a module-level flag similar to the module-level hide option. As a first step, you could have {-# OPTIONS_HADDOCK markdown #-} at the top of your file.

Maybe so; but my request to keep this an experimental opt-in feature (gated via a cabal flag) until we're ready to commit to the new grammar (and have also had some time to design a haddock feature version gating/advertising protocol) still stands.

Completely agree! I expect there will be lots of community discussion around the syntactic extensions we'll have to make to markdown to support all of Haddock's features - I'd prefer to be able to change those up frequently before stabilizing.

So IMO it's too early to have the vendoring discussion just yet (and there's upcoming structural changes due to which I'd really like to avoid having to complicate by having to deal w/ an additional lib-vendoring change on top).

Shouldn't we at least make sure there is a viable path forward here? I'd prefer not to pour effort into getting something that works only to find out that the whole initiative needs to be abandoned...


I'll start laying the groundwork for eventually adding this functionality into Haddock.

harpocrates avatar Apr 05 '18 16:04 harpocrates

I'm sorry, I missed this discussion before. To answer your questions @harpocrates:

  • Rangeable is for keeping track of source position information. You probably don't need it and can use a trivial instance.
  • lineBreak is for a line break within a paragraph (<br> in HTML). It is an inline-level element. Paragraph is a block-level concept (<p>..</p> in HTML).
  • referenceLinkDefinition is for a Markdown reference link definition, e.g.
    [google]:  http://google.com
    
    This can render as an empty string for most purposes. (It's mainly in there for people who want to use the library for syntax highlighting, or for rendering markdown.)

As for the dependency footprint, I'm certainly open to reducing that. It would make sense to split the current commonmark package into commonmark-core and commonmark-lucid. (EDIT: The only hitch is that in order to run the commonmark tests, we need to generate HTML. If we moved the lucid stuff to a separate package, we'd have to duplicate it in commonmark-core for the test suite, or move most of the testing to commonmark-lucid.)

Eliminating the tagsoup dependency is a bit harder, since currently entity resolution is part of the commonmark spec, but that might change. In the mean time, we could just include a module with an entity lookup table, instead of depending on tagsoup.

jgm avatar Apr 24 '18 05:04 jgm

@jgm Thanks, I think everything makes sense now.

As for the dependency footprint, I'm certainly open to reducing that. It would make sense to split the current commonmark package into commonmark-core and commonmark-lucid. (EDIT: The only hitch is that in order to run the commonmark tests, we need to generate HTML. If we moved the lucid stuff to a separate package, we'd have to duplicate it in commonmark-core for the test suite, or move most of the testing to commonmark-lucid.)

Why not just lift the test-suite to the top of the project and have it depend on commonmark-lucid? Haddock itself has several of these sorts of top-level test suites (haddock-test, hoogle-test, html-test, hypsrc-test, latex-test).

If I have some time later I might try this out myself.

Eliminating the tagsoup dependency is a bit harder, since currently entity resolution is part of the commonmark spec, but that might change. In the mean time, we could just include a module with an entity lookup table, instead of depending on tagsoup.

That sounds reasonable.

harpocrates avatar Apr 25 '18 17:04 harpocrates

Why not just lift the test-suite to the top of the project and have it depend on commonmark-lucid? Haddock itself has several of these sorts of top-level test suites (haddock-test, hoogle-test, html-test, hypsrc-test, latex-test).

This would be a reasonable approach, I guess. Is the thought, then, that there'd be a commonmark package that exports modules from commonmark-core and commonmark-lucid and also includes the tests?

jgm avatar Apr 25 '18 20:04 jgm

Yep, that is what I had in mind.

harpocrates avatar Apr 26 '18 00:04 harpocrates

After a bit of tinkering I've separated out commonmark-core which now just has

  build-depends:
      base >=4.7 && <5
    , text
    , containers
    , transformers
    , parsec

jgm avatar Apr 28 '18 06:04 jgm

@jgm I really appreciate all the work you've done to make sure Haddock can depend on this markdown parser - I think that dependency list is as good as it can get! I've already started making the corresponding Haddock changes. The path forward I intend to follow is:

  • [x] Move Haddock off of vendored attoparsec and onto text and parsec (to keep parser library dependencies minimal)
  • [ ] Update the DocH type to be expressive enough for the stuff Markdown supports (for instance, hyperlinks should support arbitrary inline markup). This means updating all of the backends too.
  • [ ] Add markdown support to Haddock. commonmark-core can be a conditional dependency of haddock-library, off by default. That means we'll be punting on the decision about whether to vendor commonmark-core, how to feature-gate markdown support, etc.

Does this sound like a measured approach to everyone? Am I likely to interfere with any other plans?

harpocrates avatar Apr 28 '18 06:04 harpocrates

Alec Theriault [email protected] writes:

I will probably be making more changes to the commonmark library in the near future (and of course it's not yet published on Hackage). In particular, I'm considering adding a small HTML backend via text (not using lucid) to the core code, so that the test suite can be included there. If I do that, I'd probably rename the core package back to commonmark and remove the existing umbrella package. This is just to say that you shouldn't make assumptions yet about API stability.

jgm avatar Apr 28 '18 16:04 jgm

Note: I've published the commonmark package on hackage. It has the following dependencies: base (>=4.7 && <5), bytestring, containers, parsec, text, transformers. (And semigroups for old ghc verisons.)

jgm avatar Jul 18 '20 18:07 jgm

Would anyone be interested in advising on how to push this over the line? I resurrected the old branch by @harpocrates here: ulysses4ever/haddock@md. It is rebased on ghc-9.2, builds and covers the example given in the OP of this thread.

ulysses4ever avatar Nov 21 '21 22:11 ulysses4ever

@ulysses4ever I bless your efforts, you can open the PR and I'll be one of the people to give you a review. :)

Kleidukos avatar Nov 21 '21 23:11 Kleidukos

@ulysses4ever you'll be thanked for years to come for doing this ;)

throwaway1037 avatar Jan 20 '22 22:01 throwaway1037

Yeah, this has been on my back burner for a while. Hope to get to it this month.

ulysses4ever avatar Jan 21 '22 02:01 ulysses4ever

Yes please!

One of the issues to resolve would be what to do with the commonmark dependency. All of Haddock's other dependencies get are libs that ship with GHC, so would commonmark get added to those? It is worth discussing with GHC devs directly on whether that's acceptable and what the process looks like. If that's not possible, commonmark dependency could be behind a flag (but that significantly hampers adoption - most people get Haddock indirectly from their GHC installation, they don't install it directly).

harpocrates avatar Jan 21 '22 04:01 harpocrates

Thanks, Alec, for chiming in (I hoped you would). Indeed, I read (and re-read) the long discussions about dependency footprint here and on related issues/PRs. I have a very good idea of how hesitant GHC devs are about adding any new dependency, so, I'm pretty sure that a flag is the only viable option at the moment. I think whoever really wants to buy in MD for their package, won't be discouraged by installing haddock via cabal.

ulysses4ever avatar Jan 21 '22 20:01 ulysses4ever

@ulysses4ever https://hackage.haskell.org/package/commonmark-0.2.1.1/dependencies does not look too bad. The transitive closure adds only three boot packages: unicode-data, unicode-transforms and commonmark itself. I can vouch for responsible maintenance of the first two, and I believe no one can blame @jgm for being unresponsive. These all are plain native Haskell packages, no cbits or system dependencies or TemplateHaskell.

I think Markdown support in Haddock is a must. @Kleidukos we can really use your opinion here :)

Bodigrim avatar Jan 21 '22 21:01 Bodigrim

@ulysses4ever please let me know if there is anything I can do to help. I still feel guilt about never seeing this through...

I think getting markdown support behind a flag would be a good way to get started. I'm optimistic that this flag could eventually be toggled on for haddock-ghc, especially if the feature proves popular. In the meantime, a user trying to build docs for a project using markdown markup would ideally get an error that would point them to how to manually install Haddock w/ markdown enabled (and how to get cabal/stack to pick up the right Haddock binary).

harpocrates avatar Jan 21 '22 21:01 harpocrates

Hi, apologies for the long silence.

I have a very good idea of how hesitant GHC devs about adding any new dependency

@ulysses4ever Yup', that's a no-go zone for us.

I'm optimistic that this flag could eventually be toggled on for haddock-ghc

@harpocrates We can even one day have it enabled by default and disable it on GHC's side, once the feature is out of experimental.

I think Markdown support in Haddock is a must. @Kleidukos we can really use your opinion here :)

@bodigrim This has my full support. I'm not sure I can contribute much on the code but I can definitely give a hand to deal with the logistics & relations with GHC.

Kleidukos avatar Jan 21 '22 21:01 Kleidukos

In the meantime, a user trying to build docs for a project using markdown markup would ideally get an error that would point them to how to manually install Haddock w/ markdown enabled (and how to get cabal/stack to pick up the right Haddock binary).

Many projects build haddocks as a part of their CI, which uses stock haddock executable, and there is no time to rebuild it from the scratch with a different flag each time. Hackage server is also most likely to use a stock haddock, and overriding this is likely to be error-prone. So I struggle to see how such flag is supposed to bring any practical adoption: most likely no one would bother.

@Kleidukos could you check with GHC devs, how would they feel about haddock depending on commonmark (and transitively on unicode-transforms and unicode-data)? You can point them to my analysis in https://github.com/haskell/haddock/issues/794#issuecomment-1018884773

Bodigrim avatar Jan 21 '22 22:01 Bodigrim

While I'm never eager to add boot dependencies, in this case it's clear that there are few alternatives. Moreover, unicode-transforms, and unicode-data in particular are quite unobjectionable. Assuming they all keep their historically very good maintenance record, I don't foresee this change being a problem for GHC.

bgamari avatar Jan 21 '22 22:01 bgamari