sile icon indicating copy to clipboard operation
sile copied to clipboard

Native markdown overhaul

Open Omikhleia opened this issue 1 year ago • 13 comments

Closes #1336

A work-in-progress to at least get the existing "native markdown" route working decently.

First 3 commits are mainly to make it work again...

  1. Upgrade our local copy of lunamark to master as of July. 17, 2022 Rationale: the old version was close to 200 locals in a function, meaning we could hardly modify it... And one bug or two were also fixed apparently)
  2. Hack lunamark so that it can output an AST again Rationale: that master version ~~removed~~ does not have the ast writer (from our earlier fork, see note below) and enforces text output. It's a one-liner to fix, and allows us to work with the AST directly again...
  3. Update our markdown class so it works minimally again... ... and even with a few improvements, e.g.
    • Links (to URL)
    • SVG, hrule, etc.
    • Lists since we now have them
  4. Actually split the markdown class to a markdown package, usable with other classes

Subsequent commits are additional features (Pandoc-like extensions) with corresponding changes in lunamark:

  • Strikethrough ~~deleted~~
  • Div and Span, with SILE then supporting:
    • Underlining [text]{.underline},
    • Small caps [text]{.smallcaps}
    • Language change, inline [text]{lang=fr} or as fenced colon block, ::: {lang=fr} ...)
  • Image attributes ![](image.png){ height=..., width=... } (keys passed through to SILE, so our units work there)
  • Raw inlines and row blocks, {=xxx} after a simple inline code (with backticks) or as infostring on a fenced block code (backticks or tildes), with SILE support for SILE-code and Lua, as {=sile}, {=sile-lua}
  • As an extra goodie, fenced block code can now also use an extended attribute infostring (i.e. besides the usual, say, python, one can use {.python key=value})
  • Subscripts and superscripts, e.g. respectively H~2~O, 2^10^
  • Task lists (from GitHub-Flavored markdown) as first-class citizens
  • Fancy lists (à la Pandoc)
  • Pipe tables
    • From a partial backport of wikito/markdown see https://github.com/jgm/lunamark/issues/31 = it's functional, but the lunamark documentation, appropriate credits and license clearing all need to be addressed.
    • Mapped to my "ptable" package = The branch currently has an "all-in-one" single-commit backport of it, so I could proceed further and test it. This will have to be removed when ptable eventually makes it...

Eventually, when judged stable enough the changes to lunamark shall be proposed to upstream to its repository. UPDATE: PR submitted.

At this points, I am taking a short break. Future tasks ought to be:

  • On the lunamark side:
    • [x] Follow the PR reviews for my proposals = https://github.com/jgm/lunamark/pull/36
    • [ ] Finalize the pipe table backport from wikito/markdown and also propose it to lunamark upstream = https://github.com/jgm/lunamark/pull/39
    • ~~At some point, address at least header attributes (so we can have unnumbered sections etc.)~~
  • On the SILE side:
    • [x] Refactor/split this thing as a package, it shall not be a class (only)...
    • [ ] Wait for needed packages to use them where appropriate:
      • ptable #1344
      • textsubsuper #1475
      • ~~labelrefs~~ #1366 (nope, won't do, see comments)
    • [x] Add a documentation chapter about Markdown in SILE
    • [ ] Drink tea or coffee

Omikhleia avatar Jul 16 '22 23:07 Omikhleia

2. Hack lunamark so that it can output an AST again

From a real quick glance, it looks like the function we're overriding to hack this in is a publicly exposed one. Can we use lunamark as a regular dependency and just patch that function internally instead of having a vendored copy again?

alerque avatar Jul 16 '22 23:07 alerque

Can we use lunamark as a regular dependency and just patch that function internally instead of having a vendored copy again?

We could. I'd want to propose the fixes and improvements I am preparing to lunamark, though, eventually, and see how the owner reacts. Another possibility would be to maintain our own fork of it (outside), if the owner is not interested.

Omikhleia avatar Jul 16 '22 23:07 Omikhleia

Can we use lunamark as a regular dependency and just patch that function internally instead of having a vendored copy again?

We could. I'd want to propose the fixes and improvements I am preparing to lunamark, though, eventually, and see how the owner reacts. Another possibility would be to maintain our own fork of it (outside), if the owner is not interested.

If we can get away with just monkey patching a function after we load the upstream library lets start with that. Of course proposing the fixes upstream is fine too, but until such a time as they are accepted and released we still need a deploy path that works for us. Forking is a potential if the upstream was really unmaintained, but as long as we can't reasonably be the canonical fork, using a fork is problematic for packaging and distribution. At that point we would have to vendor it like it is now.

alerque avatar Jul 17 '22 06:07 alerque

Actually I should have read the comments more thoroughfully, our markdown class stated in its initial comment

You will need my lunamark fork from https://github.com/simoncozens/lunamark for the AST writer.

So we already had kind of a fork. Oh well, let's proceed anyway, and sort the mess later :cactus: (but this note is added as a reminder)

Omikhleia avatar Jul 17 '22 12:07 Omikhleia

Description updated, with a task list for subsequent steps. I'm having a break on this, until things move on both the lunamark and SILE sides.

Omikhleia avatar Jul 18 '22 20:07 Omikhleia

Lol ?

image

EDIT: So for some reason I would have to investigate, the broken markdown class now conflicts.

Am I enough annoyed?

Omikhleia avatar Jul 19 '22 23:07 Omikhleia

I'll fix the merge conflicts, don't worry about those.

alerque avatar Jul 19 '22 23:07 alerque

I just handled the merge conflicts for you. Note that since this brings in the new modular inputter setup in from master, the merge splits up what you had all in the markdown class into two places: a markdown inputter and the markdown class.

As you started to setup, using the markdown reader also sets the document class to be markdown by default. We could work on exactly how that works, but for now it seemed like a good place to start.

You can call this on a markdown file by requesting the inputer be loaded at run time:

$ sile -r inputter.markdown myfile.md

The output should go straight to PDF as usual.

Don't worry too much about the exact details of your commits if you keep working on this. I see some of these need some refactoring or are even temporary, but I don't mind doing the rebase work to get them cleaned up if needed.

alerque avatar Jul 20 '22 00:07 alerque

I handled the merge conflicts locally, but for some reason I can't push them here. I don't think you enabled the "maintainers can push to this branch" option on this PR, perhaps you'd like to. In case not I stuck a copy of the branch with my merges on the end here and you could pull in my changes from there to your local branch.

alerque avatar Jul 20 '22 00:07 alerque

I don't think you enabled the "maintainers can push to this branch"

I actually just disabled it on all my PRs ^^

As of this one (a draft), the mardown class will probably have to go away at some point, in favor of a package that may be used in any class (not necessarily deriving from the current book class). I am not yet there, though.

Omikhleia avatar Jul 20 '22 00:07 Omikhleia

I actually just disabled it on all my PRs

However you like. The merge is pretty involved though so I do suggest you at least merge my PR and then move on from there.

the mardown class will probably have to go away at some point, in favor of a package that may be used in any class

That sounds like a reasonable refactor, and roughly what I already did with the pandoc package that handles SIL content generated by Pandoc itself (including span attributes, div blocks with classes, etc.) I've been using that in production a long time. I'm actually kind of curious what part of that output wasn't sufficient for your original use case, but it's not a big deal. I don't have an issue seeing more than one working solution.

One caveat: we'll probably want a class anyway to trigger the right package(s) to load, but it can be pretty slim and many of the content processing functions can move to a package.

I think it would also be possible to setup a class with a classoption that determined the parent: i.e. we could have a markdown class that optionally inherited from plain or book depending on a value passed in with -O. I could look into that if you think it would be useful.

alerque avatar Jul 20 '22 00:07 alerque

N.B.

  • Started to squash some commits (notably those on our lunamark vendored copy, now that this is handled elsewhere too)
  • Ditched one of the subsequent steps (header attributes, labelrefs-based cross-references). Not mandatory to get something better than we previously had, and this is too problematic (needing book class support), so I am not going to address it here.

Omikhleia avatar Jul 23 '22 12:07 Omikhleia

Update:

  • Rebased on current master (b339e27ab71dcba57e275e6ec8b8daa799324f36) and updated to 0.14 API
  • Inputter and packages split
  • Overall:
    • Test fails (at least) because of the quick "ptable" backport and similar hiccups, so I haven't bothered yet.
    • Still, it runs smoothly, even via CLI (e.g. SILEX -u inputters.markdown -u packages.autodoc documentation/c13-markdown.md)

In the meantime, lunamark has moved merging PR 36, so work can resume after my return from vacations.

Omikhleia avatar Aug 06 '22 18:08 Omikhleia

Update:

  • Rebased on current master (58da8ad5b824fa9ccc97d0ddfbea44e3a5c39c8e)
  • Chored lunamark "vendored" version to master as of Aug. 12, 2022 - which now has all the minimum features we need here, yay.

Omikhleia avatar Aug 12 '22 21:08 Omikhleia

Update:

  • The markdown inputter generates a more compact AST + minor robustness fix.
  • Framework for testing the inputter with expectations vs. reconstructed "human-readable" SIL-like reconstruction of the AST.

I think this is getting quite close to completion now (note). Besides more test cases, all we now need is the ptable and textsubsuper packages to be available in master (so as to remove the shims here, rebase the PR orderly and do some proper refactor before review). @alerque What do you think would be a decent time-frame for that to occur?

(note) There are of course still areas that could be extended, but they'd all require more work on lunamark (e.g. link attributes, line blocks) and/or adequate SILE building-blocks (e.g. external formatters).

Omikhleia avatar Aug 13 '22 22:08 Omikhleia

I am refactoring this as an "external" luarocks package.

Omikhleia avatar Aug 17 '22 20:08 Omikhleia

There it goes: https://luarocks.org/modules/Omikhleia/markdown.sile

Omikhleia avatar Aug 20 '22 00:08 Omikhleia