formal-ledger-specifications
formal-ledger-specifications copied to clipboard
Literate Agda using markdown
Opening this issue as I would like to discuss whether there is a chance to create the formal-ledger-specifications as a set of literate Agda documents using markdown instead of LaTeX.
Why
As an occasional reader of the formal-ledger-specfications I want to be able to contribute to it by fixing typos, fixing links or even suggesting "prose" and explanations to make certain dense concepts more easier to understand to readers. i.e. this request is not about the Agda code itself.
While I know how to write LaTeX, I prefer using less distracting formats like markdown or restructured text. As I post this issue here in Github, I am typing it in (github flavored) markdown and hence this feels most familiar to me as a software engineer.
Furthermore, as a "casual contributor" I might be even reading the specification and when finding a small issue like a typo, I would not have the repo checked out, but prefer to "edit in github" (there are links like this often in rendered documents to invite collaborators).
What
Should we use Literate Markdown to write the Ledger specs?
Example
I really like how the Programming Language Foundations in Agda book is presented and when I briefly worked on it in a workshop it was very easy to contribute to as it is just Markdown + Agda code blocks. The pull request even rendered a nice "rich" diff: https://github.com/plfa/plfa.github.io/pull/878/files#diff-2a2528d7e68ace2c2a17503f240dc0933d52fadf03cdab452b34a513bf6ebfca.
With literate markdown, Github helps me to access and re-orient myself in the corresponding source file. Compare this section from plfa:
https://github.com/plfa/plfa.github.io/blob/dev/src/plfa/part1/Induction.lagda.md
Screenshot plfa
Versus this section from this repo:
https://github.com/IntersectMBO/formal-ledger-specifications/blob/master/src/Ledger/GovernanceActions.lagda
Screenshot formal-ledger-specifications
@WhatisRT what do you think? Also, can you maybe mark this as "discussion" I'm missing permissions.
Probably out of scope of this right away, but plfa also does a GREAT job in getting people started on setting up tools and guiding them to contribute (both are just markdown files in their repo too).
I could go either way on this. Both approaches have benefits and drawbacks, and to a large degree the original decision to go with LaTeX was just based on precedent. Here are some tradeoffs:
- Pro LaTeX:
- A rendered PDF is a somewhat stable artefact which is quite useful for things like audits. It also feels more formal (at least to me).
- It is more powerful in what it can display, e.g. the vertical vectors that we have. I think that in particular really helps the structure.
- It offers lots of features, like a bibliography, formatting options, etc., and it's programmable. For example we generate both the Conway and the full spec documents from mostly the same source.
- Pro Markdown:
- A lot easier to write, and the source code would look a lot less cluttered.
- Easier to contribute.
- Links just work.
As a consequence of LaTeX giving us more options to display things, the document can be further removed from technical Agda issues we might be having. For a recent example, we had some instance resolution issues here and really the only solution is to replace a function by a type. With a bit of creative post-processing, we can just put a 'newtype' wrapper around our function and then just disappear the wrapper when rendering. So for documentation purposes we basically just pretend the original code works, even if it doesn't for technical reasons. I'm not so sure if we can do the same level of post-processing with Markdown (and it would probably destroy links?), so we might have to actually replace the function with a relation in the visible part of the documentation in this case. There are probably tens of cases where we would have a different visible code if we couldn't hide our dirty little secrets so well.
Also, we currently have this generated HTML that we link from the README, which is just generated from the same source as everything else. It's a big mess because it has all the LaTeX code in it, but at least it has working links for everything. I guess we could have some kind of parallel documentation in Markdown that replaces this if we wanted to.
One middle-ground that I'd highly recommend exploring is Typst. It's essentially markdown, with a rust-like embedded DSL for rendering / layout / computation. It feels like a modernized LaTeX, is an absolute delight to work with, and has all of the pro's listed above.
We use this to typeset our audits. For an example, here's one of the audit reports we've made public:
https://github.com/SundaeSwap-finance/sundae-audits-public/blob/main/butane-sale/butane-sale.typ
and here's the rendered result:
https://cdn.sundaeswap.finance/audits/butane-sale.pdf
It also looks like it's already supported by Literate Agda:
https://agda.readthedocs.io/en/nightly/tools/literate-programming.html#literate-markdown-and-typst
@Quantumplation while I am intrigued by Typst myself and could be an improvement, it does not address my issue in full. Particularly "drive-by contributions" are similarly unlikely as with LaTeX or other niche markup languages like org-mode (which I personally use a lot, but certainly would not expect that from a wide audience).
@WhatisRT Thanks for outlining the pros of the status quo and how Markdown could improve the situation - I agree with your benefits listed. Can we make the pros for LaTex (= cons for markdown) more concrete? I'm not convinced that they are even an issue with the source markup used.
A rendered PDF is a somewhat stable artefact
It's not my intention to deny this (within this issue) and I'm quite sure PDF output is possible, but could require some work (quoting a colleague of ours "pandoc exists!!") If we look at plfa again, they have an EPUB export but also an open issue about PDF. I'd be happy to investigate and see whether I can solve it for them to unblock us here?
It is more powerful in what it can display, e.g. the vertical vectors that we have
I'm sorry, but which vectors? After searching through the repository I found some heave post-processing related to the term "vector", which seems to be related to AgdaInductiveConstructor{⟦}, but none of the post-processed files contains those?
It offers lots of features, like a bibliography, formatting options
Any specific formatting you would be missing from markdown? I don't see a bibliography used in the current ledger specs pdf, but as we would most likely use pandoc (as recommended by the agda docs) to produce a PDF with a bibliography .. citations and .bib files are supported.
and it's programmable. For example we generate both the Conway and the full spec documents from mostly the same source.
Can you point me to something concrete for this? It sounds like we want to aggregate the same source files into two or more outputs. I'm not sure why you would assume this is not possible with .lagda.md files but possible with .lagda (tex) files?
For a recent example, we had some instance resolution issues https://github.com/IntersectMBO/formal-ledger-specifications/pull/649#discussion_r1928688863 and really the only solution is to replace a function by a type.
I could not follow this example of yours. Is this maybe captured by having code rendered but ignored by agda (or the other way around)? The agda docs mention that this is possible and I saw it used on plfa too.
Also, we currently have this generated HTML that we link from the README, which is just generated from the same source as everything else. It's a big mess because it has all the LaTeX code in it, but at least it has working links for everything. I guess we could have some kind of parallel documentation in Markdown that replaces this if we wanted to.
This HTML? You are right, it's a mess and the tex code makes it hard to navigate. I do understand that for plfa a similar method for producing HTML output was used.. just that the actual book is this HTML rendering. For example this chapter which even contains links to other places in the book or the agda standard library (click on stuff in the import list!).
Ironically, the discussion in #677 yielded another Pro for Markdown:
- People naturally gravitate toward plain text formats
The contribution of @Quantumplation in the other issue is a perfect example of that. If we would be using markdown, we could really just copy that into our sources or he might have been even inclined to open a PR onto that .lagda.md file instead right away!?
Sorry for the long comment, but I thought writing things down is most helpful. I'd be happy to also chat more informally about this soon!
Note we've had good success rendering literate markdown Agda code in the Peras project: https://peras.cardano-scaling.org/agda_html/Peras.SmallStep.html My CSS is pretty lame so I did not put much efforts but it would be pretty straightforward to have really nice documents integrated in any larger document set, generated from such literate Agda.
And there's always pandoc to produce PDF should we need those.
Plus the HTML has hyperlinks so one can jump around to definitions of things very easily
@Quantumplation while I am intrigued by Typst myself and could be an improvement, it does not address my issue in full. Particularly "drive-by contributions" are similarly unlikely as with LaTeX or other niche markup languages like
org-mode(which I personally use a lot, but certainly would not expect that from a wide audience).
I think the point I was making is that Typst does increase the likelihood of "drive-by-contributions", because the bulk of the document is just markdown. Someone can write a whole section in markdown and it will render as they would expect. And then, as it gets refined/edited, or if they're familiar with it, you can make that content richer by leveraging the Typst to typeset formulas, render diagrams, or even run computation a la literate agda.
(that being said, I'm not trying to push hard for Typst, just wanted to share it as an option that I think meets your goal, while also giving contributors the flexibility to typeset to their hearts content)
@WhatisRT Thanks for outlining the pros of the status quo and how Markdown could improve the situation - I agree with your benefits listed. Can we make the pros for LaTex (= cons for markdown) more concrete? I'm not convinced that they are even an issue with the source markup used.
A rendered PDF is a somewhat stable artefact
It's not my intention to deny this (within this issue) and I'm quite sure PDF output is possible, but could require some work (quoting a colleague of ours "pandoc exists!!") If we look at plfa again, they have an EPUB export but also an open issue about PDF. I'd be happy to investigate and see whether I can solve it for them to unblock us here?
Yeah, maybe we can also get this from markdown. If you want to investigate this go ahead! It might be quite useful either way.
It is more powerful in what it can display, e.g. the vertical vectors that we have
I'm sorry, but which vectors? After searching through the repository I found some heave post-processing related to the term "vector", which seems to be related to
AgdaInductiveConstructor{⟦}, but none of the post-processed files contains those?
See for example this screenshot which I had lying around:
It's a huge record update, and it'd be a lot less readable in any other form I can think of. We use this a lot in the STS rules as well.
It offers lots of features, like a bibliography, formatting options
Any specific formatting you would be missing from markdown? I don't see a bibliography used in the current ledger specs pdf, but as we would most likely use pandoc (as recommended by the agda docs) to produce a PDF with a bibliography .. citations and
.bibfiles are supported.
Nothing other than the vertical vectors comes to mind right now, but I'm sure there are a couple of things you'd bump your head into when moving from one to the other. Both PDF specs have a bibliography, it's just not visible in the TOC which it probably should. It's right before the appendices.
and it's programmable. For example we generate both the Conway and the full spec documents from mostly the same source.
Can you point me to something concrete for this? It sounds like we want to aggregate the same source files into two or more outputs. I'm not sure why you would assume this is not possible with
.lagda.mdfiles but possible with.lagda(tex) files?
We have Conway and NoConway environments that sometimes appear in the LaTeX code. Things in the Conway environments are only visible in the Conway spec and NoConway only outside of it. So we also compile exactly the same files to different outputs. Maybe that's also somehow doable with Markdown, but I wouldn't know how.
For a recent example, we had some instance resolution issues #649 (comment) and really the only solution is to replace a function by a type.
I could not follow this example of yours. Is this maybe captured by having code rendered but ignored by agda (or the other way around)? The agda docs mention that this is possible and I saw it used on plfa too.
It's not just about hiding code fragments, it's about somewhat arbitrarily tweaking code slightly when rendering it. In Agda, you want for example filter to take a Set/Type-valued predicate instead of a function A → Bool. But to actually filter something you need a decision procedure for that predicate, so you use typeclass inference to obtain that decision procedure (note: that's not what the stdlib does, this is just an example). However, sometimes that fails for really stupid reasons. So now you have to manually give the instance: filter ⦃ myPred? ⦄ myPred myList. You can't just hide ⦃ myFun? ⦄ because hiding only works on a line-by-line basis. What you can do instead in define a predicate myPred' that's equivalent to myPred and has working inference, which can always be done with a newtype wrapper. But you don't want to expose that technicality, you still want the rendering to show filter myPred myList and just pretend that that code actually works, which is done by putting a post-processing rule that says to substitute myPred' with myPred. There are also other possible approaches, but I think generally we want some kind of arbitrary post-processing for stuff like that.
Also, we currently have this generated HTML that we link from the README, which is just generated from the same source as everything else. It's a big mess because it has all the LaTeX code in it, but at least it has working links for everything. I guess we could have some kind of parallel documentation in Markdown that replaces this if we wanted to.
This HTML? You are right, it's a mess and the tex code makes it hard to navigate. I do understand that for plfa a similar method for producing HTML output was used.. just that the actual book is this HTML rendering. For example this chapter which even contains links to other places in the book or the agda standard library (click on stuff in the import list!).
Yeah, that's it. We also generate HTML from more sane sources in our dependencies, so I'm familiar with the method. It's just that we went with the PDF and the HTML is just a minor thing that I set up once in an hour and is only rarely used for communication purposes.
Ironically, the discussion in #677 yielded another Pro for Markdown:
- People naturally gravitate toward plain text formats
The contribution of @Quantumplation in the other issue is a perfect example of that. If we would be using markdown, we could really just copy that into our sources or he might have been even inclined to open a PR onto that
.lagda.mdfile instead right away!?
Yeah, that's probably right. I guess it's not another pro since it falls under 'easier to contribute' but it's definitely a great example of that.
I think we might as well consider Typst, it seems like a middle-of-the-road approach. Though I think it could also be the case that it just turns out to not have the pros of either side, being less approachable than Markdown while also being less powerful than LaTeX.
Maybe it would be a good idea to make a document that shows the strengths of each approach? We have this super old document that I produced as a showcase that literate Agda would work: https://github.com/IntersectMBO/formal-ledger-specifications/blob/ffcbb7d9cd29f8acf02cf4e27557a19a5b7fe2af/Ledger.pdf Maybe this one is too easy because it doesn't have all the complexity that we now somewhat require, but it might be a good idea to do a more sophisticated variant of that as a demo.
I'd also like to point to what the 1lab is doing. See for example https://1lab.dev/Algebra.Monoid.html#monoids. If you hover over any defined name, its type signature will pop up. Also, at the top left there is a button for 'Controls', one of which is to view hidden code. You can also immediately jump to the source of any document. This is all done using some kind of markdown dialect.
I'd also like to point to what the 1lab is doing.
This looks awesome. The fact that its searchable, we can link to it and the .lagda.md source links are helpful. This ticks many boxes of what I was looking for to lower the entry barrier on contributions.
Here is a tentative plan on how we would migrate the current setup to markdown:
- Translate the
.lagdafiles from LaTeX to markdown. - Develop the infrastructure that allows us to generate something from the markdown, in particular webpages and pdfs.
- Decide where to deploy
-
Regarding 1: For this we can use Pandoc. However, it seems that we would have to tweak the process using some kind of preprocessing because:
- Pandoc does not understand the difference between hidden and unhidden code blocks. We would like to keep this.
- Pandoc outputs different objects depending on whether code blocks occur inside of figures. Outside figures code blocks are translated to code blocks, inside figures they are translated to divs.
- Pandoc does not understand user defined LaTeX macros, so we would need to adjust this so as not to lose content.
However, this alone is not enough. The pdfs use figures extensively to display code. Although this is very natural in LaTeX, probably in a markdown web-based setting it would be better to inline the code with the explanations. This requires manual intervention (or just convert figures to something else)
This would be time consuming but a one-time investment.
-
Regarding 2: As a reference we can see what
1labdoes, which is complex. However, we can start with something simpler, like MkDocs or the like.
@carlostome Sounds good to me!
My (biased) opinion for 2 and 3 in your list would be to just use https://github.com/cardano-scaling/cardano-blueprint, i.e. produce a document tree that we can publish here: https://cardano-scaling.github.io/cardano-blueprint/ledger/index.html
For Leios, we used a combination of agda's ability to generate static html files from *.agda and *.lagda files and
wrote a small library that extends that with a modern look and features like theme, full-text search, line numbering, highlighting & linking and type preview on hover.
Available here: https://leios.cardano-scaling.org/formal-spec/Leios.Protocol.html
Used this npm package in combination with our existing docusaurus: https://www.npmjs.com/package/agda-web-docs-lib
Thanks @will-break-it. We are using mkdocs and we have it all working nicely now, using Agda to generate highlighted html with links. The hard part was not getting the site working on literate markdown files. Most of the work involved the pipeline that translates latex-based literate Agda to markdown-based literate Agda, handling latex figures, captions, etc, and handling some "Conway"/"NoConway" environments.
I think our site has a pretty a nice theme; it has full-text search, highlighting and linking. We haven't tried to get line numbering, but I suspect that will be very easy. We also haven't tried to configure type-preview-on-hover, but that would be nice to have.