libasciidoc
Support for Asciidoc underlined (two line) titles
First of all, thank you for taking it upon yourself to create an independent implementation with more of a proper parser. I've already learnt how hard it is and stopped pursuing it because of incompetence.
This isn't mentioned in LIMITATIONS.adoc, but the following syntax, which is valid AsciiDoc though deprecated in Asciidoctor, is not supported:
Title of document
=================
Heading
-------
Subheading
~~~~~~~~~~
Subsubheading
^^^^^^^^^^^^^
Subsubsubheading
++++++++++++++++
AsciiDoc requires the underline length to be within +/- 2 characters of the title length; Asciidoctor allows only +/- 1 character.
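To make the length rule concrete, here is a minimal sketch of the check in Go. The function name `underlineFits` and the `tolerance` parameter are illustrative only and do not come from either implementation; lengths are counted in runes, which is a simplification.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// underlineFits reports whether underline has an acceptable length for title.
// tolerance is the allowed difference in characters: 2 for AsciiDoc and 1 for
// Asciidoctor, per the rule stated above. This is a hypothetical helper that
// counts runes, not graphemes or display width.
func underlineFits(title, underline string, tolerance int) bool {
	diff := utf8.RuneCountInString(strings.TrimRight(title, "\r\n")) -
		utf8.RuneCountInString(strings.TrimRight(underline, "\r\n"))
	if diff < 0 {
		diff = -diff
	}
	return diff <= tolerance
}

func main() {
	fmt.Println(underlineFits("Heading", "-------", 1))   // same length
	fmt.Println(underlineFits("Heading", "---------", 1)) // two longer, Asciidoctor tolerance
	fmt.Println(underlineFits("Heading", "---------", 2)) // two longer, AsciiDoc tolerance
}
```

With these inputs the program prints true, false, true: an underline that is two characters too long passes under the AsciiDoc tolerance but not under the Asciidoctor one.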
Thanks for the feedback, @pjanx ;) The plan for now was to support headings using the = prefix character(s), but I guess I could support this variant in the future, too.
In the meantime, I have written a preprocessor that hacks this support in, and also handles PEG parse failures. It is intended to be used in Gogs, Gitea and such.
package main

import (
	"bytes"
	"context"
	"encoding/xml"
	"io"
	"io/ioutil"
	"os"
	"strings"
	"unicode"
	"unicode/utf8"

	"github.com/bytesparadise/libasciidoc"
	"github.com/bytesparadise/libasciidoc/pkg/renderer"
)

// isTitle returns the title level if the lines seem to form a title,
// zero otherwise. Input lines may include trailing newlines.
func isTitle(line1, line2 []byte) int {
	// This is a very naïve method, we should target graphemes (thus at least
	// NFC normalize the lines first) and account for wide characters.
	diff := utf8.RuneCount(line1) - utf8.RuneCount(line2)
	if len(line2) < 2 || diff < -1 || diff > 1 {
		return 0
	}

	// "Don't be fooled by back-to-back delimited blocks."
	// Still gets fooled by other things, though.
	if bytes.IndexFunc(line1, func(r rune) bool {
		return unicode.IsLetter(r) || unicode.IsNumber(r)
	}) < 0 {
		return 0
	}

	// The underline must be homogeneous.
	for _, r := range bytes.TrimRight(line2, "\r\n") {
		if r != line2[0] {
			return 0
		}
	}
	// IndexByte returns -1 for any other character, yielding level zero.
	return 1 + strings.IndexByte("=-~^+", line2[0])
}

// writeLine writes cur, prefixed with a single-line title marker if next is
// its underline. It returns the line to process next: nil when next was
// consumed as an underline, next otherwise.
func writeLine(w *io.PipeWriter, cur, next []byte) []byte {
	if level := isTitle(cur, next); level > 0 {
		w.Write(append(bytes.Repeat([]byte{'='}, level), ' '))
		next = nil
	}
	w.Write(cur)
	return next
}

// ConvertTitles converts AsciiDoc two-line (underlined) titles to single-line.
func ConvertTitles(w *io.PipeWriter, input []byte) {
	var last []byte
	for _, cur := range bytes.SplitAfter(input, []byte{'\n'}) {
		last = writeLine(w, last, cur)
	}
	writeLine(w, last, nil)
}

func main() {
	input, err := ioutil.ReadAll(os.Stdin)
	if err != nil {
		panic(err)
	}

	pr, pw := io.Pipe()
	go func() {
		defer pw.Close()
		ConvertTitles(pw, input)
	}()

	// io.Copy(os.Stdout, pr)
	// return

	_, err = libasciidoc.ConvertToHTML(context.Background(), pr, os.Stdout,
		renderer.IncludeHeaderFooter(true))
	if err == nil {
		return
	}

	// Fallback: output all the text sanitized for direct inclusion.
	os.Stdout.WriteString("<pre>")
	for _, line := range bytes.Split(input, []byte{'\n'}) {
		xml.EscapeText(os.Stdout, line)
		os.Stdout.WriteString("\n")
	}
	os.Stdout.WriteString("</pre>")
}
Hello @pjanx. Back on this issue: I'm not sure how to deal with this request in the grammar yet, but out of curiosity, and depending on your workflow, could you use the syntax that libasciidoc already supports instead? (I'm referring to the =, ==, etc. prefix on the section title.)
And good point about the missing mention in the LIMITATIONS.adoc file, I'll add that, too (well, until I find a way to resolve it).
Hi. I've already written that preprocessor above, and that works for me. I just wanted to have nice READMEs again, without Ruby or Python on the machine, since I moved my repositories off of GitHub. That has mostly been achieved now, except for the other LIMITATIONS. I enjoy the two-line syntax.
OK, thanks for your feedback, @pjanx. For now, my main concern is being able to parse the underline, which must have the same length as the title (give or take one or two characters). So I'll keep this issue in the backlog until I have a good solution, if that's OK with you ;)
My advice is to not support two-line section titles (setext headings). If/when AsciiDoc gets a spec, this will very likely be dropped. The main reason is that the symbols don't give any indication of the nesting level, so even someone experienced with AsciiDoc like myself can never remember what levels they represent. More important, they conflict with delimited blocks in AsciiDoc, so it makes the language harder / more ambiguous to parse, both for tools and humans.
My advice is to stick with atx headings.
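
The conflict with delimited blocks can be illustrated with a small sketch. It assumes the commonly documented delimiter characters (four or more of `-`, `=`, `+`, `*`, `.`, `_` or `/`) and the five underline characters from the original report; the function names and the minimum underline length of two are illustrative, not taken from any implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// isRepeated reports whether s consists of at least min copies of the byte c.
func isRepeated(s string, c byte, min int) bool {
	return len(s) >= min && strings.Count(s, string(c)) == len(s)
}

// couldBeUnderline reports whether line looks like a two-line title underline.
func couldBeUnderline(line string) bool {
	for _, c := range []byte("=-~^+") {
		if isRepeated(line, c, 2) {
			return true
		}
	}
	return false
}

// couldBeBlockDelimiter reports whether line looks like a delimited block
// boundary (assumed set of delimiter characters, see the text above).
func couldBeBlockDelimiter(line string) bool {
	for _, c := range []byte("-=+*._/") {
		if isRepeated(line, c, 4) {
			return true
		}
	}
	return false
}

// ambiguous reports whether line could be read either way when it follows
// a line of ordinary text.
func ambiguous(line string) bool {
	return couldBeUnderline(line) && couldBeBlockDelimiter(line)
}

func main() {
	for _, line := range []string{"----", "====", "++++", "~~~~", "^^^^"} {
		fmt.Printf("%s ambiguous=%v\n", line, ambiguous(line))
	}
}
```

Under these assumptions, the `-`, `=` and `+` lines are ambiguous while `~` and `^` are not, so a parser has to look at the preceding line and its length to decide, which is the kind of context sensitivity that atx headings avoid.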
Thanks for the feedback, @mojavelinux, and happy to see you here ;) Yes, my first reason for not supporting two-line section titles was the parsing, but I also agree with you that the symbol does not easily reflect the section level.