
Support for Asciidoc underlined (two line) titles

Open pjanx opened this issue 7 years ago • 7 comments

First of all, thank you for taking it upon yourself to create an independent implementation with more of a proper parser. I've already learnt how hard that is, and stopped pursuing it for lack of competence.

This isn't mentioned in LIMITATIONS.adoc, but the following syntax, which is perfectly valid in AsciiDoc but deprecated in Asciidoctor, is not supported:

Title of document
=================

Heading
-------
Subheading
~~~~~~~~~~
Subsubheading
^^^^^^^^^^^^^
Subsubsubheading
++++++++++++++++

AsciiDoc requires the underline length to be within +/- 2 characters of the title length; Asciidoctor allows only +/- 1 character.
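
To make the tolerance rule concrete, here is a minimal sketch of the length check with the tolerance as a parameter. The helper name `withinTolerance` is hypothetical, and counting lengths in runes is an assumption; neither tool's exact counting rule is stated in this thread.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// withinTolerance reports whether the underline length is within tol
// characters of the title length. Lengths are counted in runes, which is
// an assumption made for this sketch.
func withinTolerance(title, underline string, tol int) bool {
	diff := utf8.RuneCountInString(title) - utf8.RuneCountInString(underline)
	if diff < 0 {
		diff = -diff
	}
	return diff <= tol
}

func main() {
	// 7-character title, 9-character underline: a difference of 2.
	title, underline := "Heading", "---------"
	fmt.Println(withinTolerance(title, underline, 2)) // tolerance of 2, as described for AsciiDoc
	fmt.Println(withinTolerance(title, underline, 1)) // tolerance of 1, as described for Asciidoctor
}
```

With a difference of two characters, the first check passes and the second fails, which is the case where the two tools would disagree.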

pjanx avatar Sep 24 '18 04:09 pjanx

Thanks for the feedback, @pjanx ;) The plan for now was to support headings using the = prefix character(s), but I guess I could support this variant in the future, too.

xcoulon avatar Sep 24 '18 07:09 xcoulon

In the meantime, I have written a preprocessor that hacks this support in, and also handles PEG parse failures. It is intended to be used in Gogs, Gitea and such.

package main

import (
	"bytes"
	"context"
	"encoding/xml"
	"io"
	"io/ioutil"
	"os"
	"strings"
	"unicode"
	"unicode/utf8"

	"github.com/bytesparadise/libasciidoc"
	"github.com/bytesparadise/libasciidoc/pkg/renderer"
)

// isTitle returns the title level if the lines seem to form a title,
// zero otherwise. Input lines may include trailing newlines.
func isTitle(line1, line2 []byte) int {
	// This is a very naïve method; we should target graphemes (thus at least
	// NFC-normalize the lines first) and account for wide characters.
	diff := utf8.RuneCount(line1) - utf8.RuneCount(line2)
	if len(line2) < 2 || diff < -1 || diff > 1 {
		return 0
	}

	// "Don't be fooled by back-to-back delimited blocks."
	// Still gets fooled by other things, though.
	if bytes.IndexFunc(line1, func(r rune) bool {
		return unicode.IsLetter(r) || unicode.IsNumber(r)
	}) < 0 {
		return 0
	}

	// The underline must be homogeneous (a single repeated character).
	for _, r := range bytes.TrimRight(line2, "\r\n") {
		if r != line2[0] {
			return 0
		}
	}
	return 1 + strings.IndexByte("=-~^+", line2[0])
}

func writeLine(w *io.PipeWriter, cur, next []byte) []byte {
	if level := isTitle(cur, next); level > 0 {
		w.Write(append(bytes.Repeat([]byte{'='}, level), ' '))
		next = nil
	}
	w.Write(cur)
	return next
}

// ConvertTitles converts AsciiDoc two-line (underlined) titles to single-line.
func ConvertTitles(w *io.PipeWriter, input []byte) {
	var last []byte
	for _, cur := range bytes.SplitAfter(input, []byte{'\n'}) {
		last = writeLine(w, last, cur)
	}
	writeLine(w, last, nil)
}

func main() {
	input, err := ioutil.ReadAll(os.Stdin)
	if err != nil {
		panic(err)
	}

	pr, pw := io.Pipe()
	go func() {
		defer pw.Close()
		ConvertTitles(pw, input)
	}()

	// io.Copy(os.Stdout, pr)
	// return

	_, err = libasciidoc.ConvertToHTML(context.Background(), pr, os.Stdout,
		renderer.IncludeHeaderFooter(true))
	if err == nil {
		return
	}

	// Fallback: output all the text sanitized for direct inclusion.
	os.Stdout.WriteString("<pre>")
	for _, line := range bytes.Split(input, []byte{'\n'}) {
		xml.EscapeText(os.Stdout, line)
		os.Stdout.WriteString("\n")
	}
	os.Stdout.WriteString("</pre>")
}
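
For illustration, the underline-to-level mapping used by `isTitle` above can be exercised in isolation. This is only a sketch: `atxPrefix` is a hypothetical helper, not part of the preprocessor or of libasciidoc.

```go
package main

import (
	"fmt"
	"strings"
)

// atxPrefix returns the single-line "=" prefix for an underline character,
// using the same "=-~^+" ordering as the preprocessor above. It returns ""
// for any character that is not one of the five underline characters.
func atxPrefix(underline byte) string {
	level := 1 + strings.IndexByte("=-~^+", underline) // 0 when not found
	return strings.Repeat("=", level)
}

func main() {
	for _, c := range []byte("=-~^+") {
		fmt.Printf("%c -> %q\n", c, atxPrefix(c))
	}
}
```

Under this mapping, a `Heading` line underlined with `-------` becomes `== Heading`, and the underline line itself is dropped.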

pjanx avatar Oct 06 '18 22:10 pjanx

Hello @pjanx. Back on this issue: I'm not sure how to deal with this request in the grammar yet, but out of curiosity, and depending on your workflow, could you use the syntax that libasciidoc already supports instead? (I'm referring to the = , == , etc. prefix on the section title.)

And good point about the missing mention in the LIMITATIONS.adoc file; I'll add that, too (well, until I find a way to resolve it).

xcoulon avatar Oct 15 '18 08:10 xcoulon

Hi. I've already written the preprocessor above, and it works for me. I just wanted to have nice READMEs again, without Ruby or Python on the machine, since I moved my repositories off GitHub. That has mostly been achieved now, except for the other LIMITATIONS. I enjoy the two-line syntax.

pjanx avatar Oct 15 '18 15:10 pjanx

OK, thanks for your feedback, @pjanx. For now, my main concern is being able to parse the underline, which must have the same length as the title (give or take one or two characters). So I'll keep this issue in the backlog until I have a good solution, if that's OK with you ;)

xcoulon avatar Oct 16 '18 08:10 xcoulon

My advice is to not support two-line section titles (setext headings). If/when AsciiDoc gets a spec, this will very likely be dropped. The main reason is that the symbols don't give any indication of the nesting level, so even someone experienced with AsciiDoc like myself can never remember what levels they represent. More importantly, they conflict with delimited blocks in AsciiDoc, so they make the language harder and more ambiguous to parse, both for tools and humans.

My advice is to stick with atx headings.
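
The conflict with delimited blocks can be sketched as follows. Both classifiers are deliberately simplified and hypothetical; the delimiter character set used here (`-`, `=`, `+`, `*`, `_`, `.` repeated four or more times) is an assumption based on the common AsciiDoc block delimiters, not something taken from libasciidoc's grammar.

```go
package main

import (
	"fmt"
	"strings"
)

// uniform reports whether line is one character repeated at least min times.
func uniform(line string, min int) bool {
	if len(line) < min || line == "" {
		return false
	}
	return strings.Trim(line, line[:1]) == ""
}

// couldBeUnderline: a run of one of the five title underline characters.
func couldBeUnderline(line string) bool {
	return uniform(line, 2) && strings.ContainsAny(line[:1], "=-~^+")
}

// couldBeBlockDelimiter: a run of four or more delimiter characters
// (assumed set, see the note above).
func couldBeBlockDelimiter(line string) bool {
	return uniform(line, 4) && strings.ContainsAny(line[:1], "-=+*_.")
}

func main() {
	for _, line := range []string{"-------", "++++", "~~~~", "****"} {
		fmt.Printf("%-8s underline=%-5v delimiter=%v\n",
			line, couldBeUnderline(line), couldBeBlockDelimiter(line))
	}
}
```

Lines such as `-------` and `++++` satisfy both tests, so a parser has to look at the preceding line (and its length) to decide which reading applies; single-line `=` prefixes avoid that lookback.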

mojavelinux avatar Oct 26 '18 06:10 mojavelinux

Thanks for the feedback, @mojavelinux, and happy to see you here ;) Yes, my first concern about supporting two-line section titles was the parsing, but I also agree with you that the symbol does not easily reflect the section level.

xcoulon avatar Oct 26 '18 07:10 xcoulon