New vs continuing paragraph after block quote or other set-off content

Open jgm opened this issue 2 years ago • 15 comments

Here are two kinds of texts we might want to distinguish:

paragraph content

> block quote

continuation of paragraph

vs

paragraph content

> block quote

new paragraph

A deficiency of Markdown is that there is no way to distinguish these cases. The problem is reduced if one renders in a format that does not indent new paragraphs, because then there is no visual distinction between the cases. But they are semantically different and can be distinguished, e.g., in print output with indented paragraphs. There should be a way to distinguish them in the source.

The problem is not specific to block quotes; it also occurs with set-off equations, images, tables, code, and lists.

I recently found myself creating a pandoc Lua filter that implements the following syntax for the "continued paragraph case":

paragraph content

> block quote

_ continuation of paragraph

(The filter just inserts a LaTeX \noindent command where the _ is.) This is not too bad actually. It would be nice if djot had some way of making the distinction.
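
A minimal sketch of that idea in Python, not the actual Lua filter: it assumes the document has already been split into a flat list of `(kind, text)` blocks, and the function name `render_latex` is hypothetical.

```python
def render_latex(blocks):
    # blocks: list of (kind, text) pairs, where kind is "para" or "quote".
    # This flat representation is an assumption made for illustration.
    out = []
    for kind, text in blocks:
        if kind == "quote":
            out.append(r"\begin{quote}" + "\n" + text + "\n" + r"\end{quote}")
        elif text.startswith("_ "):
            # Continuation marker: drop it and suppress the paragraph indent.
            out.append(r"\noindent " + text[2:])
        else:
            out.append(text)
    return "\n\n".join(out)


blocks = [
    ("para", "paragraph content"),
    ("quote", "block quote"),
    ("para", "_ continuation of paragraph"),
]
print(render_latex(blocks))
```

A paragraph without the leading `_ ` is passed through unchanged, so LaTeX indents it as a new paragraph.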

jgm avatar Sep 06 '22 19:09 jgm

Why not indentation for an embedded blockquote? That seems the most intuitive to me.

bpj avatar Sep 07 '22 19:09 bpj

The question is not about the syntax of the block quote, but about how to mark what follows it as either a new paragraph or a continuation of the previous one.

jgm avatar Sep 07 '22 21:09 jgm

I mean that if what follows the blockquote is a continuation, the blockquote is indented; that is, an indented blockquote is embedded in a paragraph.

paragraph

    > blockquote inside paragraph

rest of paragraph (continuation)

vs.

first paragraph

> blockquote after paragraph

another paragraph

I hope that is clearer.

bpj avatar Sep 07 '22 22:09 bpj

Yes, got it now.

jgm avatar Sep 07 '22 22:09 jgm

Does djot's AST support block elements nested within a paragraph?

vassudanagunta avatar Sep 20 '22 12:09 vassudanagunta

We wouldn't need the AST to support block elements as children of a paragraph. It would be sufficient just to be able to mark the following content as "not a new paragraph."
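
One way to picture this, as a rough sketch only: the node types stay flat, and a paragraph carries a flag. The class names and the `continues` field below are hypothetical and do not come from djot or Pandoc.

```python
from dataclasses import dataclass


@dataclass
class Para:
    text: str
    continues: bool = False  # True means "not a new paragraph"


@dataclass
class BlockQuote:
    text: str


def count_logical_paragraphs(blocks):
    # A logical paragraph starts at every Para that is not a continuation.
    return sum(1 for b in blocks if isinstance(b, Para) and not b.continues)


continued = [
    Para("paragraph content"),
    BlockQuote("block quote"),
    Para("continuation of paragraph", continues=True),
]
separate = [
    Para("paragraph content"),
    BlockQuote("block quote"),
    Para("new paragraph"),
]
print(count_logical_paragraphs(continued))  # 1
print(count_logical_paragraphs(separate))   # 2
```

The block quote remains a sibling of the paragraphs in both cases; only the flag on the following paragraph differs.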

jgm avatar Sep 20 '22 17:09 jgm

Thinking about a syntax for, "anyhow, as I was saying", I was going to suggest ..., as in:

The boat ride took us through the everglades.

> It was one of those "airboats" with the giant propeller.

... We saw a lot of birds but no alligators.

But that causes a pretty big indent, and ... already automatically gets you a "…" in djot, and it might cause problems when the author wants an actual ellipsis.

The leading underscore is ok, but it does also make me think of italics.

Since "and" is at least somewhat close to "anyhow, as I was saying", maybe &?

The boat ride took us through the everglades.

> It was one of those "airboats" with the giant propeller.

& We saw a lot of birds but no alligators.

I like that one because,

  • & is not currently used for any other djot syntax,
  • I can read it as "and" ("and as I was saying") and it kinda works. :)
  • the glyph itself also looks somewhat like any other alphabet letter, and so is not as distracting (does not stand out so much on the page) as the _, which I think is a desirable characteristic for this bit of markup.

uvtc avatar Sep 21 '22 02:09 uvtc

@jgm,

We wouldn't need the AST to support block elements as children of a paragraph. It would be sufficient just to be able to mark the following content as "not a new paragraph."

I understand. Would you mind answering a related long-standing question I've had about terminology?

Is there a distinction between an abstract syntax tree and an intermediate representation? Since djot parses to an AST, and since you are proposing a new djot syntax for paragraph continuation, your suggested approach above makes sense. But if instead you needed to model a general abstraction of structured text, independent of any specific syntax, such as the "AST" at Pandoc's core, then it might be better to represent it as a single paragraph with a nested block quote, yes? And whether or not that is the better representation of this specific case, would you agree that there is nonetheless a difference between an AST and an IR, and that the core data structure of Pandoc is better characterized as an IR?

vassudanagunta avatar Sep 29 '22 13:09 vassudanagunta

It might make more sense conceptually to allow a block quote to be a child of a paragraph. But this would make the interface with Pandoc's types more complicated. I don't know what is best.

About terminology, I'd say that "IR" is the genus and "AST" is one species.

jgm avatar Sep 29 '22 14:09 jgm

About terminology, I'd say that "IR" is the genus and "AST" is one species.

ok, thank you.

RE the bigger question, some things to consider:

  1. I would say that Pandoc solves the problem of translation between so many different input and output forms by defining an IR that is syntax independent and more or less a semantic superset of those syntaxes. Then the question becomes whether representing paragraphs that span block quotes (or other elements, see below) is universal or common enough to warrant complicating Pandoc's IR.

  2. An old W3C www-html list discussion: Re: Lists within Paragraphs. An excerpt:

    I think this is part of a bigger problem.  Paragraph's can't contain block
    level elements.  At first this seems to make a lot of sense.  But it
    doesn't work in many instances.
    
    For example often block level mathematical formulas occur in paragraphs.
    If we consider
    
                      x + y = z
    
    as such an example, we see that in this case this paragraph is the still
    the same one, but we have a block level element in it.
    
  3. The HTML spec's ultimate answer admits that paragraphs might logically span block elements, but says that this logical notion is not what the HTML standard means by a paragraph:

    List elements (in particular, ol and ul elements) cannot be children of p elements. When a sentence contains a bulleted list, therefore, one might wonder how it should be marked up.

    For instance, this fantastic sentence has bullets relating to

    • wizards,
    • faster-than-light travel, and
    • telepathy,

    and is further discussed below.

    The solution is to realize that a paragraph, in HTML terms, is not a logical concept, but a structural one. In the fantastic example above, there are actually five paragraphs as defined by this specification: one before the list, one for each bullet, and one after the list.

    The markup for the above example could therefore be:

    <p>For instance, this fantastic sentence has bullets relating to</p>
    <ul>
    <li>wizards,
    <li>faster-than-light travel, and
    <li>telepathy,
    </ul>
    <p>and is further discussed below.</p>
    

    Authors wishing to conveniently style such "logical" paragraphs consisting of multiple "structural" paragraphs can use the div element instead of the p element.

    Thus for instance the above example could become the following:

    <div>For instance, this fantastic sentence has bullets relating to
    <ul>
    <li>wizards,
    <li>faster-than-light travel, and
    <li>telepathy,
    </ul>
    and is further discussed below.</div>
    

    This example still has five structural paragraphs, but now the author can style just the div instead of having to consider each part of the example separately.

  4. Allowing paragraphs to span/nest block elements provides, I think, a cleaner and more consistent solution to "tight lists". For example, the following would be a tight list because each list item contains exactly a single element (a paragraph):

    - para 1
    - para 2
      - a
      - b
      - c
    - para 3
    

    The current CommonMark solution has flaws, as can be seen by comparing

    - item one
    - item two
      # a heading
      more text
    - item three
    

    with

    - item one
    - item two
    
      a heading
      ---------
      more text
    - item three
    

    Both should be treated as loose lists since the second item in each contains block sequences, but CommonMark's determination is based on the existence or lack thereof of blank lines in the source, not logical structure.
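
    To illustrate point 4, here is a rough sketch of deciding tightness from logical structure instead of blank lines. It assumes a hypothetical AST in which a paragraph may hold nested blocks; none of these class names come from CommonMark, djot, or Pandoc.

```python
from dataclasses import dataclass, field


@dataclass
class Para:
    text: str
    children: list = field(default_factory=list)  # nested blocks, e.g. a sublist


@dataclass
class Heading:
    text: str


@dataclass
class ListItem:
    blocks: list


@dataclass
class BulletList:
    items: list


def is_tight(lst):
    # Tight if every item holds exactly one block and that block is a paragraph.
    return all(
        len(item.blocks) == 1 and isinstance(item.blocks[0], Para)
        for item in lst.items
    )


sublist = BulletList([ListItem([Para(t)]) for t in ("a", "b", "c")])
nested = BulletList([
    ListItem([Para("para 1")]),
    ListItem([Para("para 2", children=[sublist])]),
    ListItem([Para("para 3")]),
])
with_heading = BulletList([
    ListItem([Para("item one")]),
    ListItem([Para("item two"), Heading("a heading"), Para("more text")]),
    ListItem([Para("item three")]),
])
print(is_tight(nested))        # True
print(is_tight(with_heading))  # False
```

    Under this rule both heading examples above come out loose regardless of where blank lines fall in the source, because the second item contains a sequence of blocks.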

I hope this is helpful. Please let me know if you've had enough! It just happens to be a question I've been trying to tackle myself.

vassudanagunta avatar Sep 30 '22 01:09 vassudanagunta

Just commenting to second @bpj's proposed use of indentation for this.

dsanson avatar Oct 02 '22 15:10 dsanson

Re @bpj's suggestion about indenting: would this cause a problem with putting lists between paragraphs? That is, with a list you may (and typically do) indent the list marker. Is there a difference between a list that's its own paragraph vs a list that's in the midst of a paragraph?

uvtc avatar Oct 02 '22 18:10 uvtc

Including lists in paragraphs is an important use case for the kind of writing that I do, at least, and neither the suggestion of a leading _ nor the suggestion of indentation works well for this case.

I would need to distinguish between all of the following:

A:

Some text:
\begin{itemize}
\item A
\item B
\end{itemize}
And more text

B:

Some text:

\begin{itemize}
\item A
\item B
\end{itemize}
And more text

C:

Some text:
\begin{itemize}
\item A
\item B
\end{itemize}

And more text

D:

Some text:

\begin{itemize}
\item A
\item B
\end{itemize}

And more text

A leading underscore distinguishes A from C, but not A from B, nor C from D. It catches part of the A/D distinction.

Indentation doesn't work for any of them.

An alternative design is a convention that there's a div for "multi-paragraphs" that contain multiple block elements. It's ugly but accurate:

::: {.paragraph}
Some text:

* A
* B

And more text
:::

would denote option A.

This would be tool-specific, but that's perhaps OK: I think the need for this kind of thing tends to arise in long-form scientific writing more than in smaller, casual documents, so having a Googleable solution like this may be enough. This also remains compatible with the various ASTs out there.

david-christiansen avatar Oct 13 '23 04:10 david-christiansen

One could use a single dot on a line as a "connector" that says: the following normally-block-level thing is to be considered as part of the current paragraph. Then your A is

Some text
.
- A
- B
.
more text

and your B is

Some text

- A
- B
.
more text

and so on. Of course, this would require figuring out an AST model that actually permits this sort of thing. And some (most?) output formats just won't allow a list or a block quote to be part of a paragraph: in HTML for example, a p element can only contain "phrasing content."
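
As a rough sketch of how the connector could be read (an illustration only, not a djot implementation; the function name `split_chunks` is hypothetical): blank lines and lone dots both end a chunk, but a dot marks the next chunk as joined to what came before.

```python
def split_chunks(source):
    chunks = []     # list of (joined_to_previous, text)
    current = []    # lines of the chunk being collected
    joined = False  # does the chunk being collected continue the previous one?
    for line in source.splitlines():
        marker = line.strip()
        if marker == "" or marker == ".":
            if current:
                chunks.append((joined, "\n".join(current)))
                current = []
                joined = False
            if marker == ".":
                # A lone dot attaches the next chunk to the current paragraph.
                joined = True
        else:
            current.append(line)
    if current:
        chunks.append((joined, "\n".join(current)))
    return chunks


case_a = "Some text\n.\n- A\n- B\n.\nmore text"
case_b = "Some text\n\n- A\n- B\n.\nmore text"
print(split_chunks(case_a))
# [(False, 'Some text'), (True, '- A\n- B'), (True, 'more text')]
print(split_chunks(case_b))
# [(False, 'Some text'), (False, '- A\n- B'), (True, 'more text')]
```

The two inputs differ only in the flag on the list chunk, which is the A/B distinction from the previous comment. How those flags would map onto an AST or onto output formats is left open here, as in the comment above.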

jgm avatar Oct 13 '23 07:10 jgm

AsciiDoc uses the plus sign (+) as the so-called list continuation: https://docs.asciidoctor.org/asciidoc/latest/syntax-quick-reference/#ex-complex

mygithubdevaccount avatar Jan 31 '24 19:01 mygithubdevaccount