pandoc

Fenced code block as a list item

Open johnmapeson opened this issue 1 year ago • 5 comments

A bug report or maybe a request for improvement.

Sometimes it is necessary to have a fenced code block as a list item. As I have discovered, the proper syntax for this is not very intuitive.

pandoc input.md -o output.htm

- example 1
- ```
  list item two
  list item two
  ```
- list item three

In Example 1, each line inside the pre block is indented with two spaces, whereas I expected the lines not to be indented.

-   example 2
-   ```
    list item two

    list item two
    ```
-   list item three

Example 2 works fine, but only if the two lines inside the pre block are separated by an empty line. If there is no empty line between them, the markup is <li><code>list item two list item two</code></li>.

- example 3
- ```
list item two
list item two
```
- list item three

Example 3 demonstrates a syntax that works. Though we can use it, I would prefer the syntax from Example 1.

johnmapeson, Jun 10 '24 02:06

Hm. I can reproduce this. It's definitely not intended, and you won't get that behavior with -f commonmark or -f gfm.

% pandoc -t native
- example 1
- ```
  list item two
  list item two
  ```
- list item three

[ BulletList
    [ [ Plain [ Str "example" , Space , Str "1" ] ]
    , [ CodeBlock
          ( "" , [] , [] ) "  list item two\n  list item two"
      ]
    , [ Plain
          [ Str "list" , Space , Str "item" , Space , Str "three" ]
      ]
    ]
]

A bug I would say.

jgm, Jun 10 '24 05:06

Even more minimal case:

% pandoc
- ```
  item
  ```
^D
<ul>
<li><pre><code>  item</code></pre></li>
</ul>

jgm, Jun 11 '24 02:06

The problem lies with

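listLineCommon :: PandocMonad m => MarkdownParser m Text
listLineCommon = T.concat <$> manyTill
              (  many1Char (satisfy $ \c -> c `notElem` ['\n', '<', '`'])  -- plain text
             <|> fmap snd (withRaw code)  -- inline code, kept raw; may span several lines
             <|> fmap (renderTags . (:[]) . fst) (htmlTag isCommentTag)  -- HTML comment
             <|> countChar 1 anyChar  -- any other single character
              ) newline  -- repeat until a newline not consumed above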

Originally this function just grabbed the first literal text line of the list item (whose raw contents would be reparsed later). But special handling was added for inline code and HTML comment tags (likely for good reasons which we can look up). Note that ``` can delimit inline code as well as code blocks. So this function gobbles the whole fenced block instead of just the first line, and because of this, the code that would have removed the extra indentation (in listLine) never gets triggered.
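To make the diagnosis concrete, here is a toy model in plain Haskell. It is not pandoc's code: firstLine, codeSpan, dedent and rawItem are hypothetical names, HTML comments are ignored, and "inline code" is simplified to "a backtick run closed by the next run of the same length, even across newlines". firstLine stands in for listLineCommon, and the per-line dedent stands in for the indentation stripping done in listLine.

```haskell
import Data.List (intercalate)

-- Toy model only (not pandoc's implementation) of collecting the raw text of
-- one list item whose "- " marker has already been consumed.

-- | Given input that starts with a run of backticks, return the whole span
-- (delimiters included) and the rest of the input, provided a closing run of
-- the same length exists somewhere later, newlines included.
codeSpan :: String -> Maybe (String, String)
codeSpan s = go body []
  where
    (open, body) = span (== '`') s
    n = length open
    go [] _ = Nothing
    go xs@('`' : _) acc =
      let (run, rest) = span (== '`') xs
       in if length run == n
            then Just (open ++ reverse acc ++ run, rest)
            else go rest (reverse run ++ acc)
    go (c : rest) acc = go rest (c : acc)

-- | Read the item's first line. With gobbling on (like the inline-code
-- alternative in listLineCommon), a code span is consumed whole, so the
-- "first line" can run over several physical lines.
firstLine :: Bool -> String -> (String, String)
firstLine _ [] = ([], [])
firstLine _ ('\n' : rest) = ([], rest)
firstLine gobble s@('`' : _)
  | gobble, Just (sp, rest) <- codeSpan s =
      let (more, rest') = firstLine gobble rest in (sp ++ more, rest')
firstLine gobble (c : rest) =
  let (more, rest') = firstLine gobble rest in (c : more, rest')

-- | Strip up to n leading spaces (the list item's content indentation).
dedent :: Int -> String -> String
dedent n l = drop (min n (length (takeWhile (== ' ') l))) l

-- | First line as read by firstLine; every remaining line is dedented separately.
rawItem :: Bool -> Int -> String -> String
rawItem gobble indent s =
  let (first, rest) = firstLine gobble s
   in intercalate "\n" (first : map (dedent indent) (lines rest))
```

Loaded in GHCi and applied to the text of the minimal case (the fence, a two-space-indented item line, and the indented closing fence), rawItem True 2 returns the fence with the two-space indent still inside it, the same "  item" seen in the HTML output above, because the whole fence was swallowed as the first line and dedent never saw the code lines. rawItem False 2 reads line by line and returns the fence around an unindented "item".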

Code is a bit of a mess here -- I need to revisit some things, but I'm recording this diagnosis here for when I have a chance to do that.

Ref #5628

jgm, Jun 11 '24 03:06

The story begins 15 years ago, with this commit: https://github.com/jgm/pandoc/commit/eb2e560d861387414fe03056189f32e54e83851b

That was meant to deal with cases like the following:

- a <!--

- b

-->
- c

That is still a case pandoc handles nicely (whereas commonmark doesn't recognize the HTML comment in this kind of context).

But the cost of dealing with this case was that, in consuming raw content for the list item, we needed to gobble material inside HTML comments. Fine! For many years we did that. But then someone came up with a case like

- a `<!--`
- b `-->`

in which the special characters are quoted in inline code. Well, clearly our "raw line" parser needs to gobble up inline code sections, too. And that's all fine until we have a case like yours. Note that

```
abc
```

would be perfectly valid inline code (were it not parsed first as a code block). So the raw list item parser gobbles up this whole chunk, avoiding the line-by-line reading that strips leading indentation.

What a mess!

In this case we could probably add yet another band-aid to the current pile of band-aids. But I'm tempted to think that this was all a mistake, and that the way to sanity is the approach we took with commonmark, which makes it very clear that indicators of block structure take precedence over inline parsing, and which renders the first example above as

<ul>
<li>
<p>a &lt;!--</p>
</li>
<li>
<p>b</p>
</li>
</ul>
<p>--&gt;</p>
<ul>
<li>c</li>
</ul>

So, I'm tempted to take out all the special-purpose code instead of adding something else that will probably break in some new way in the future...

jgm, Jun 11 '24 04:06

See also #7778 for another related case.

jgm, Jun 11 '24 06:06

I'd like to add another case to the list of problems.

- I am going to write a list item that contains `some fixed-width text that
  spans lines` and results in more spaces than one would want.

- For this list item, I'm going to change indentation inside `the fixed-width
text that spans lines` and this works, but is against our formatting guidelines.

This results in

\begin{itemize}
\item
  I am going to write a list item that contains
  \texttt{some\ fixed-width\ text\ that\ \ \ spans\ lines} and results
  in more spaces than one would want.
\item
  For this list item, I'm going to change indentation inside
  \texttt{the\ fixed-width\ text\ that\ spans\ lines} and this works,
  but is against our formatting guidelines.
\end{itemize}

I'm actually using a filter to render inline code differently, and I was hoping to work around this in the filter. Given the nature of inline code, it would be sufficient (for me, at least) to gobble all spaces after a newline. But I can't do that, because Code.text no longer contains the newline. Is there some other workaround I can use? I'm not happy with leaving list items unindented, because then one ends up fighting the automatic indentation in text editors.

exzombie, Mar 14 '25 13:03
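One possible filter-level workaround, offered as an untested sketch rather than a pandoc feature: since the newline is already gone by the time the filter sees the code text, collapse every run of spaces into a single space. This also flattens runs of spaces that were meant to be there, so it only suits documents whose inline code never relies on them. collapseSpaces is a hypothetical helper written against plain String; in a Lua filter the same idea would be a gsub on el.text with the pattern " +".

```haskell
-- | Collapse each run of spaces to a single space. Hypothetical helper for a
-- filter that post-processes inline code text; it cannot tell indentation
-- left over from a line break apart from intentional repeated spaces.
collapseSpaces :: String -> String
collapseSpaces (' ' : ' ' : rest) = collapseSpaces (' ' : rest)
collapseSpaces (c : rest) = c : collapseSpaces rest
collapseSpaces [] = []
```

Applied to the text of the first item's inline code above, it turns "some fixed-width text that   spans lines" into "some fixed-width text that spans lines".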