Fenced code block as a list item
A bug report or maybe a request for improvement.
Sometimes it is necessary to have a fenced code block as a list item. As I have discovered, the proper syntax for this is not very intuitive.
pandoc input.md -o output.htm
- example 1
- ```
list item two
list item two
```
- list item three
In Example 1, each line inside the pre block is indented with two spaces, whereas I expected the lines not to be indented.
- example 2
- ```
list item two
list item two
```
- list item three
Example 2 works fine, but only if the two lines inside the pre block are separated by an empty line. If there is no empty line between them, the markup will be <li><code>list item two list item two</code></li>.
- example 3
- ```
list item two
list item two
```
- list item three
Example 3 demonstrates the syntax that works fine. Though we can use it, I would prefer the syntax from Example 1.
Hm. I can reproduce this. It's definitely not intended, and you won't get that behavior with -f commonmark or -f gfm.
% pandoc -t native
- example 1
- ```
list item two
list item two
```
- list item three
[ BulletList
[ [ Plain [ Str "example" , Space , Str "1" ] ]
, [ CodeBlock
( "" , [] , [] ) " list item two\n list item two"
]
, [ Plain
[ Str "list" , Space , Str "item" , Space , Str "three" ]
]
]
]
A bug I would say.
Even more minimal case:
% pandoc
- ```
item
```
^D
<ul>
<li><pre><code> item</code></pre></li>
</ul>
The problem lies with
listLineCommon :: PandocMonad m => MarkdownParser m Text
listLineCommon = T.concat <$> manyTill
              (  many1Char (satisfy $ \c -> c `notElem` ['\n', '<', '`'])
             <|> fmap snd (withRaw code)
             <|> fmap (renderTags . (:[]) . fst) (htmlTag isCommentTag)
             <|> countChar 1 anyChar
              ) newline
Originally this function was just grabbing the first literal text line of the list item (whose raw contents would be reparsed later). But special handling was added for inline code and HTML comment tags (likely for good reasons which we can look up). Note that ``` can delimit inline code as well as code blocks. So, this function is gobbling the whole thing, instead of just the first line. And because of this, the code that would have removed the extra indentation doesn't get triggered (that's in listLine).
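To make the diagnosis above easier to follow, here is a toy model in Python of a "first line" reader that also swallows inline code. It is only a sketch, not pandoc's actual Haskell code: `read_first_raw_line` and `find_closing_run` are hypothetical names, HTML comment handling is left out, and unlike a real inline code parser it does not refuse to cross blank lines.

```python
def find_closing_run(text: str, start: int, n: int) -> int:
    """Return the index of the next run of exactly n backticks at or after start, or -1."""
    i = start
    while i < len(text):
        if text[i] == "`":
            j = i
            while j < len(text) and text[j] == "`":
                j += 1
            if j - i == n:
                return i
            i = j  # run of a different length: skip it entirely
        else:
            i += 1
    return -1


def read_first_raw_line(text: str) -> str:
    """Read up to the first newline, but swallow backtick-delimited spans whole.

    This mimics the idea of treating anything that looks like inline code
    as one unit, even when it contains newlines.
    """
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\n":
            break  # normal end of the first line
        if ch == "`":
            j = i
            while j < len(text) and text[j] == "`":
                j += 1
            n = j - i  # length of the opening backtick run
            close = find_closing_run(text, j, n)
            if close != -1:
                # Looks like inline code: take it all, newlines included.
                out.append(text[i:close + n])
                i = close + n
                continue
        # Ordinary character (or a backtick with no closer): take one char.
        out.append(ch)
        i += 1
    return "".join(out)
```

Given the contents of a list item that opens with a three-backtick fence, this reader returns the whole fenced chunk as its "first line", so a later per-line step that strips the item's indentation never sees the inner lines. For an item that starts with plain text it stops at the first newline as expected.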
Code is a bit of a mess here -- I need to revisit some things, but I'm recording this diagnosis here for when I have a chance to do that.
Ref #5628
The story begins 15 years ago, with this commit: https://github.com/jgm/pandoc/commit/eb2e560d861387414fe03056189f32e54e83851b
That was meant to deal with cases like the following:
- a <!--
- b
-->
- c
That is still a case pandoc handles nicely (whereas commonmark doesn't recognize the HTML comment in this kind of context).
But the cost of dealing with this case was that, in consuming raw content for the list item, we needed to gobble material inside HTML comments. Fine! For many years we did that. But then someone came up with a case like
- a `<!--`
- b `-->`
in which the special characters are quoted in inline code. Well, clearly our "raw line" parser needs to gobble up inline code sections, too. And that's all fine until we have a case like yours. Note that
```
abc
```
would be perfectly valid inline code (were it not parsed first as a code block). So the raw list item parser gobbles up this whole chunk, avoiding the line-by-line reading that strips leading indentation.
What a mess!
In this case we could add an additional band-aid to the current pile of band-aids, probably. But I'm tempted to think that this was all a mistake, and that the way to sanity is the approach we took with commonmark, which just makes it very clear that indicators of block structure take precedence over inline parsing, and renders the first example above as
<ul>
<li>
<p>a <!--</p>
</li>
<li>
<p>b</p>
</li>
</ul>
<p>--></p>
<ul>
<li>c</li>
</ul>
So, I'm tempted to take out all the special-purpose code instead of adding something else that will probably break in some new way in the future...
See also #7778 for another related case.
I'd like to add another case to the list of problems.
- I am going to write a list item that contains `some fixed-width text that
spans lines` and results in more spaces than one would want.
- For this list item, I'm going to change indentation inside `the fixed-width
text that spans lines` and this works, but is against our formatting guidelines.
This results in
\begin{itemize}
\item
I am going to write a list item that contains
\texttt{some\ fixed-width\ text\ that\ \ \ spans\ lines} and results
in more spaces than one would want.
\item
For this list item, I'm going to change indentation inside
\texttt{the\ fixed-width\ text\ that\ spans\ lines} and this works,
but is against our formatting guidelines.
\end{itemize}
I'm actually using a filter to render inline code differently, and I was hoping that I could work around this using the filter. Given the nature of inline code, it would be sufficient (for me, at least) to gobble all spaces after a newline. But I can't do that because Code.text does not contain the newline anymore. Is there some other workaround I can use? I'm not happy with not indenting list items because one ends up fighting against automatic indentation in text editors.
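One possible workaround, sketched here under stated assumptions, is to preprocess the Markdown text before it reaches pandoc, since at that point the newline is still present. The Python below is not a pandoc filter and uses no pandoc API; `join_wrapped_inline_code` is a hypothetical helper name. It assumes inline code is written with single backticks and that fenced code blocks do not contain stray single backticks spread over several lines (such blocks would be altered).

```python
import re

# A single-backtick inline code span that does not cross a blank line.
# The lookarounds skip backticks that belong to longer runs such as fences.
INLINE_CODE = re.compile(
    r"(?<!`)`(?!`)"                    # opening single backtick
    r"((?:[^`\n]|\n(?![ \t]*\n))+)"    # body: no backticks, no blank line
    r"`(?!`)"                          # closing single backtick
)


def join_wrapped_inline_code(markdown: str) -> str:
    """Collapse each line break (plus surrounding indentation) inside
    single-backtick inline code into one space."""
    def fix(match: re.Match) -> str:
        body = re.sub(r"[ \t]*\n[ \t]*", " ", match.group(1))
        return "`" + body + "`"

    return INLINE_CODE.sub(fix, markdown)
```

The idea would be to run this over a copy of the source and feed the result to pandoc, so the files under version control can keep their indented continuation lines.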