
Incorrectly escaped backslashes in the value of a fenced code block's attributes

Open wtbutler opened this issue 3 years ago • 11 comments

I'm trying to use pandoc to convert a Markdown file to a PDF, and I ran into an issue where it garbles the formatting attributes of code blocks. I narrowed the issue down to pandoc's conversion from Markdown to .tex files in general.

When using the command `pandoc -s --listings tmp.md -o tmp.tex` to convert the following markdown

```{backgroundcolor="\color{yellow!10}"}
"It's a beautiful day in the neiborhood"
```

to a .tex file in version 2.9.2.1, I get the expected output of

\begin{lstlisting}[backgroundcolor={\color{yellow!10}}]

as the start of the listing. However, on version 2.17.1.1, it starts

\begin{lstlisting}[backgroundcolor={\textbackslash color\{yellow!10\}}]

where the backslashes and braces in the formatting instructions are escaped as literal text. I tried reproducing this on the online version of pandoc, but I couldn't figure out how to enable the --listings option there, which is key to getting the attribute values to show up in the output.

wtbutler • Dec 26 '22 05:12

Line 431 of Text.Pandoc.Writers.LaTeX:

        kvs <- mapM (\(k,v) -> (k,) <$>
                       stringToLaTeX TextString v) keyvalAttr

The call to stringToLaTeX causes the contents of the attribute to be escaped in the way that would be appropriate for a literal string in LaTeX. Here that's not what you want, because you mean for the attribute to contain literal LaTeX. Perhaps that will always be the case for listing attributes?
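To make the effect concrete, here is a minimal Python sketch of text-style escaping applied to a listings option. It is a simplified imitation, not pandoc's actual implementation: the escape table and the names `escape_as_text` and `listings_option` are made up for illustration, and the real function handles many more characters.

```python
# Simplified imitation of text-style LaTeX escaping; NOT pandoc's actual code.
# The table below covers only the characters discussed in this thread.
ESCAPES = {
    "\\": "\\textbackslash ",
    "{": "\\{",
    "}": "\\}",
    "_": "\\_",
}


def escape_as_text(value: str) -> str:
    """Escape characters so that LaTeX typesets them literally."""
    return "".join(ESCAPES.get(ch, ch) for ch in value)


def listings_option(key: str, value: str, escape: bool) -> str:
    """Render one key={value} entry for the lstlisting optional argument."""
    rendered = escape_as_text(value) if escape else value
    return f"{key}={{{rendered}}}"


# Escaped: the \color command becomes literal text (the 2.17.1.1 behaviour).
print(listings_option("backgroundcolor", r"\color{yellow!10}", escape=True))
# → backgroundcolor={\textbackslash color\{yellow!10\}}

# Unescaped: the command passes through (the 2.9.2.1 behaviour).
print(listings_option("backgroundcolor", r"\color{yellow!10}", escape=False))
# → backgroundcolor={\color{yellow!10}}

# Escaping is what makes a plain-text caption with an underscore safe.
print(listings_option("caption", "some_code.c", escape=True))
# → caption={some\_code.c}
```

The same function cannot serve both cases: once the backslash is treated as text, there is nothing left in the value to mark it as a command.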

Looking at the history I see commit 0b3b77415f and commit a55fb5f29d3772981adfc494c2597f0a1b8bdb64 which fixed #6742.

jgm • Dec 26 '22 17:12

It's tough to know how to deal with this context. If we do escape, we'll run into problems like yours from people who want to use TeX commands in these attributes. If we don't, we'll run into problems like #6742. Perhaps we should have solved #6742 by telling the user to backslash escape the _ in their attribute value. Or is that even necessary? (I didn't try running that code, with caption="some_code.c", through LaTeX to see if it compiles.)

jgm • Dec 26 '22 17:12

As for whether the underscore escaping was necessary, it appears that it was, as

\begin{lstlisting}[caption={some_code.c}]
code here
\end{lstlisting}

gives the following output

Package hyperref Warning: Rerun to get /PageLabels entry.

! Missing $ inserted.
<inserted text> 
                $
l.56 \begin{lstlisting}[caption={some_code.c}]
                                              
? 

when compiled with xelatex.

The way I think about it intuitively is that if you're adding attributes that will be used by a specific system, pandoc should pass them to that system unaltered. That is, if you're giving a code block a caption because you know that LaTeX has a caption field, the value should be formatted as though it were written directly in the caption field. That's a roundabout way of saying that I think the user should be escaping the underscore in their attribute value. But I can understand why that user didn't want to do that and didn't expect it to work that way.

This might be feature creep, but would there be a way to have attribute="value" escape characters while attribute:="value" passes the value literally? Or some other way to ensure that pandoc passes the value literally to whatever is expecting it? (The specific operator syntax obviously isn't important; if there's existing syntax for a similar operation, that works too.)

wtbutler • Dec 26 '22 21:12

Pandoc supports multiple output formats. So, your code block with attribute caption="some_code.c" could be used with LaTeX/listings, but it could also be used with other output formats.

The pandoc types don't currently give us a way to represent the difference you're suggesting between a "passthrough attribute" and a "textual attribute."

jgm • Dec 26 '22 21:12

Drat. In that case, it would make more sense to me if pandoc didn't modify the fields at all, because if pandoc escapes them, the user loses a lot of freedom in those attributes. If the user has to do the escaping, all of those things are still possible; the user just has to take escaping into account.

wtbutler • Dec 27 '22 04:12

It goes both ways, though: escaping gives the user the freedom to generate multiple output formats from the same source document.

I see both arguments, and I'm not sure right now what the best solution would be.

jgm • Dec 27 '22 05:12

I think that would be true if pandoc did more to map specific attribute names to specific output fields (other than startFrom). Adding a caption attribute only adds a caption to a LaTeX document, at least in my experimentation (mostly with HTML). Because the attribute name is (or at least seems to be) particular to the output format, I think it makes more sense to assume that the attribute value is going to be parsed by that format as well.

wtbutler • Dec 27 '22 08:12

> because the attribute field name is (or at least seems to be) particular to the output format

I don't think there's anything about a caption attribute that is specifically connected with LaTeX. I can imagine many people making use of this. Even if support for it in other formats isn't built into pandoc, people customize using filters.

jgm • Dec 27 '22 17:12

In general, some escaping needs to be done for attribute values. In HTML/XML formats, for example, all attribute values have & changed to &amp;.
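As a small illustration of that kind of attribute escaping, here is a sketch using Python's standard `html.escape`. The helper name `html_attribute` and the sample value are made up for the example; this is not how pandoc itself produces HTML.

```python
from html import escape


def html_attribute(name: str, value: str) -> str:
    """Render name="value" with the value escaped for an HTML attribute."""
    # quote=True also escapes double and single quotes, which matters
    # because the value sits inside a quoted attribute.
    return f'{name}="{escape(value, quote=True)}"'


print(html_attribute("data-caption", "Tom & Jerry"))
# → data-caption="Tom &amp; Jerry"
```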

jgm • Dec 27 '22 17:12

After some more experimentation with the formats that use attributes (LaTeX, docx, pptx, ms, and HTML, at least according to the docs), LaTeX is indeed the only one that uses the caption field. HTML is the only other one that keeps the data at all, and it keeps it only as a data-caption attribute that is never displayed.

wtbutler • Dec 27 '22 19:12

I came across this same issue when trying to control the font size in a verbatim block that wrapped:

~~~~ {caption="Example output from tool" basicstyle="\footnotesize\ttfamily"}
...
~~~~

There's no way I can think of to escape this back so that you get the right values in the output. That is, escaping is a one-way function.

ptoboley • Jun 12 '24 12:06

Is there a solution to this? I'm faced with exactly the same problem, trying to set "basicstyle=\tiny\rmfamily" for a beamer presentation.

galk-research • May 07 '25 20:05
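One possible workaround, not proposed or confirmed anywhere in this thread, is a pandoc JSON filter that replaces each code block with a raw LaTeX block before the writer sees it, so the attribute values are never escaped. The sketch below is untested against any particular pandoc version. It relies on the JSON shapes `{"t": "CodeBlock", "c": [[id, classes, keyvals], text]}` and `{"t": "RawBlock", "c": [format, text]}`, and it deliberately ignores the block's identifier and classes (so no `language=` option is emitted). The file name `raw_listings.py` is hypothetical.

```python
#!/usr/bin/env python3
"""Sketch of a pandoc JSON filter: emit lstlisting blocks with unescaped options."""
import json
import sys


def to_raw_listing(block):
    """Turn a CodeBlock node into a raw LaTeX lstlisting block."""
    (_ident, _classes, keyvals), code = block["c"]
    # Attribute values are copied verbatim, so LaTeX commands survive,
    # but the author must escape characters such as _ themselves.
    options = ", ".join(f"{key}={{{value}}}" for key, value in keyvals)
    header = f"\\begin{{lstlisting}}[{options}]" if options else "\\begin{lstlisting}"
    latex = f"{header}\n{code}\n\\end{{lstlisting}}"
    return {"t": "RawBlock", "c": ["latex", latex]}


def walk(node):
    """Recursively replace every CodeBlock in the JSON AST."""
    if isinstance(node, list):
        return [walk(item) for item in node]
    if isinstance(node, dict):
        if node.get("t") == "CodeBlock":
            return to_raw_listing(node)
        return {key: walk(value) for key, value in node.items()}
    return node


def main():
    raw = sys.stdin.read()
    if not raw.strip():
        return  # nothing was piped in
    doc = json.loads(raw)
    # pandoc passes the target format as the first argument to a filter.
    target = sys.argv[1] if len(sys.argv) > 1 else ""
    if target == "latex":
        doc = walk(doc)
    json.dump(doc, sys.stdout)


if __name__ == "__main__":
    main()
```

It would be invoked along the lines of `pandoc -s --listings --filter ./raw_listings.py tmp.md -o tmp.tex`, presumably still with `--listings` so that the template loads the package. The trade-off is the one discussed above: a caption such as `some_code.c` now has to be written as `some\_code.c` in the source. Beamer output is left untouched here because frames containing verbatim material need the `fragile` option, and this sketch does nothing to ensure that.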