
Use plain text writer for texorpdfstring

Open Jab2870 opened this issue 2 years ago • 13 comments

Example

## Test (\High)

We use the following in reports to indicate that an issue has High severity. The \High macro is defined in our LaTeX template and basically makes the text red:

[Screenshot: rendered heading with "High" in red]

The relevant part of the AST is

        [
          {
            "t": "Str",
            "c": "Test"
          },
          {
            "t": "Space"
          },
          {
            "t": "Str",
            "c": "("
          },
          {
            "t": "RawInline",
            "c": [
              "tex",
              "\\High"
            ]
          },
          {
            "t": "Str",
            "c": ")"
          }
        ]

I have a Lua filter that transforms it into

        [
          {
            "t": "Str",
            "c": "Test"
          },
          {
            "t": "Space"
          },
          {
            "t": "Str",
            "c": "("
          },
          {
            "t": "RawInline",
            "c": [
              "tex",
              "\\High"
            ]
          },
          {
            "t": "RawInline",
            "c": [
              "plain",
              "High"
            ]
          },
          {
            "t": "Str",
            "c": ")"
          }
        ]
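The filter itself isn't shown here. As a rough sketch of an equivalent transformation, written in Python against the JSON AST rather than in Lua: the function name `add_plain_fallback` is made up, and the assumption that every bare macro like `\High` should fall back to its own name is mine, not something from the template.

```python
import re

def add_plain_fallback(inlines):
    """Return a new inline list with a plain-format RawInline after each bare TeX macro.

    Hypothetical helper: it assumes the plain fallback for a macro such as
    \\High is simply the macro name without the backslash.
    """
    out = []
    for node in inlines:
        out.append(node)
        if node.get("t") == "RawInline" and node["c"][0] == "tex":
            match = re.fullmatch(r"\\([A-Za-z]+)", node["c"][1])
            if match:
                out.append({"t": "RawInline", "c": ["plain", match.group(1)]})
    return out

# The heading inlines from the first AST excerpt.
heading = [
    {"t": "Str", "c": "Test"},
    {"t": "Space"},
    {"t": "Str", "c": "("},
    {"t": "RawInline", "c": ["tex", "\\High"]},
    {"t": "Str", "c": ")"},
]
result = add_plain_fallback(heading)
```

Here `result` has six inlines, with the plain `High` sitting right after the tex `\High`.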

My hope was that the plain version would be used in the second argument of \texorpdfstring; however, it isn't.

When converted to latex I get:

\subsection{\texorpdfstring{Test (\High)}{Test ()}}\label{test}

I would like:

\subsection{\texorpdfstring{Test (\High)}{Test (High)}}\label{test}
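To make the difference concrete, here is a toy model in Python of the two behaviours. It is only a sketch over the JSON AST: `inlines_to_text` and its `keep_plain_raw` flag are invented for illustration, and it handles just the inline types that appear in this example, not everything pandoc does.

```python
def inlines_to_text(inlines, keep_plain_raw=False):
    """Flatten a list of pandoc JSON inlines to a string.

    With keep_plain_raw=False every RawInline is dropped, which matches the
    output I currently get. With keep_plain_raw=True, RawInline nodes whose
    format is "plain" are kept, which is the output I would like.
    """
    parts = []
    for node in inlines:
        kind = node["t"]
        if kind == "Str":
            parts.append(node["c"])
        elif kind in ("Space", "SoftBreak"):
            parts.append(" ")
        elif kind == "RawInline" and keep_plain_raw:
            fmt, text = node["c"]
            if fmt == "plain":
                parts.append(text)
    return "".join(parts)

# The heading inlines after the filter has run.
filtered_heading = [
    {"t": "Str", "c": "Test"},
    {"t": "Space"},
    {"t": "Str", "c": "("},
    {"t": "RawInline", "c": ["tex", "\\High"]},
    {"t": "RawInline", "c": ["plain", "High"]},
    {"t": "Str", "c": ")"},
]

current = inlines_to_text(filtered_heading)                       # "Test ()"
wanted = inlines_to_text(filtered_heading, keep_plain_raw=True)   # "Test (High)"
```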

Jab2870 avatar Nov 10 '23 12:11 Jab2870

That would involve using something based on writePlain instead of stringify at https://github.com/jgm/pandoc/blob/pr-9168/src/Text/Pandoc/Writers/LaTeX.hs#L678-L679

I say "based on" because writePlain converts Pandoc -> Text, not [Inline] -> Text, so it would need something like

trim <$> writePlain opts (Pandoc nullMeta [Plain ils])

This would have some other advantages, e.g. in some cases Unicode super/subscript numerals would be used. Math would also be handled better, arguably: when possible it would be rendered using Unicode characters.

A possible disadvantage would be performance. But if this is just for headings, it's probably not going to add up to much.

jgm avatar Nov 10 '23 16:11 jgm

Another disadvantage is that making one writer depend on another makes the code more spaghetti-like. However, I believe there are already some cases like this in the code base.

jgm avatar Nov 10 '23 16:11 jgm

I suppose another option would be having an attribute on header elements that is a plain text representation of the header. It would be less automatic, but would make configuring it in filters possible without re-implementing the whole header writer in a raw block.

Jab2870 avatar Nov 10 '23 16:11 Jab2870

If the writer were able to take something like:

  "blocks": [
    {
      "t": "Header",
      "c": [
        3,
        [
          "test",
          [],
          [
            [
              "plain",
              "example"
            ]
          ]
        ],
....

and turn it into:

\subsection{\texorpdfstring{Test (\High)}{example}}\label{test}

We would no longer have one writer reliant on another. However, users could use a lua filter like:

el.attr.attributes.plain = pandoc.write( pandoc.read( content,"markdown" ),"plain")

(not tested, likely typo somewhere but you hopefully get the idea)

Jab2870 avatar Nov 10 '23 17:11 Jab2870

Precedents for a writer depending on a writer:

- OPML writer depends on markdown and html writers
- ODT writer depends on opendocument writer
- ipynb writer depends on markdown, plain writer
- chunkedhtml and epub writers depend on html writer
- markdown writer depends on html writer

jgm avatar Nov 10 '23 17:11 jgm

If we thought that the markdown writer might some day need to depend on the LaTeX writer, that would be a reason for not establishing a dependency the other way. Otherwise, maybe it's okay?

jgm avatar Nov 10 '23 17:11 jgm

I think having an attribute allows for the most flexibility when it comes to filters. It also allows users to specify exactly what they want in the original document. In the case of markdown, that might be

## Test (\Hard) {plain="test - hard"}

This is obviously a slightly contrived example, although I don't really see a disadvantage with this option.

And I suppose nothing stops the LaTeX writer from falling back to the plain writer when the plain attribute isn't present, so users would still get the better maths handling etc. for free (as far as a user is concerned, at least).

Jab2870 avatar Nov 10 '23 18:11 Jab2870

Yes, it provides more flexibility, but there is a disadvantage: it's not automatic. So, for example, you wouldn't get the improved treatment of math and subscripts noted above.

You're right that we could combine the two approaches.
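As a rough illustration of the combined approach (not the actual writer code), here is a Python sketch over the JSON form of a Header. The function names are hypothetical, the attribute key `plain` is the one proposed above, and the fallback is a deliberately minimal stand-in for whatever the plain writer would produce.

```python
def fallback_text(inlines):
    """Minimal stand-in for an automatic plain rendering (Str and spaces only)."""
    parts = []
    for node in inlines:
        if node["t"] == "Str":
            parts.append(node["c"])
        elif node["t"] in ("Space", "SoftBreak"):
            parts.append(" ")
    return "".join(parts)

def pdf_string(header):
    """Pick the plain-text argument for \\texorpdfstring.

    Uses the explicit "plain" attribute when the header has one, and
    otherwise falls back to the automatic rendering.
    """
    _level, (_ident, _classes, attributes), inlines = header["c"]
    attrs = dict(attributes)
    if "plain" in attrs:
        return attrs["plain"]
    return fallback_text(inlines)

inlines = [
    {"t": "Str", "c": "Test"},
    {"t": "Space"},
    {"t": "Str", "c": "("},
    {"t": "RawInline", "c": ["tex", "\\High"]},
    {"t": "Str", "c": ")"},
]
with_attr = {"t": "Header", "c": [3, ["test", [], [["plain", "example"]]], inlines]}
without_attr = {"t": "Header", "c": [3, ["test", [], []], inlines]}

explicit = pdf_string(with_attr)      # "example"
automatic = pdf_string(without_attr)  # "Test ()"
```

The explicit attribute wins when present, so filters and authors keep full control, while headers without it still get the automatic treatment.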

jgm avatar Nov 10 '23 18:11 jgm

So, I might have some time in the next few days to work on a PR if you want. It would be a good excuse to learn the basics of Haskell. Do you have a preference for which route to take?

Jab2870 avatar Nov 12 '23 09:11 Jab2870

I'm most tempted by the simpler approach of just using the plain writer. If there's a continuing need for more flexibility, we could consider the other later.

jgm avatar Nov 13 '23 23:11 jgm

> If we thought that the markdown writer might some day need to depend on the LaTeX writer, that would be a reason for not establishing a dependency the other way. Otherwise, maybe it's okay?

I'm happy with that, although I had a thought about this. Does the markdown writer not depend on the LaTeX writer at all for maths? As in: https://pandoc.org/MANUAL.html#math

Jab2870 avatar Nov 14 '23 11:11 Jab2870

> Does the markdown writer not depend on the latex writer at all for maths?

No, it doesn't. Math in pandoc is actually stored in the AST in tex format. So no conversion is necessary in the writer.

jgm avatar Nov 14 '23 16:11 jgm

A vote for adding an attribute, e.g.

## Some info about \LaTeX\ macros {pdfstring="About LaTeX"}

which would be intuitive(?) and line up with other attributes like #someid .someclass etc.

If the attribute is present, use it as the plain-text argument of \texorpdfstring.

priiduonu avatar Dec 06 '25 15:12 priiduonu