Markdown to latex creates excessive `\hypertarget` wrappers around sections

Open kirk86 opened this issue 2 years ago • 24 comments

Explain the problem. When I try to convert markdown to latex I get all sections wrapped with the \hypertarget directive. I don't know if this is intended but it seems a bit excessive.

MWE - Markdown

# This is a title 

## This is a subtitle

### This is a subsubtitle

Here's some math $y = x^{2}$ and some curly braces $\\{1,2,\ldots,n\\}$. Finally some [text in brackets]

Latex conversion

\hypertarget{this-is-a-title}{%
\section{This is a title}\label{this-is-a-title}}

\hypertarget{this-is-a-subtitle}{%
\subsection{This is a subtitle}\label{this-is-a-subtitle}}

\hypertarget{this-is-a-subsubtitle}{%
\subsubsection{This is a subsubtitle}\label{this-is-a-subsubtitle}}

Here's some math \(y = x^{2}\) and some curly braces
\(\\{1,2,\ldots,n\\}\). Finally some {[}text in brackets{]}

On a side note, is there any way to preserve $ instead of \( and $$ instead of \[, and to avoid square brackets [] being escaped with curly braces as {[} {]}?

Pandoc version? pandoc 3.1.2 on MacOS and latest ubuntu

Command pandoc --from markdown+tex_math_dollars --to latex --output dummy.tex dummy.md

I was under the impression that +tex_math_dollars would preserve $ instead of \( and $$ instead of \[, but maybe I understood wrong?

kirk86 avatar Apr 01 '23 20:04 kirk86

The intent is to allow the sections to be linked to. Is it excessive? Not if you want to be able to link to the sections.

If you don't care about that and want cleaner output, then you can get it by disabling auto_identifiers on the markdown reader. If there are no identifiers, LaTeX output won't include a link.

% pandoc -f markdown-auto_identifiers -t latex
# This is a title 

## This is a subtitle

### This is a subsubtitle

Here's some math $y = x^{2}$ and some curly braces $\\{1,2,\ldots,n\\}$. Finally some [text in brackets]
^D
\section{This is a title}

\subsection{This is a subtitle}

\subsubsection{This is a subsubtitle}

Here's some math \(y = x^{2}\) and some curly braces
\(\\{1,2,\ldots,n\\}\). Finally some {[}text in brackets{]}

jgm avatar Apr 02 '23 06:04 jgm

I was under the impression that +tex_math_dollars would preserve $ instead of \(...

The option refers to the syntax used in the markdown input. The LaTeX output always uses the (now preferred) \(.
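
For illustration, here is a minimal sketch (not pandoc output) showing that the two inline delimiters typeset the same formula, so the choice only affects the source text. The document class is an arbitrary assumption; the math is taken from the example above.

```latex
\documentclass{article}
\begin{document}

% Plain TeX style inline math delimiters.
Inline with dollars: $y = x^{2}$.

% LaTeX style inline math delimiters; the typeset result is the same.
Inline with parentheses: \(y = x^{2}\).

% LaTeX style display math, the counterpart of $$...$$ in Markdown input.
\[ \{1,2,\ldots,n\} \]

\end{document}
```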

jgm avatar Apr 02 '23 06:04 jgm

The intent is to allow the sections to be linked to. Is it excessive? Not if you want to be able to link to the sections.

The \hypertarget commands are excessive. hyperref automatically adds targets there anyway, and these are used by the \label.

\section{This is a title}\label{this-is-a-title} is quite enough for links. Every \ref command will then correctly link to the section, and for a manual link one can use \hyperref[this-is-a-title]{some text}.

u-fischer avatar Aug 18 '23 13:08 u-fischer

@u-fischer Are you sure? I just tested this claim by doing pandoc -s -o my.tex MANUAL.txt. I then erased the hypertarget around

\section{Templates}\label{templates}

and recompiled using pdflatex. Clicking the link on Templates no longer worked. I then restored the hypertarget

\hypertarget{templates}{%
\section{Templates}\label{templates}}

and it worked again. The link is created using

\protect\hyperlink{templates}{Templates}).

Using texlive 2023.

jgm avatar Aug 18 '23 23:08 jgm

Ah, I see, you're talking about \ref commands. But that's not what we're using here. We need to be able to create hyperlinks to headings with arbitrary link text.

jgm avatar Aug 18 '23 23:08 jgm

@u-fischer Are you sure?

Yes ;-). I maintain hyperref and know what it is doing.

The link is created using

\protect\hyperlink{templates}{Templates}).

This is the wrong command for manual links. As I wrote use

\hyperref[templates]{Templates}

(Note the bracket of the first argument! Also unlike \hyperlink this command is robust, so no \protect needed).

If you simply want to repeat the title of the section you naturally can also use \nameref{templates}.
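
As a minimal sketch of the commands named above, assuming a plain article document that loads hyperref (which also makes \nameref available), the following compiles to three working links to the same heading without any \hypertarget. The link texts are made up for illustration, and two LaTeX runs are needed to resolve the references.

```latex
\documentclass{article}
\usepackage{hyperref}

\begin{document}

\section{Templates}\label{templates}
Some text about templates.

\newpage

% Manual link with arbitrary link text; note the square brackets.
See \hyperref[templates]{the part about templates}.

% Automatic link on the section number.
See Section~\ref{templates}.

% Automatic link that repeats the section title.
See ``\nameref{templates}''.

\end{document}
```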

The low-level commands \hypertarget and \hyperlink should normally only be used in your own definitions or in very special cases.

The main problem with your \hypertargets (apart from the LaTeX code looking ugly) is that in a tagged PDF they wouldn't be in the section heading structure, so the structure links would be wrong. That is bad for HTML export and accessibility.

u-fischer avatar Aug 19 '23 07:08 u-fischer

OK, that's great to know!

jgm avatar Aug 19 '23 16:08 jgm

After updating to the latest pandoc we stumbled over this. We use the named destinations that got generated by \hypertarget to jump to that location in the PDF via e.g. libpoppler's link feature. Now we can no longer do that as it is just a \label that seems not to generate such a named destination. Is there a way to influence that behavior?

christoph-cullmann avatar Nov 28 '23 12:11 christoph-cullmann

Let's see if @u-fischer has any suggestions.

jgm avatar Nov 28 '23 17:11 jgm

Well, there is a named destination. You only need its name. You could, e.g., check what the \label command writes to the .aux file. So

\section{abc}\label{abc}

would write

\newlabel{abc}{{1}{1}{abc}{section.1}{}}

and from this you know that section.1 is the name of the destination.
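
A minimal sketch of using that name inside the document, assuming default hyperref settings so that the first section's destination really is called section.1 (the link texts are invented for illustration). The first link goes through the label, the second addresses the destination by name.

```latex
\documentclass{article}
\usepackage{hyperref}

\begin{document}

\section{abc}\label{abc}
Some text.

\newpage

% Link via the label: hyperref looks up the destination itself.
Back to \hyperref[abc]{the first section}.

% Link via the destination name recorded in the .aux file.
% Assumption: default naming, so this heading's target is section.1.
Back to \hyperlink{section.1}{the first section} again.

\end{document}
```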

If you really want to control the target names manually, you can use \NextLinkTarget to adapt the name of the next target:

\documentclass[a4paper]{article}
\usepackage{hyperref}
\begin{document}

\NextLinkTarget{target:abc} %section target is now named target:abc
\section{abc}\label{abc}

\end{document}

u-fischer avatar Nov 28 '23 17:11 u-fischer

Does \NextLinkTarget need to come before the \section command or could it come between \section and \label?

jgm avatar Nov 28 '23 18:11 jgm

It needs to come before the \section, as that command creates the target.

u-fischer avatar Nov 28 '23 18:11 u-fischer

Thanks already for the quick feedback.

I think parsing the .aux file is out of scope for us. It will not work with reasonable effort in our tooling and is just another thing for us to maintain.

The workarounds you describe will work if I write the TeX myself, but I am talking about what is generated.

Perhaps I misunderstand, so here is the problem we face.

Before, the TeX generated by pandoc contained

\hypertarget{this-is-a-title}{%
\section{This is a title}\label{this-is-a-title}}

Now we just have

\section{This is a title}\label{this-is-a-title}

The old variant allowed us to know based on the pandoc input what the named destination links are, e.g. what poppler/okular/acroread can address for our online help.

Can I get pandoc to generate what you propose to get that behavior back? Otherwise we must post-process the generated code or, as we do now as a hack, redefine \label.

Perhaps this is just something nobody else does. But if I understood this issue correctly, the only drawback of the old output was the somewhat excessive TeX code, so I think it is bad that we lost the nicely usable link destinations, which can be used more or less like an HTML anchor with most PDF tools.

christoph-cullmann avatar Nov 28 '23 19:11 christoph-cullmann

The main problem with the old output was not excessive code. The main problem, as I wrote above, is that the target is in the wrong place. In a tagged PDF the targets wouldn't be in the section heading structure, so the structure links would be wrong. That is bad for HTML export and accessibility.

But if pandoc was able to add \hypertarget{xxx} before the \section command it should be able to add a \NextLinkTarget{xxx} instead.

u-fischer avatar Nov 28 '23 19:11 u-fischer

Ok, I see, I would be happy with \NextLinkTarget, too. Thanks for clarifying this. @jgm, would that be something that could be added?

christoph-cullmann avatar Nov 28 '23 19:11 christoph-cullmann

But be aware that \NextLinkTarget is rather new, so it won't work with outdated TeX systems.

u-fischer avatar Nov 28 '23 20:11 u-fischer

Hm. How new? What version of texlive, for example, would be required?

jgm avatar Nov 28 '23 23:11 jgm

texlive 2022. But you could simply add \providecommand\NextLinkTarget[1]{\hypertarget{#1}{}} .
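
A minimal sketch of how that fallback could sit in a complete document, reusing the heading and identifier from the example at the top of the thread. Placing the \providecommand after hyperref is loaded is an assumption made here so that a real \NextLinkTarget, where available, takes precedence; the link text is invented.

```latex
\documentclass{article}
\usepackage{hyperref}

% Fallback for older hyperref versions: \providecommand does nothing
% if \NextLinkTarget is already defined, so newer systems are unaffected.
\providecommand\NextLinkTarget[1]{\hypertarget{#1}{}}

\begin{document}

% Name the target of the next heading after the pandoc identifier.
\NextLinkTarget{this-is-a-title}
\section{This is a title}\label{this-is-a-title}

Some text.

\newpage

% The destination can now be addressed by the chosen name.
\hyperlink{this-is-a-title}{Back to the title}

\end{document}
```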

u-fischer avatar Nov 29 '23 11:11 u-fischer

That sounds like a promising approach. I'll reopen this issue and plan to implement that suggestion. It does mean that the LaTeX source will be uglier, but so be it.

jgm avatar Nov 29 '23 16:11 jgm

@christoph-cullmann is it sufficient to add these before headings? Or did you need to link to spans, divs, figures, etc.?

jgm avatar Nov 30 '23 02:11 jgm

@christoph-cullmann is it sufficient to add these before headings? Or did you need to link to spans, divs, figures, etc.?

Hmm, I think we only link to sections, but I would find it more consistent if this just worked for all kinds of links.

christoph-cullmann avatar Nov 30 '23 11:11 christoph-cullmann

I've been following the conversation and, since I'm no expert, FWIW I'll add my 2 cents. @jgm If you end up adding \NextLinkTarget, please make it something that can be easily enabled or disabled through some key or command-line option. This is not just about ugly TeX source. Imagine someone writing hundreds of pages, like a manual or thesis, in markdown and then exporting to TeX: that ugly source becomes a bit of a headache to clean up and wasted time that could have been spent better elsewhere.

kirk86 avatar Dec 19 '23 13:12 kirk86

bit of a headache to clean up and wasted time

Regex search and replace and it's done in one second...

jgm avatar Dec 19 '23 17:12 jgm

I would argue otherwise but do as you wish. Also, the same argument can be used in the case of @christoph-cullmann ....

Regex search and replace and it's done in one second...

Plus, the recommended way to do linkage between sections in any TeX manual that I've seen is \label{mysection}.....\ref{mysection} as @u-fischer has already suggested.

Maybe the real issue here is poppler/okular/acroread? Maybe they need to be updated to reflect current TeX standards? Just a wild guess; I don't know anything about them, other than that the poppler library in pdf-tools in Emacs sucks, it's too slow.

The old variant allowed us to know based on the pandoc input what the named destination links are, e.g. what poppler/okular/acroread can address for our online help

kirk86 avatar Dec 19 '23 18:12 kirk86