Allow inclusion of custom lexers

Open muellerj opened this issue 6 years ago • 17 comments

It would be awesome if the inclusion of custom lexers was supported out of the box. I have tried a couple of routes, all of which stink in one way or another.

Suppose I have a custom lexer at ./lib/my_lexer.py with a class called MyLexer, which exposes a lexer with alias my. What I really want to do behind the scenes is to call pygmentize with the custom lexer:

pygmentize -l ./lib/my_lexer.py:MyLexer -x <input>

If I create a custom shell script mypygmentize, which does the wrapping of the -l and -x option, I can get most of the way there. However, minted seems to append the name of the lexer to the command line options regardless, such that (I guess) the final call ends up being pygmentize -l ./lib/my_lexer.py:MyLexer -x -l my <input>. The second -l my confuses pygmentize and it complains about not knowing any lexers for alias my.
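One way a wrapper could avoid the duplicate -l is to rewrite the lexer argument that minted passes instead of prepending its own. The following is only a minimal sketch, assuming minted passes the lexer as a separate `-l <alias>` pair; the function name `rewrite_lexer_args` and the `CUSTOM_LEXERS` table are hypothetical and exist only for illustration:

```python
# Hypothetical mapping from a minted language name to a custom lexer spec.
CUSTOM_LEXERS = {"my": "./lib/my_lexer.py:MyLexer"}


def rewrite_lexer_args(argv, custom_lexers=CUSTOM_LEXERS):
    """Return a copy of argv in which `-l <alias>` becomes
    `-l <file>:<Class> -x` for every alias found in custom_lexers."""
    out = []
    it = iter(argv)
    for arg in it:
        out.append(arg)
        if arg == "-l":
            alias = next(it, None)
            if alias is None:
                break  # dangling -l: leave it for pygmentize to report
            if alias in custom_lexers:
                # Swap the alias for the file spec and enable custom loading.
                out.extend([custom_lexers[alias], "-x"])
            else:
                out.append(alias)
    return out


if __name__ == "__main__":
    print(rewrite_lexer_args(["-l", "my", "-f", "latex", "input.pyg"]))
```

Because the alias is replaced in place, only one -l reaches pygmentize, and built-in lexers pass through unchanged.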

The other option is to hack the command line invocation directly into the "language" option of the minted environment:

% Dirty, dirty hack to embed our own lexer into minted
\def\mylexer{lib/my_lexer.py:MyLexer -x}

% ...

\begin{minted}[autogobble=true,
                frame=lines,
                style=default,
                framesep=2mm]{\mylexer}%

% ...

This works, but it's really ugly in my opinion. Any idea how this might work in a saner fashion? :)

muellerj avatar Aug 11 '17 12:08 muellerj

I'd suggest looking at http://pygments.org/docs/plugins/. I haven't worked with custom Pygments lexers myself, but it seems like custom lexers can be treated as native as long as they are packaged in a certain way.
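For anyone trying the plugin route: Pygments discovers plugin lexers through the `pygments.lexers` entry-point group. The sketch below only lists what is currently registered in that group, which can help confirm that a packaged lexer is visible before involving minted. The helper name `registered_plugin_lexers` and the `my = my_lexer:MyLexer` declaration in the comment are illustrative assumptions, not something from this thread's code:

```python
from importlib.metadata import entry_points

# A plugin distribution would declare its lexer in the "pygments.lexers"
# entry-point group, e.g. in setup.py (names here are hypothetical):
#
#     entry_points={"pygments.lexers": ["my = my_lexer:MyLexer"]}


def registered_plugin_lexers():
    """Return {entry point name: 'module:Class'} for installed lexer plugins."""
    eps = entry_points()
    # Python 3.10+ exposes .select(); older versions return a plain dict.
    if hasattr(eps, "select"):
        group = eps.select(group="pygments.lexers")
    else:
        group = eps.get("pygments.lexers", [])
    return {ep.name: ep.value for ep in group}


if __name__ == "__main__":
    for name, target in sorted(registered_plugin_lexers().items()):
        print(f"{name} -> {target}")
```

If the custom lexer shows up here, it should be possible to pass its alias to minted like any built-in language, without -x.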

I'm working on designing minted 3.0, and will keep this in mind during that process (probably at least a month or two till any release). There may be some ways to simplify this and similar things on the minted, as opposed to Pygments, side of things.

gpoore avatar Aug 11 '17 13:08 gpoore

Awesome, thank you. Please leave a link to the relevant branch (or such) here once you do upload something!

muellerj avatar Aug 28 '17 07:08 muellerj

I'd also be very interested in this functionality, although I think it could be stop-gapped by providing some minimal sugar over the “hack” to make it not look quite as ugly.

pthariensflame avatar Sep 07 '17 17:09 pthariensflame

Maybe it would be possible, or even easier, to provide a generic way to specify the external highlighter, such that we could swap in our own solution. I've just now rewritten my lexer in Rouge and was pleasantly surprised at how easy it was.

muellerj avatar Nov 06 '17 06:11 muellerj

So the final call to pygmentize at the moment seems to be

pygmentize -l c -f latex -P commandprefix=PYG -F tokenmerge -o _minted-master/EAFA21BF19C2027915E675D18A41367FA70E7C711A43E0CB65C026E9F0174710.pygtex master.pyg

What would be really cool is if we could customize this call (with placeholders for the values). I'm guessing the only interface on the "other side" is that minted expects a file at _minted-master/EAFA21BF19C2027915E675D18A41367FA70E7C711A43E0CB65C026E9F0174710.pygtex to be written by that command?
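To make the idea of a customizable call with placeholders concrete, here is a rough sketch of how such a template could be expanded. Nothing like this exists in minted as far as this thread shows; the placeholder names `$lexer`, `$outfile` and `$infile` and the function `build_highlight_command` are invented for illustration:

```python
import shlex
from string import Template

# Hypothetical template; $lexer, $outfile and $infile are placeholder
# names invented for this sketch, not options minted provides.
DEFAULT_TEMPLATE = (
    "pygmentize -l $lexer -f latex -P commandprefix=PYG "
    "-F tokenmerge -o $outfile $infile"
)


def build_highlight_command(template, lexer, outfile, infile):
    """Fill the placeholders and split the result into an argv list."""
    # Quote each value so paths with spaces survive shlex.split().
    filled = Template(template).substitute(
        lexer=shlex.quote(lexer),
        outfile=shlex.quote(outfile),
        infile=shlex.quote(infile),
    )
    return shlex.split(filled)


if __name__ == "__main__":
    custom = "pygmentize -l $lexer -x -f latex -o $outfile $infile"
    print(build_highlight_command(
        custom, "./lib/my_lexer.py:MyLexer", "out.pygtex", "master.pyg"))
```

With a template like this, a custom lexer would only need a different template string (adding -x), while the output file stays the one contract between the command and minted.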

muellerj avatar Nov 06 '17 08:11 muellerj

@muellerj I've thought a little about support for other highlighters. For example, I would like something that supports TextMate grammars. Customizing the highlighting command would be one part of this, and probably wouldn't be too difficult. The difficult part would be the highlighter output. If the highlighting commands aren't minted-compatible (or just completely general LaTeX, or self-contained), then a new set of style macros would need to be generated. The biggest issue would be a highlighter that has no equivalent for some minted options, or that provides options that minted doesn't have and aren't easy to add. In that case, there would need to be additional per-highlighter settings, which could increase the complexity a good bit.

Between my work on the next minted and the next pythontex, I'm thinking that it would be very useful to have a package that provides generic support for running external programs and caching the results. Something like that may end up being the way forward for minted and pythontex, and would also make it easier to work with other programs like additional highlighters. Unfortunately, that's probably more of a very long-term possible solution.

gpoore avatar Nov 06 '17 13:11 gpoore

@gpoore Thanks for the reply. I absolutely understand the implications and there are no doubt a lot of close ties between minted options and the way these are implemented by pygmentize. My first comment was just thinking out loud since the implementation of the Rouge lexer was quite easy. I ended up trying to place a translator (using the \renewcommand{\MintedPygmentize}{bin/myhighlighter} route) but fell short, since Rouge doesn't support the tex output out of the box.

If you end up generalising the interface to the highlighting shell script a little bit (such that for example a custom lexer can be included), please consider my needs met. In any case thanks for all the work.

muellerj avatar Nov 06 '17 13:11 muellerj

Any news on this? Pygments is particularly bad at highlighting modern C++ ...

Thanks!

h-2 avatar Feb 24 '20 14:02 h-2

@h-2 I recommend you try my new package Shiki LaTeX. It’s compatible with minted and replaces Pygments with Shiki, which probably produces better results for C++ (though I haven’t tested).

leafac avatar Mar 10 '20 18:03 leafac

@leafac Thanks! I gave it a try, but it's not working yet for me.

h-2 avatar Mar 12 '20 16:03 h-2

https://github.com/leafac/shiki-latex/issues/1#issuecomment-598752546

leafac avatar Mar 13 '20 14:03 leafac

I would still love to see custom lexer support. Currently I need to manually patch the installed Pygments library to get a fix I need in one of the lexers.

Defining a custom lexer (and a custom language) may be better.

rugk avatar Aug 25 '20 22:08 rugk

By adding two new options

  • custom (no value) and
  • formatter=<formatter> (default value is latex)

to minted, the following example allows inclusion of both custom lexer and formatter.

\begin{filecontents}[noheader,force]{your_lexer.py}
from pygments.lexers.markup import TexLexer
from pygments.token import Keyword


class TexLexer2(TexLexer):
    """
    Improved lexer for the TeX and LaTeX typesetting languages.
    Character "@" is treated as part of command names.
    """

    TexLexer.tokens['root'][4] = (r'\\([a-zA-Z@]+|.)', Keyword, 'command')
\end{filecontents}

\documentclass{article}
\usepackage{minted}
\usepackage{regexpatch}

\makeatletter
\newcommand{\minted@def@optcl@novalue}[2]{%
  \define@key{minted@opt@g}{#1}[]{%
    \minted@addto@optlistcl{\minted@optlistcl@g}{#2}%
    \@namedef{minted@opt@g:#1}{#2}}%
  \define@key{minted@opt@g@i}{#1}[]{%
    \minted@addto@optlistcl{\minted@optlistcl@g@i}{#2}%
    \@namedef{minted@opt@g@i:#1}{#2}}%
  \define@key{minted@opt@lang}{#1}[]{%
    \minted@addto@optlistcl@lang{minted@optlistcl@lang\minted@lang}{#2}%
    \@namedef{minted@opt@lang\minted@lang:#1}{#2}}%
  \define@key{minted@opt@lang@i}{#1}[]{%
    \minted@addto@optlistcl@lang{%
      minted@optlistcl@lang\minted@lang @i}{#2}%
    \@namedef{minted@opt@lang\minted@lang @i:#1}{#2}}%
  \define@key{minted@opt@cmd}{#1}[]{%
    \minted@addto@optlistcl{\minted@optlistcl@cmd}{#2}%
    \@namedef{minted@opt@cmd:#1}{#2}}%
}

% new minted option "custom" for adding command line option "-x"
\minted@def@optcl@novalue{custom}{-x}

% new minted option "formatter=<formatter>" for specifying pygments formatter
\minted@def@opt{formatter}

% apply "-f <formatter>"
\newcommand\minted@formatter{%
  \minted@get@opt{formatter}{latex}\space
}

\xpatchcmd*\minted@checkstyle
  {-f latex }
  {-f \minted@formatter}
  {}{\fail}
\xpatchcmd*\minted@pygmentize
  {-f latex }
  {-f \minted@formatter}
  {}{\fail}

\makeatother

\begin{document}
\parindent=0pt
\begin{Verbatim}[gobble=2]
  Usage:
    \begin{minted}[formatter=<formatter>]{<lexer>}
    \end{minted}
  Use option "custom" to allow custom <formatter> and/or <lexer>.
\end{Verbatim}


\subsection*{Test different formatters}
Default formatter \verb|latex|
\begin{minted}{tex}
  abc \emph{text} $a + b = c$
\end{minted}

Formatter \verb|html|\par
\begin{minted}[formatter=html]{tex}
  abc \emph{text} $a + b = c$
\end{minted}

\subsection*{Test custom lexer}
Default lexer \verb|-l latex|
\begin{minted}[custom]{tex}
  \define@key{minted@opt@g}{#1}[]{%
    \minted@addto@optlistcl{\minted@optlistcl@g}{#2}%
    \@namedef{minted@opt@g:#1}{#2}}
\end{minted}

Custom lexer, \verb|@| is part of command names
\begin{minted}[custom]{your_lexer.py:TexLexer2}
  \define@key{minted@opt@g}{#1}[]{%
    \minted@addto@optlistcl{\minted@optlistcl@g}{#2}%
    \@namedef{minted@opt@g:#1}{#2}}
\end{minted}

\end{document}

muzimuzhi avatar Sep 19 '20 19:09 muzimuzhi

I’ll test this as soon as I can. Do you plan to release a new version with that?

muellerj avatar Sep 20 '20 14:09 muellerj

@muzimuzhi Your trick is great :). It would be nice to adapt it into a custom lexer for LaTeX3, as proposed here (https://www.alanshawn.com/tech/2020/05/25/latex-3.html#hilighting-latex3-code).

pablgonz avatar Nov 29 '20 03:11 pablgonz

Not sure if this is useful for anyone, but I thought I'd post my workaround here in the meantime. Basically the same as the original post, except that I parse the arguments and replace the custom lexer arguments.

First the script (pygmentize_local)

#!/usr/bin/env bash

arguments=()

while [[ "$#" -ne 0 ]]; do
  case "$1" in
  -l)
    arguments+=("$1")

    shift
    lexer=$1

    if [[ "$lexer" == "mycpp" ]]; then
      arguments+=("./lexers/my_cpp.py:MyCppLexer")
      arguments+=("-x")
    else
      arguments+=("${lexer}")
    fi

    ;;
  *)
    arguments+=("$1")
    ;;
  esac
  shift
done

pygmentize "${arguments[@]}"

Then the lexers/my_cpp.py file is

from pygments.lexers.c_cpp import CppLexer
from pygments.token import Name, string_to_tokentype


class MyCppLexer(CppLexer):
    name = 'MyCpp'
    aliases = ['mycpp']
    filenames = ['*.cpp']

    EXTRA_CLASSES = ['string', 'vector']

    EXTRA_NAMESPACES = ['std']

    def get_tokens_unprocessed(self, text):
        for index, token, value in CppLexer.get_tokens_unprocessed(self, text):
            if token is Name and value in self.EXTRA_CLASSES:
                yield index, string_to_tokentype("Name.Class"), value
            elif token is Name and value in self.EXTRA_NAMESPACES:
                yield index, string_to_tokentype("Keyword.Namespace"), value
            else:
                yield index, token, value

Then finally replace the command in your tex document

\renewcommand{\MintedPygmentize}{./pygmentize_local}

Irubataru avatar Dec 09 '21 16:12 Irubataru

That's really useful @Irubataru. One simplification is that you can put everything in one file (e.g. pygmentize.py) as follows:

#! /usr/bin/env python
import argparse
import sys
import pygments.cmdline as _cmdline
import pygments.lexer as _lexer
import pygments.token as _token


def main(args):
    # Only -l is inspected here; everything else is passed through untouched.
    parser = argparse.ArgumentParser()
    parser.add_argument('-l', dest='lexer', type=str)
    opts, rest = parser.parse_known_args(args[1:])
    if opts.lexer == 'ssh_config':
        # Point pygmentize at the lexer class defined in this very file.
        args = [__file__, '-l', __file__ + ':SSHConfigLexer', '-x', *rest]
    # Return pygmentize's exit status so failures are not swallowed.
    return _cmdline.main(args)


class SSHConfigLexer(_lexer.RegexLexer):
    name = 'ssh_config'
    tokens = {
        'root': [
            (r'(\s*Host)( .*\n)', _lexer.bygroups(
                _token.Keyword, _token.String)),
            (r'(\s*\w*)( ?.*\n)', _lexer.bygroups(
                _token.Name.Attribute, _token.String)),
        ],
    }


if __name__ == '__main__':
    sys.exit(main(sys.argv))

and then

\renewcommand{\MintedPygmentize}{./pygmentize.py}

coldfix avatar Feb 16 '22 11:02 coldfix

Quoting @coldfix:

\renewcommand{\MintedPygmentize}{./pygmentize.py}

Just so I don't forget this again (I've forgotten it every time I come back to this code): you need to chmod +x pygmentize.py (otherwise pdflatex will continue to complain about you not having pygmentize in your venv).
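If the wrapper is generated or checked out by a build script, the same step can be done from Python. This is just a small sketch; the helper name `make_executable` is made up, and it sets the execute bit for user, group and others (roughly `chmod a+x`), which may be broader than you want:

```python
import os
import stat


def make_executable(path):
    """Add the execute bits to path while keeping its other mode bits."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


if __name__ == "__main__":
    # Only touch the wrapper if it is actually present.
    if os.path.exists("pygmentize.py"):
        make_executable("pygmentize.py")
```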

makslevental avatar Oct 21 '22 04:10 makslevental

@muzimuzhi Your solution is great! Thank you very much for that! :) I have added another condition to support both TeXLive 2022 and current versions of minted at the same time. In order to use it, do the following:

  1. Copy the following contents into a file called e.g., mintedfix.tex (not actually needed, but quite clean this way):
    \makeatletter
    \ifdefined\minted@optlistcl@quote
    \ifwindows
      \renewcommand{\minted@optlistcl@quote}[2]{%
        \ifstrempty{#2}{\detokenize{#1}}{\detokenize{#1="#2"}}}
    \else
      \renewcommand{\minted@optlistcl@quote}[2]{%
        \ifstrempty{#2}{\detokenize{#1}}{\detokenize{#1='#2'}}}
    \fi
    \fi
    
    % similar to \minted@def@optcl@switch
    \newcommand{\minted@def@optcl@novalue}[2]{%
      \define@booleankey{minted@opt@g}{#1}%
        {\minted@addto@optlistcl{\minted@optlistcl@g}{#2}{}%
         \@namedef{minted@opt@g:#1}{true}}
        {\@namedef{minted@opt@g:#1}{false}}
      \define@booleankey{minted@opt@g@i}{#1}%
        {\minted@addto@optlistcl{\minted@optlistcl@g@i}{#2}{}%
         \@namedef{minted@opt@g@i:#1}{true}}
        {\@namedef{minted@opt@g@i:#1}{false}}
      \define@booleankey{minted@opt@lang}{#1}%
        {\minted@addto@optlistcl@lang{minted@optlistcl@lang\minted@lang}{#2}{}%
         \@namedef{minted@opt@lang\minted@lang:#1}{true}}
        {\@namedef{minted@opt@lang\minted@lang:#1}{false}}
      \define@booleankey{minted@opt@lang@i}{#1}%
        {\minted@addto@optlistcl@lang{minted@optlistcl@lang\minted@lang @i}{#2}{}%
         \@namedef{minted@opt@lang\minted@lang @i:#1}{true}}
        {\@namedef{minted@opt@lang\minted@lang @i:#1}{false}}
      \define@booleankey{minted@opt@cmd}{#1}%
          {\minted@addto@optlistcl{\minted@optlistcl@cmd}{#2}{}%
            \@namedef{minted@opt@cmd:#1}{true}}
          {\@namedef{minted@opt@cmd:#1}{false}}
    }
    
    \minted@def@optcl@novalue{custom lexer}{-x}
    
    \makeatother
    
  2. In your main LaTeX file, load this file after minted:
    [...]
    \usepackage{minted}
    \input{mintedfix}
    [...]
    

ThexXTURBOXx avatar Jul 30 '23 18:07 ThexXTURBOXx

minted version 3.0 is now under development, thanks to a grant from the TeX Users Group. It will include official support for custom lexers. It will also be extensible using Python, not just LaTeX macro programming, which will make many new lexer-related features possible. Progress on custom lexers will be tracked in #372. Initial beta releases of minted version 3.0 are expected by early 2024.

gpoore avatar Sep 12 '23 17:09 gpoore