minted
Allow inclusion of custom lexers
It would be awesome if the inclusion of custom lexers were supported out of the box. I have tried a couple of routes, all of which stink in one way or another.
Suppose I have a custom lexer at ./lib/my_lexer.py with a class called MyLexer, which exposes a lexer with the alias my. What I really want to do behind the scenes is to call pygmentize with the custom lexer:
pygmentize -l ./lib/my_lexer.py:MyLexer -x <input>
If I create a custom shell script mypygmentize that wraps the -l and -x options, I can get most of the way there. However, minted seems to append the name of the lexer to the command-line options regardless, so that (I guess) the final call ends up being pygmentize -l ./lib/my_lexer.py:MyLexer -x -l my <input>. The second -l my confuses pygmentize, and it complains about not knowing any lexer for the alias my.
The other option is to hack the command-line invocation directly into the "language" option of the minted environment:
% Dirty, dirty hack to embed our own lexer into minted
\def\mylexer{lib/my_lexer.py:MyLexer -x}
% ...
\begin{minted}[autogobble=true,
frame=lines,
style=default,
framesep=2mm]{\mylexer}%
% ...
This works, but it's really ugly in my opinion. Any idea how this might work in a saner fashion? :)
I'd suggest looking at http://pygments.org/docs/plugins/. I haven't worked with custom Pygments lexers myself, but it seems like custom lexers can be treated as native as long as they are packaged in a certain way.
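As far as I understand the plugin docs, Pygments finds third-party lexers through the pygments.lexers entry-point group, so a package that registers an entry such as my = my_lexer:MyLexer there should make -l my work without -x. The package is hypothetical; the module and class names are simply the ones from the example above. A minimal sketch, using only the standard library, to check which lexer plugins are currently registered in your environment:

```python
from importlib.metadata import entry_points

# Entry-point group that Pygments scans for plugin lexers.
LEXER_GROUP = "pygments.lexers"


def plugin_lexer_entry_points():
    """Return sorted (name, 'module:Class') pairs for installed lexer plugins."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: selectable entry points
        selected = eps.select(group=LEXER_GROUP)
    else:
        # Python 3.8 / 3.9: plain dict mapping group name -> entry points
        selected = eps.get(LEXER_GROUP, [])
    return sorted((ep.name, ep.value) for ep in selected)


for name, target in plugin_lexer_entry_points():
    print(f"{name} -> {target}")
```

If the custom lexer does not show up in this list after installing its package, minted will not be able to use its alias either.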
I'm working on designing minted 3.0, and will keep this in mind during that process (probably at least a month or two until any release). There may be some ways to simplify this and similar things on the minted, as opposed to Pygments, side of things.
Awesome, thank you. Please leave a link to the relevant branch (or such) here once you do upload something!
I'd also be very interested in this functionality, although I think it could be stop-gapped by providing some minimal sugar over the “hack” to make it not look quite as ugly.
Maybe it would be possible, or even easier, to provide a generic way to specify the external highlighter, such that we could swap in our own solution. I've just now rewritten my lexer in Rouge and was pleasantly surprised at how easy it was.
So the final call to pygmentize at the moment seems to be
pygmentize -l c -f latex -P commandprefix=PYG -F tokenmerge -o _minted-master/EAFA21BF19C2027915E675D18A41367FA70E7C711A43E0CB65C026E9F0174710.pygtex master.pyg
What would be really cool is if we could customize this call (with placeholders for the values). I'm guessing the only interface on the "other side" is that minted expects a file at _minted-master/EAFA21BF19C2027915E675D18A41367FA70E7C711A43E0CB65C026E9F0174710.pygtex to be written by that command?
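To make the placeholder idea concrete, here is a rough sketch of what such a customizable call could look like. The template syntax, the placeholder names ({lexer}, {output}, {input}) and the output file name are purely illustrative assumptions on my part; nothing like this exists in minted as far as I know.

```python
import shlex

# Hypothetical template; placeholder names are illustrative, not a minted feature.
COMMAND_TEMPLATE = (
    "pygmentize -l {lexer} -x -f latex -P commandprefix=PYG "
    "-F tokenmerge -o {output} {input}"
)


def build_command(template, **values):
    """Split the template into arguments first, then fill in each argument.

    Splitting before substitution keeps a value that contains spaces
    together as one argument.
    """
    return [part.format(**values) for part in shlex.split(template)]


argv = build_command(
    COMMAND_TEMPLATE,
    lexer="./lib/my_lexer.py:MyLexer",
    output="_minted-master/example.pygtex",  # made-up file name
    input="master.pyg",
)
print(argv)
```

The resulting list could be passed to subprocess.run as is, with no shell quoting involved.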
@muellerj I've thought a little about support for other highlighters. For example, I would like something that supports TextMate grammars. Customizing the highlighting command would be one part of this, and probably wouldn't be too difficult. The difficult part would be the highlighter output. If the highlighting commands aren't minted-compatible (or just completely general LaTeX, or self-contained), then a new set of style macros would need to be generated. The biggest issue would be a highlighter that has no equivalent for some minted options, or that provides options that minted doesn't have and that aren't easy to add. In that case, there would need to be additional per-highlighter settings, which could increase the complexity a good bit.
Between my work on the next minted and the next pythontex, I'm thinking that it would be very useful to have a package that provides generic support for running external programs and caching the results. Something like that may end up being the way forward for minted and pythontex, and would also make it easier to work with other programs like additional highlighters. Unfortunately, that's probably more of a very long-term possible solution.
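For illustration only, the "run an external program and cache the result" idea can be sketched in a few lines. This is not how minted or pythontex actually implement caching; the function name, the .out suffix and the runner callback are all assumptions made for the sketch. The cache key is a hash over the command and the source text, so changing either one triggers a new run.

```python
import hashlib
from pathlib import Path


def cached_run(cache_dir, command, source, runner):
    """Return the output for (command, source); call runner only on a cache miss.

    runner is any callable taking (command, source) and returning a str.
    In real use it could wrap subprocess.run.
    """
    key = hashlib.sha256()
    for item in (*command, source):
        key.update(item.encode("utf-8"))
        key.update(b"\0")  # separator, so ["ab", "c"] and ["a", "bc"] differ
    cache_file = Path(cache_dir) / (key.hexdigest() + ".out")
    if cache_file.exists():
        return cache_file.read_text(encoding="utf-8")
    output = runner(command, source)
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(output, encoding="utf-8")
    return output
```

Because the highlighter is passed in as a callable, the same cache could sit in front of Pygments, Rouge, or any other tool.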
@gpoore Thanks for the reply. I absolutely understand the implications, and there are no doubt a lot of close ties between minted options and the way these are implemented by pygmentize. My first comment was just thinking out loud, since the implementation of the Rouge lexer was quite easy. I ended up trying to place a translator (using the \renewcommand{\MintedPygmentize}{bin/myhighlighter} route) but fell short, since Rouge doesn't support TeX output out of the box.
If you end up generalising the interface to the highlighting shell script a little bit (such that for example a custom lexer can be included), please consider my needs met. In any case thanks for all the work.
Any news on this? Pygments is particularly bad at highlighting modern C++ ...
Thanks!
@h-2 I recommend you try my new package Shiki LaTeX. It’s compatible with minted and replaces Pygments with Shiki, which probably produces better results for C++ (though I haven’t tested).
@leafac Thanks! I gave it a try, but it's not working yet for me.
https://github.com/leafac/shiki-latex/issues/1#issuecomment-598752546
Still would love to see custom lexer support. Currently I need to patch the installed Pygments library manually to get a fix I need in some lexer.
Defining a custom one (and a custom language) may be better.
By adding two new options to minted,
- custom (no value) and
- formatter=<formatter> (default value is latex),
the following example allows inclusion of both a custom lexer and a custom formatter.
\begin{filecontents}[noheader,force]{your_lexer.py}
from pygments.lexers.markup import TexLexer
from pygments.token import Keyword
class TexLexer2(TexLexer):
    """
    Improved lexer for the TeX and LaTeX typesetting languages.
    Character "@" is treated part of command names.
    """
    TexLexer.tokens['root'][4] = (r'\\([a-zA-Z@]+|.)', Keyword, 'command')
\end{filecontents}
\documentclass{article}
\usepackage{minted}
\usepackage{regexpatch}
\makeatletter
\newcommand{\minted@def@optcl@novalue}[2]{%
\define@key{minted@opt@g}{#1}[]{%
\minted@addto@optlistcl{\minted@optlistcl@g}{#2}%
\@namedef{minted@opt@g:#1}{#2}}%
\define@key{minted@opt@g@i}{#1}[]{%
\minted@addto@optlistcl{\minted@optlistcl@g@i}{#2}%
\@namedef{minted@opt@g@i:#1}{#2}}%
\define@key{minted@opt@lang}{#1}[]{%
\minted@addto@optlistcl@lang{minted@optlistcl@lang\minted@lang}{#2}%
\@namedef{minted@opt@lang\minted@lang:#1}{#2}}%
\define@key{minted@opt@lang@i}{#1}[]{%
\minted@addto@optlistcl@lang{%
minted@optlistcl@lang\minted@lang @i}{#2}%
\@namedef{minted@opt@lang\minted@lang @i:#1}{#2}}%
\define@key{minted@opt@cmd}{#1}[]{%
\minted@addto@optlistcl{\minted@optlistcl@cmd}{#2}%
\@namedef{minted@opt@cmd:#1}{#2}}%
}
% new minted option "custom" for adding command line option "-x"
\minted@def@optcl@novalue{custom}{-x}
% new minted option "formatter=<formatter>" for specifying pygments formatter
\minted@def@opt{formatter}
% apply "-f <formatter>"
\newcommand\minted@formatter{%
\minted@get@opt{formatter}{latex}\space
}
\xpatchcmd*\minted@checkstyle
{-f latex }
{-f \minted@formatter}
{}{\fail}
\xpatchcmd*\minted@pygmentize
{-f latex }
{-f \minted@formatter}
{}{\fail}
\makeatother
\begin{document}
\parindent=0pt
\begin{Verbatim}[gobble=2]
  Usage:
  \begin{minted}[formatter=<formatter>]{<lexer>}
  \end{minted}
  Use option "custom" to allow custom <formatter> and/or <lexer>.
\end{Verbatim}
\subsection*{Test different formatters}
Default formatter \verb|tex|
\begin{minted}{tex}
abc \emph{text} $a + b = c$
\end{minted}
Formatter \verb|html|\par
\begin{minted}[formatter=html]{tex}
abc \emph{text} $a + b = c$
\end{minted}
\subsection*{Test custom lexer}
Default lexer \verb|-l latex|
\begin{minted}[custom]{tex}
\define@key{minted@opt@g}{#1}[]{%
\minted@addto@optlistcl{\minted@optlistcl@g}{#2}%
\@namedef{minted@opt@g:#1}{#2}}
\end{minted}
Custom lexer, \verb|@| is part of command names
\begin{minted}[custom]{your_lexer.py:TexLexer2}
\define@key{minted@opt@g}{#1}[]{%
\minted@addto@optlistcl{\minted@optlistcl@g}{#2}%
\@namedef{minted@opt@g:#1}{#2}}
\end{minted}
\end{document}
I’ll test this as soon as I can. Do you plan to release a new version with that?
@muzimuzhi Your trick is great :). It would be nice to adapt it into a custom lexer for LaTeX3, as proposed here (https://www.alanshawn.com/tech/2020/05/25/latex-3.html#hilighting-latex3-code).
Not sure if this is useful for anyone, but I thought I'd post my workaround here in the meantime. Basically the same as the original post, except that I parse the arguments and replace the custom lexer arguments.
First the script (pygmentize_local)
#!/usr/bin/env bash
# Wrapper around pygmentize: swaps the "mycpp" alias for the custom lexer file.
arguments=()
while [[ "$#" -ne 0 ]]; do
  case "$1" in
    -l)
      arguments+=("$1")
      shift
      lexer=$1
      if [[ "$lexer" == "mycpp" ]]; then
        arguments+=("./lexers/my_cpp.py:MyCppLexer")
        arguments+=("-x")
      else
        arguments+=("${lexer}")
      fi
      ;;
    *)
      arguments+=("$1")
      ;;
  esac
  shift
done
pygmentize "${arguments[@]}"
Then the lexers/my_cpp.py file is
from pygments.lexers.c_cpp import CppLexer
from pygments.token import Name, string_to_tokentype


class MyCppLexer(CppLexer):
    name = 'MyCpp'
    aliases = ['mycpp']
    filenames = ['*.cpp']

    EXTRA_CLASSES = ['string', 'vector']
    EXTRA_NAMESPACES = ['std']

    def get_tokens_unprocessed(self, text):
        for index, token, value in CppLexer.get_tokens_unprocessed(self, text):
            if token is Name and value in self.EXTRA_CLASSES:
                yield index, string_to_tokentype("Name.Class"), value
            # elif, so a remapped class name is not emitted a second time as a plain Name
            elif token is Name and value in self.EXTRA_NAMESPACES:
                yield index, string_to_tokentype("Keyword.Namespace"), value
            else:
                yield index, token, value
Then finally replace the command in your tex document
\renewcommand{\MintedPygmentize}{./pygmentize_local}
That's really useful @Irubataru. One simplification is that you can put everything in one file (e.g. pygmentize.py) as follows:
#! /usr/bin/env python
import argparse
import sys

import pygments.cmdline as _cmdline
import pygments.lexer as _lexer
import pygments.token as _token


def main(args):
    parser = argparse.ArgumentParser()
    parser.add_argument('-l', dest='lexer', type=str)
    opts, rest = parser.parse_known_args(args[1:])
    if opts.lexer == 'ssh_config':
        args = [__file__, '-l', __file__ + ':SSHConfigLexer', '-x', *rest]
    _cmdline.main(args)


class SSHConfigLexer(_lexer.RegexLexer):
    name = 'ssh_config'
    tokens = {
        'root': [
            (r'(\s*Host)( .*\n)', _lexer.bygroups(
                _token.Keyword, _token.String)),
            (r'(\s*\w*)( ?.*\n)', _lexer.bygroups(
                _token.Name.Attribute, _token.String)),
        ],
    }


if __name__ == '__main__':
    main(sys.argv)
and then
\renewcommand{\MintedPygmentize}{./pygmentize.py}
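If more than one custom lexer is needed, the argument rewriting in the two wrappers above could be made table-driven. The following is only a sketch: the function name and the alias table are made up for illustration, and it assumes minted passes the lexer as two separate arguments (-l followed by the name), as in the pygmentize call quoted earlier in this thread.

```python
def rewrite_lexer_args(argv, custom_lexers):
    """Replace `-l <alias>` with `-l <file>:<Class> -x` for known aliases.

    argv is the argument list without the program name; all other
    arguments are passed through unchanged.
    """
    out = []
    args = iter(argv)
    for arg in args:
        out.append(arg)
        if arg == "-l":
            alias = next(args, None)
            if alias is None:
                break  # dangling -l: leave it for pygmentize to report
            if alias in custom_lexers:
                out += [custom_lexers[alias], "-x"]
            else:
                out.append(alias)
    return out


# Hypothetical alias table; paths follow the examples in this thread.
CUSTOM_LEXERS = {
    "mycpp": "./lexers/my_cpp.py:MyCppLexer",
    "ssh_config": "./pygmentize.py:SSHConfigLexer",
}

print(rewrite_lexer_args(["-l", "mycpp", "-f", "latex", "master.pyg"], CUSTOM_LEXERS))
```

The rewritten list can then be handed to pygments.cmdline.main (with the program name prepended) in the same way as in the single-file wrapper.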
Just so I don't forget this again (I've forgotten this every time I come back to this code): you need to chmod +x pygmentize.py (otherwise pdflatex will continue to complain about you not having pygmentize in your venv).
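If you would rather not rely on remembering the chmod step, the executable bit can also be set from Python, for example in a small setup script. A minimal sketch, assuming a POSIX file system; the helper name is made up:

```python
import os
import stat


def make_executable(path):
    """Add an execute bit wherever the matching read bit is already set."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # Copy each read bit (r) onto the corresponding execute bit (x).
    os.chmod(path, mode | ((mode & 0o444) >> 2))
```

Calling make_executable("pygmentize.py") on a file with mode 644 leaves it at 755, which is what chmod +x does under a typical umask.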
@muzimuzhi Your solution is great! Thank you very much for that! :)
I have added another condition to support both TeXLive 2022 and current versions of minted at the same time.
In order to use it, do the following:
- Copy the following contents into a file called e.g. mintedfix.tex (not actually needed, but quite clean this way):
  \makeatletter
  \ifdefined\minted@optlistcl@quote
    \ifwindows
      \renewcommand{\minted@optlistcl@quote}[2]{%
        \ifstrempty{#2}{\detokenize{#1}}{\detokenize{#1="#2"}}}
    \else
      \renewcommand{\minted@optlistcl@quote}[2]{%
        \ifstrempty{#2}{\detokenize{#1}}{\detokenize{#1='#2'}}}
    \fi
  \fi
  % similar to \minted@def@optcl@switch
  \newcommand{\minted@def@optcl@novalue}[2]{%
    \define@booleankey{minted@opt@g}{#1}%
      {\minted@addto@optlistcl{\minted@optlistcl@g}{#2}{}%
       \@namedef{minted@opt@g:#1}{true}}
      {\@namedef{minted@opt@g:#1}{false}}
    \define@booleankey{minted@opt@g@i}{#1}%
      {\minted@addto@optlistcl{\minted@optlistcl@g@i}{#2}{}%
       \@namedef{minted@opt@g@i:#1}{true}}
      {\@namedef{minted@opt@g@i:#1}{false}}
    \define@booleankey{minted@opt@lang}{#1}%
      {\minted@addto@optlistcl@lang{minted@optlistcl@lang\minted@lang}{#2}{}%
       \@namedef{minted@opt@lang\minted@lang:#1}{true}}
      {\@namedef{minted@opt@lang\minted@lang:#1}{false}}
    \define@booleankey{minted@opt@lang@i}{#1}%
      {\minted@addto@optlistcl@lang{minted@optlistcl@lang\minted@lang @i}{#2}{}%
       \@namedef{minted@opt@lang\minted@lang @i:#1}{true}}
      {\@namedef{minted@opt@lang\minted@lang @i:#1}{false}}
    \define@booleankey{minted@opt@cmd}{#1}%
      {\minted@addto@optlistcl{\minted@optlistcl@cmd}{#2}{}%
       \@namedef{minted@opt@cmd:#1}{true}}
      {\@namedef{minted@opt@cmd:#1}{false}}
  }
  \minted@def@optcl@novalue{custom lexer}{-x}
  \makeatother
- In your main LaTeX file, load this file after minted:
  [...]
  \usepackage{minted}
  \input{mintedfix}
  [...]
minted version 3.0 is now under development, thanks to a grant from the TeX Users Group. It will include official support for custom lexers. It will also be able to be extended using Python, not just LaTeX macro programming, which will make possible many new lexer-related features. Progress on custom lexers will be tracked in #372. Initial beta releases of minted version 3.0 are expected by early 2024.