
Document catcodes for xparse's "verbatim" argument type, document how to reproduce \verb

Open dbitouze opened this issue 4 years ago • 46 comments

When fontenc is loaded with its T1 option, \NewDocumentCommand with verbatim argument gobbles the first - if its content contains a -- (irrespective of the delimiters used):

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{xparse}
\NewDocumentCommand {\myverb} { v } {#1}
\begin{document}
\ttfamily
\verb|--all|

\myverb{-all}

\myverb{--all}

\myverb{---all}
\end{document}


dbitouze avatar Jun 24 '20 22:06 dbitouze

What you are seeing is the -- ligature in the typewriter font. If you write \texttt{--} you'll also see a single dash, but if you copy from the PDF, you'll see that it's indeed an en-dash. You can check that by feeding the grabbed argument to \showtokens or by using \@noligs (LaTeX uses that in \verb to have -- print --):

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{xparse}
\makeatletter
% \NewDocumentCommand {\myverb} { v } { \showtokens{#1} }
\NewDocumentCommand {\myverb} { v } {#1}
\begin{document}
\makeatletter
\ttfamily \@noligs
-- and \verb|--all|

- and \myverb{-all}

-- and \myverb{--all}

--- and \myverb{---all}
\end{document}


PhelypeOleinik avatar Jun 24 '20 22:06 PhelypeOleinik

But v in xparse is supposed to be verbatim (is it not?), and in LaTeX that means typewriter with ligatures suppressed, so v should do that too, in my opinion.

FrankMittelbach avatar Jun 24 '20 23:06 FrankMittelbach

It just grabs verbatim (verbatim here being some equivalent of \let\do\@makeother \dospecials). \@noligs could be added in the catcode setup for scanning the argument. On the other hand, this would insert active tokens where (theoretically) there should only be catcode-other tokens, so in case the argument is used for something other than typesetting, it could be problematic.

Perhaps some way to allow the command to add its own catcode settings, like:

\NewDocumentCommand {\myverb} { v{\@noligs} } {#1}

PhelypeOleinik avatar Jun 24 '20 23:06 PhelypeOleinik

@FrankMittelbach I agree with PhelypeOleinik that “verbatim” means “grab whatever the user has written verbatim”. The <hyphen hyphen> to <endash> ligature is more a “font feature” than an “argument-grabbing bug”. Also, \ttfamily does not mean “monospaced font = no ligature whatsoever”. Some monospaced typefaces can be used as body type (not just code), so the hyphen ligatures should not be suppressed in such cases.

RuixiZhang42 avatar Jun 24 '20 23:06 RuixiZhang42

But isn't \NewDocumentCommand {\myverb} { v } {#1}\myverb{--all} supposed to behave as \verb|--all|?

dbitouze avatar Jun 25 '20 07:06 dbitouze

@dbitouze — not really. \verb has two parts to it — argument grabbing and formatting. The “v” setting in \NewDocumentCommand only does the former.

@Phelype — unless I’m missing something, doesn’t your suggestion do no more than this?:

\NewDocumentCommand {\myverb} { v } { {\@noligs #1} }

In that case I don’t think it is necessary. So back to @dbitouze, the way to replicate \verb is something like:

\makeatletter
\NewDocumentCommand {\myverb} { v } { {\@noligs\ttfamily #1} }
\makeatother

wspr avatar Jun 25 '20 07:06 wspr

On Thu, 25 Jun 2020 at 08:22, Denis Bitouzé wrote:

But isn't \NewDocumentCommand {\myverb} { v } {#1}\myverb{--all} supposed to behave as \verb|--all|?

Not exactly, as v is just about parsing the argument, which is read verbatim. \verb also typesets the content in a non-standard monospace font setup that suppresses ligatures. So rather than just #1 to typeset the argument in the current font, you'd need to do

\verbatim@font\@noligs
\language\l@nohyphenation

except that \@noligs requires the characters listed in \def\verbatim@nolig@list{\do\`\do\<\do\>\do\,\do\'\do\-} to be active, so we could consider either making v set those as active, or providing a wrapper around \scantokens that arranges that \@noligs can work here.
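
A minimal sketch of the second option (a \scantokens wrapper), assuming pdfLaTeX with T1 fonts. It re-reads the grabbed argument under \verb-like catcodes so that \@noligs can take effect. It leaves out \language\l@nohyphenation, does not handle edge cases such as trailing spaces, and has not been tested extensively:

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{xparse}
\makeatletter
\NewDocumentCommand{\myverb}{v}{%
  \leavevmode
  \begingroup
    \let\do\@makeother \dospecials % specials get catcode 12 for the rescan
    \verbatim@font                 % the font setup that \verb uses
    \@noligs                       % ` < > , ' - become active and break ligatures
    \@vobeyspaces \frenchspacing   % treat spaces as \verb does
    \endlinechar=-1                % avoid a spurious space at the end of the rescan
    \scantokens{#1}%
  \endgroup
}
\makeatother
\begin{document}
\verb|--all| and \myverb{--all} and \myverb|---all|
\end{document}
```

The catcode changes are made inside the macro body, after v has grabbed the argument, which is why the material has to be retokenized before \@noligs can affect it.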


davidcarlisle avatar Jun 25 '20 07:06 davidcarlisle

@dbitouze no, the similarity is only in the way the argument can be delimited: you can use \myverb!abc!. The result is documented as

which will result in the grabbed argument consisting of tokens of category codes 12 (“other”) and 13 (“active”), except spaces, which are given category code 10 (“space”).

The argument parser only reads an argument; it doesn't typeset it. And it would make no sense to add font commands or other commands to it, or even to preprocess it to apply \@noligs by default: there are other ways to suppress ligatures. With luatex one would perhaps apply Ligatures=ResetAll, and with pdflatex one could use \pdfnoligatures with a slightly different font:

\RequirePackage{fix-cm}
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{xfp,xparse}

\makeatletter
\NewDocumentCommand {\myverb} { v } {{\fontsize{\fpeval{\f@size+0.0001}}{\normalbaselineskip}\selectfont\pdfnoligatures\font #1}}
\makeatother

\begin{document}
--all

\verb|--all|

\myverb{-all}

\myverb{--all}

\myverb{---all}

\footnotesize
--all \myverb{--all}


\ttfamily
--all

\verb|--all|

\myverb{-all}

\myverb{--all}

\myverb{---all}

\footnotesize
--all \myverb{--all}

\end{document}

u-fischer avatar Jun 25 '20 07:06 u-fischer

@phelype — unless I’m missing something, doesn’t your suggestion do no more than this?: \NewDocumentCommand {\myverb} { v } { {\@noligs #1} }

@wspr Kind of, but no: \@noligs changes the catcode of - (and a bunch of others) to 13, and then defines it as \def-{\leavevmode\kern\z@\char`\-}. Being a catcode change, it has to be done before the argument is grabbed (unless we are considering \scantokens), hence my suggestion to allow a “catcode setup” argument to v (though it would have to be optional: \NewDocumentCommand {\myverb} { v[\@noligs] } {#1}).
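
A small sketch of that timing issue, assuming pdfLaTeX with T1 fonts. The names \lateverb, \earlyverb and \earlyverb@grab are made up for the illustration, and \earlyverb is not a real verbatim command (it only shows when the catcode change has to happen):

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{xparse}
\makeatletter
% Too late: #1 was tokenized while v grabbed it, so its hyphens
% are catcode 12 and the -- ligature still forms.
\NewDocumentCommand{\lateverb}{v}{{\ttfamily\@noligs #1}}
% In time: \@noligs runs before the argument is read, so the
% hyphens are tokenized as active characters.
\newcommand{\earlyverb}{\begingroup\ttfamily\@noligs\earlyverb@grab}
\newcommand{\earlyverb@grab}[1]{#1\endgroup}
\makeatother
\begin{document}
\lateverb{--all} versus \earlyverb{--all}
\end{document}
```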

PhelypeOleinik avatar Jun 25 '20 11:06 PhelypeOleinik

Thanks @phelype — it’s been a while since I looked inside that macro :) In that case I like the idea of the setup argument… even if in this instance other approaches can also work to disable the ligatures.

wspr avatar Jun 25 '20 12:06 wspr

We don't have optional data in the arg spec, so it would need a new letter (w?)

josephwright avatar Jun 25 '20 12:06 josephwright

Or a breaking change to v-type

josephwright avatar Jun 25 '20 12:06 josephwright

I would rather vote for V (matching that we have o and O and d and D) than consider a breaking change.

FrankMittelbach avatar Jun 25 '20 12:06 FrankMittelbach

We don't have optional data in the arg spec, so it would need a new letter (w?)

Can't we add one?

Or perhaps, since we have o and O{}, it seems natural to have v and V{}. Of course the argument would mean different things...

PhelypeOleinik avatar Jun 25 '20 12:06 PhelypeOleinik

Imho if the catcodes should be customizable for the v-type it would make sense to use the cctab code, and not some arbitrary command like \@noligs. Then the reading of the command would only set the catcodes and definitions of active chars should then be done in the macro body.

u-fischer avatar Jun 25 '20 12:06 u-fischer

@u-fischer So I better get that PR for l3cctab in ...

josephwright avatar Jun 25 '20 12:06 josephwright

Currently, I've no idea on how l3cctab works and how it could be helpful for the current issue but I am really interested :)

dbitouze avatar Jun 25 '20 13:06 dbitouze

My point was that semantically v is 'verbatim' whereas what's needed here is not. Importantly, you have to worry if the delimiting chars are altered by the catcode table or whatever. Also, we've been consistent that uppercase letters -> some optional-arg variant of a lowercase one. So I'd say something like c{<table>} (= 'catcode') would be right.

josephwright avatar Jun 25 '20 14:06 josephwright

I'll get the cctab stuff sorted today or tomorrow if I can, so we can discuss.

josephwright avatar Jun 25 '20 14:06 josephwright

@dbitouze A catcode table is a way of having a 'fixed' set of catcodes for all chars(*). It means you get a one-token interface for the changes, so \c_document_cctab for normal catcodes, \c_initex_cctab for IniTeX, etc. The idea is that this is a lot clearer and more reliable than one-by-one setting.

  • In XeTeX, we don't have the necessary primitive, so I can only cover chars 0 to 255 with reasonable performance.
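
As a rough illustration of the one-token interface, here is a sketch that assumes a recent expl3 in which l3cctab is available and provides \cctab_select:N. The command \rescanned and the variable \l__demo_rescanned_tl are hypothetical names. It grabs an argument verbatim and then retokenizes it under the document catcode table:

```latex
\documentclass{article}
\usepackage{xparse}
\ExplSyntaxOn
\tl_new:N \l__demo_rescanned_tl
\NewDocumentCommand { \rescanned } { v }
  {
    % Retokenize the verbatim material with one catcode table
    % instead of setting catcodes character by character.
    \tl_set_rescan:Nnn \l__demo_rescanned_tl
      { \cctab_select:N \c_document_cctab }
      {#1}
    \tl_use:N \l__demo_rescanned_tl
  }
\ExplSyntaxOff
\begin{document}
\rescanned|\emph{grabbed verbatim, then rescanned}|
\end{document}
```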

josephwright avatar Jun 25 '20 14:06 josephwright

@josephwright (off topic) It seems to me that you wanted to add a footnote but Markdown didn’t know that.

joulev avatar Jun 25 '20 15:06 joulev

Suppressing a known set of ligatures during output can also be done by using \tl_replace_all:Nnn and replacing the problematic character with something that won't form the ligature.
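
A minimal sketch of this replacement idea, assuming pdfLaTeX with T1 fonts and covering only the characters in LaTeX's \verbatim@nolig@list. The names \l__demo_verb_tl and \__demo_break_ligature:n are hypothetical, and the result is not claimed to match \verb in every detail (for example, spacing):

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{xparse}
\ExplSyntaxOn
\tl_new:N \l__demo_verb_tl
% Put a zero-width kern before every occurrence of #1, so that
% #1 cannot form a ligature with the character before it.
\cs_new_protected:Npn \__demo_break_ligature:n #1
  { \tl_replace_all:Nnn \l__demo_verb_tl {#1} { \kern \c_zero_dim #1 } }
\NewDocumentCommand { \myverb } { v }
  {
    \tl_set:Nn \l__demo_verb_tl {#1}
    \tl_map_function:nN { ` < > , ' - } \__demo_break_ligature:n
    \mode_leave_vertical:
    \group_begin:
      \ttfamily
      \tl_use:N \l__demo_verb_tl
    \group_end:
  }
\ExplSyntaxOff
\begin{document}
\verb|a--b ---c ``<''| and \myverb|a--b ---c ``<''|
\end{document}
```

Because the v-type argument contains only catcode-12 tokens (plus spaces and active characters), the search patterns match literally and no catcode setup is needed.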

Skillmon avatar Jun 25 '20 17:06 Skillmon

@Skillmon Good point: one could take the verbatim material and replace tokens. As everything is strictly verbatim, that's probably an easier approach than worrying about catcode setup.

josephwright avatar Jun 26 '20 09:06 josephwright

Regardless of the outcome of this discussion, it will be useful to document in xparse.pdf how to reproduce the behaviour of \verb using \NewDocumentCommand.

blefloch avatar Jun 26 '20 15:06 blefloch

@josephwright depending on the number of tokens to be replaced the performance will be a lot worse with the \tl_replace_all:Nnn approach.

Skillmon avatar Jun 26 '20 21:06 Skillmon

Also, how will you know which characters (in a very large font) need replacing?

Further: what does typesetting ‘verbatim with a monospaced font’ mean for many scripts (non-European)?

car222222 avatar Jun 27 '20 04:06 car222222

@car222222 in a very large font, font features are the only reasonable way to suppress them all; LaTeX can't know about all ligatures possible in a font. But at least the characters supported by LaTeX2e could be covered easily (it's just a \tl_map_function:NN and \tl_replace_all:Nnn).

Further: AFAIK there are double spaced symbols in some monospaced fonts for some non-European scripts.

Skillmon avatar Jun 27 '20 07:06 Skillmon

I would suggest just wrapping every character in an hbox. It seems to work reasonably well, but I didn't test extensively.

\RequirePackage{xparse}
\ExplSyntaxOn
\NewDocumentCommand{\myverb}{v}{\texttt{\str_map_function:nN{#1}\hbox:n}}
\ExplSyntaxOff
\documentclass{article}
\usepackage[T1]{fontenc}
\begin{document}
\verb|a--b ---c ``<''|

\myverb|a--b ---c ``<''|
\end{document}

blefloch avatar Jun 27 '20 13:06 blefloch

I would suggest just wrapping every character in an hbox. It seems to work reasonably well, but I didn't test extensively.

Try with |a--bgrüße ---c ``<''|

u-fischer avatar Jun 27 '20 13:06 u-fischer

Ok, second attempt (the v arg keeps active chars as is): insert \kern 0pt\relax before all non-active chars.

\RequirePackage{xparse}
\ExplSyntaxOn
\tl_new:N \l__myverb_tl
\cs_new:Npn \__myverb:n #1
  {
    \token_if_active:NF #1 { \kern 0pt\relax }
    \exp_not:n {#1}
  }
\NewDocumentCommand { \myverb } { v }
  {
    \tl_set:Nn \l__myverb_tl {#1}
    \tl_replace_all:Nnn \l__myverb_tl { ~ } { { ~ } }
    \group_begin:
      \use:c { verbatim@font }
      \use:x { \tl_map_function:NN \l__myverb_tl \__myverb:n }
    \group_end:
  }
\ExplSyntaxOff
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\begin{document}
\verb|a--bgrüße ----c ``<''|

\myverb|a--bgrüße ----c ``<''|
\end{document}

blefloch avatar Jun 27 '20 14:06 blefloch