
LaTeX: with ``literalinclude`` directives, TABs are converted in PDF into 8 spaces, independently of their location

Open jfbu opened this issue 3 months ago • 2 comments

Describe the bug

Tab characters in files inserted via literalinclude end up "as is" in both the .html and the .tex files produced by the respective builders.

In the PDF, each TAB is rendered as 8 spaces, regardless of its position in the line.

This contrasts with the HTML output where, at least in my Firefox, the rendering uses tab stops every 8 characters. Note, though, that I found no tab-size setting for <pre> in the CSS files (when building the MWE below), so this may be browser dependent. Indeed, I do not know whether there is a universal convention that, in the absence of tab-size, tab stops inside <pre> fall every 8 characters.
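
The difference between the two renderings can be illustrated with a minimal Python sketch, using the last line of the sample file from the MWE below. It only mimics the visual effect on a plain string; it is not how either builder is implemented, and the variable names are made up for illustration.

```python
# Last line of filewithtabs.txt, with real TAB characters.
line = "hello\twhere\tare\tlittle\ttabs\tgone?"

# What the PDF shows: every TAB becomes exactly 8 spaces,
# whatever column it starts at.
fixed = line.replace("\t", " " * 8)

# What Firefox shows: every TAB advances to the next multiple of 8.
tab_stops = line.expandtabs(8)

print(fixed)
print(tab_stops)
```

With fixed replacement the words drift out of alignment with lines above or below, whereas with tab stops each word starts at a column that is a multiple of 8.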

Important: here we do not use the :tab-width: option. With that option, tabs are converted to spaces for both LaTeX and HTML, and the two outputs agree, with tab stops matching the option (only tested with :tab-width: 8).

How to Reproduce

.. literalinclude:: filewithtabs.txt

with filewithtabs.txt being

	============================
	LINUX KERNEL MEMORY BARRIERS
	============================
hello	where	are	little	tabs	gone?

PDF output:

Image

HTML:

Image

Environment Information

Platform:              darwin; (macOS-15.7.2-arm64-arm-64bit-Mach-O)
Python version:        3.13.3 (v3.13.3:6280bb54784, Apr  8 2025, 10:47:54) [Clang 15.0.0 (clang-1500.3.9.4)])
Python implementation: CPython
Sphinx version:        8.3.0+/5d0ad1686
Docutils version:      0.22.4b1.dev
Jinja2 version:        3.1.6
Pygments version:      2.19.1

Sphinx extensions


Additional context

Tested with Docutils 0.22.3 and 0.21.2.

This issue was originally reported at https://github.com/sphinx-doc/sphinx/issues/13656#issuecomment-3538318789 by @akiyks.

Note that, as a workaround, it is enough to add the :tab-width: 8 option to the literalinclude directive. The .tex file (and also the .html) will then contain only spaces, aligned to tab stops at every multiple of 8 characters.

jfbu avatar Nov 16 '25 08:11 jfbu

I have made a preliminary attempt. But the comments at various locations in sphinxlatexliterals.sty, such as

% MEMO: fancyvrb has options obeytabs and tabsize.  Anyhow tab characters
% do not make it to the tex file, they have been converted to spaces earlier.
% But, if this was not the case, the support would be implemented here via
%     \FV@ObeyTabs{\strut\spx@verb@FV@Line\strut}%
% And one would need a similar change in the measuring phase done by
% \spx@verb@DecideIfWillDoForceWrap

are overly optimistic.

Following what they say does work for simple cases (the LaTeX template must do \fvset{obeytabs}), but it breaks the mechanism we have for wrapping long code lines. For the optional "hard-wrap" feature (verbatimforcewraps), the prospects are even murkier.

The way fancyvrb.sty implements the feature is intimately tied to the fact that it uses only TeX horizontal boxes, whereas we at Sphinx initially render the code line in a vertical box so that it can split naturally if too long. Extending the fancyvrb.sty method to work here without breaking the wrapping of long lines and the injection of continuation symbols looks challenging.

A much easier and saner way would be a Python-level-only solution that replaces the tab characters with spaces beforehand, which is in fact what already happens for the contents of code-blocks.
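
As a rough sketch of such a Python-level approach (not Sphinx's actual code: the helper name expand_tabs and the simplified \PYG markup string are assumptions made for illustration), the expansion has to be applied to the raw source text before any highlighting markup is added, because the markup characters would otherwise be counted as columns:

```python
def expand_tabs(source: str, tab_width: int = 8) -> str:
    """Hypothetical helper: replace each TAB by spaces up to the next tab stop.

    str.expandtabs restarts its column count after every newline, so it can
    be applied to a whole file at once.
    """
    return source.expandtabs(tab_width)


raw = "ab\tcd"

# Right order: expand the raw text first, highlight afterwards.
good = expand_tabs(raw)  # "cd" starts at column 8

# Wrong order: simplified, made-up highlighter output that still holds a TAB.
# The 11 markup characters are counted as columns, so the padding is wrong.
marked = "\\PYG{n}{ab}\tcd"
bad = expand_tabs(marked)

print(repr(good))
print(repr(bad))
```

In the second case only 5 spaces are inserted, so once the markup is typeset "cd" would land at column 7 instead of column 8.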

The comments about \FV@ObeyTabs in sphinxlatexliterals.sty should be removed.

edit: the above comment in \def\spx@verb@@PreProcessLine is plain wrong because \FV@ObeyTabs puts everything in a horizontal box, so using it as the comment suggests breaks the Sphinx feature of wrapping long code lines. However, if one modifies the associated core fancyvrb internal \FV@@ObeyTabs to use an \unhbox, then we recover our feature of wrapping code lines.

EDIT: no, tabulations inside Python strings, for example, which are highlighted by Pygments, are plainly and simply incompatible with this approach. In fact there is an upstream bug in fancyvrb.sty, which I discovered while experimenting with this.

Here is a pure LaTeX file showing the bug:

\documentclass{article}
\usepackage{fancyvrb}
\fvset{obeytabs}
\begin{document}
This is OK:
\begin{Verbatim}[commandchars=\\\{\}]
  \textbf{foo bar}
\end{Verbatim}

This (with a TAB between foo and bar) is again OK
\begin{Verbatim}[commandchars=\\\{\}]
  foo	bar
\end{Verbatim}

But with this (again a TAB between foo and bar), everything after foo disappears:
\begin{Verbatim}[commandchars=\\\{\}]
  \textbf{foo	bar}
\end{Verbatim}
\end{document}

CONCLUSION

Because the way fancyvrb implements the obeytabs feature is intrinsically incompatible with commandchars=\\\{\}, which we absolutely need for Pygments mark-up (see EDIT above), there is no alternative but to convert TABs to spaces before they are shipped to the .tex file.

jfbu avatar Nov 16 '25 09:11 jfbu

However, when displaying source code from literalinclude we do not want such a replacement: for example, astring = "foo<tab here>bar" should show a real TAB. Ideally the TAB should display simply as itself, which means neither obeying tab stops nor using some symbol (the 'showtabs' option of fancyvrb.sty).

I will probably update a work-in-progress branch so that the \PYG mark-up does something with tabulation characters, in the hope that some solution is found. It may be possible to display some symbol and have copy-paste recover a 0x09 ASCII tab via the accsupp package, but I am not sure.
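
To make the concern concrete, here is a small Python sketch (illustrative only; the variable names are made up) showing that expanding tabs changes the value of a string literal, so a reader copying the displayed code would not get the original program:

```python
import ast

# One line of source code with a real TAB inside the string literal.
source = 'astring = "foo\tbar"'

# The same line after tab-to-space conversion with tab stops every 8 columns.
expanded = source.expandtabs(8)


def literal_value(code_line: str) -> str:
    """Evaluate the string literal on the right-hand side of the assignment."""
    return ast.literal_eval(code_line.split("=", 1)[1].strip())


original_value = literal_value(source)
expanded_value = literal_value(expanded)

print(repr(original_value))
print(repr(expanded_value))
```

The TAB sits at column 14, so it is replaced by just 2 spaces, and the literal silently becomes a different string.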

jfbu avatar Nov 16 '25 11:11 jfbu