The ``code-block`` directive replaces all (raw) horizontal tabulation characters with spaces, hence the displayed code can differ from source, even ignoring highlighting

Open jfbu opened this issue 3 months ago • 4 comments

Describe the bug

Consider this source, which uses the raw horizontal tabulation character, ASCII 09 (TAB).

.. code-block::

   "23456781234567812345678123456781234567812345678"
   mystring = r"a	ab	abc	abcd	abcde	bar"

(Note that GitHub Markdown renders tab stops every 4 columns; the source above was written in an editor using 8 columns per tab.)

The generated HTML then uses spaces, not tabs, inside <pre>...</pre>:

<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="s2">&quot;23456781234567812345678123456781234567812345678&quot;</span>
<span class="n">mystring</span> <span class="o">=</span> <span class="sa">r</span><span class="s2">&quot;a       ab      abc     abcd    abcde   bar&quot;</span>
</pre></div>

This contains no tabulation characters, so copying and pasting does not recover the original source code.
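For what it's worth, the spacing in the HTML above can be reproduced with plain Python. This is only a sketch: it assumes the tabs are expanded on the full source line (3-space directive indent included) with stops every 8 columns, in the same way as Python's ``str.expandtabs``, and that the indent is stripped afterwards. The names ``INDENT``, ``source_line`` and ``displayed`` are mine, not anything from Sphinx or Docutils.

```python
# The second line of the code-block body as it sits in index.rst,
# including the 3-space directive indentation.
INDENT = "   "
source_line = INDENT + 'mystring = r"a\tab\tabc\tabcd\tabcde\tbar"'

# Assumption: tabs are expanded with stops every 8 columns on the whole
# source line, before the directive indentation is removed.
expanded = source_line.expandtabs(8)
displayed = expanded[len(INDENT):]

print(repr(displayed))
print("tab survives:", "\t" in displayed)
```

Under that assumption ``a`` starts on a tab stop, which would explain why each field in the rendered string is exactly 8 columns wide.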

How to Reproduce

Use the above in index.rst and execute make html.

Environment Information

Platform:              darwin; (macOS-15.7.2-arm64-arm-64bit-Mach-O)
Python version:        3.13.3 (v3.13.3:6280bb54784, Apr  8 2025, 10:47:54) [Clang 15.0.0 (clang-1500.3.9.4)])
Python implementation: CPython
Sphinx version:        8.3.0+/5d0ad1686
Docutils version:      0.22.4b1.dev
Jinja2 version:        3.1.6
Pygments version:      2.19.1

Sphinx extensions


Additional context

On the other hand, literalinclude preserves horizontal tab characters.

Originally from this comment.

I presume this is a well-known fact, which simply illustrates the usefulness of attaching the true source code.

jfbu avatar Nov 16 '25 12:11 jfbu

This is a known and documented limitation of the reStructuredText format.

Tabs will be converted to spaces. Tab stops are at every 8th column (processing systems may make this value configurable, Docutils uses the tab_width configuration setting). — https://docutils.sourceforge.io/docs/ref/rst/restructuredtext.html#whitespace

The workaround for formal languages that mandate the use of TABs is to use an external file for the code samples and include it with the include directive using the literal or code option together with the tab-width option set to a negative value, e.g.:

.. include:: Makefile
   :code: make
   :tab-width: -8
   :start-line: 24
   :end-line: 42

There is a related Docutils feature request and a thread in the mail archive.
It turned out to be a complex problem to get this right for the general case of indentation with mixed TABs and SPACEs. The proposed patch set works in some well-behaved cases but fails in other valid use cases.

gmilde avatar Nov 18 '25 13:11 gmilde

The workaround for formal languages that mandate the use of TABs is to use an external file for the code samples and include it with the include directive (https://docutils.sourceforge.io/docs/ref/rst/directives.html#include) using the literal or code option together with the tab-width option (https://docutils.sourceforge.io/docs/ref/rst/directives.html#include-options) set to a negative value, e.g.:

I read the linked-to documentation and I can't tell if using -1, -4 or -8 has any significance?

jfbu avatar Nov 18 '25 21:11 jfbu

@gmilde Thanks for the links to pre-existing discussion at Docutils.

The present problem thus applies to all builders.

Regarding the PDF format, there appear to be grave difficulties with rendering TABs (and spaces). I am currently unaware of any working method to produce PDFs via LaTeX from which one can copy-paste code listings in a way that recovers horizontal tabulations as ASCII 09 characters and pastes them as such into a text editor. I experimented a bit with the accsupp LaTeX package and never found a working method (and even wasted time handling lies of an AI), even when using Adobe Acrobat (I did not try the Pro version, which I don't own).

One can "attach" files to a PDF, and I have some experience with the LaTeX package attachfile, but I ran into various limitations with it in the past, related mainly to what PDF viewers will actually display.

Nevertheless, "attaching" the source code to PDFs looks like the sole working way to allow reliable copy-paste. These things are well known, I guess, but they are nevertheless a very strong limitation of PDF, which the current "accessibility" efforts of LaTeX do not seem to solve yet.

jfbu avatar Nov 18 '25 21:11 jfbu

I read the linked-to documentation and I can't tell if using -1, -4 or -8 has any significance?

No, all negative values just keep the TAB as TAB.
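To make the rule concrete, here is a minimal Python model of the behaviour described in this thread: a non-negative tab-width expands tabs to that many columns, while any negative value leaves the text untouched. The function ``include_literal_text`` and the sample Makefile rule are hypothetical illustrations, not Docutils' actual code or API.

```python
def include_literal_text(raw_text: str, tab_width: int = 8) -> str:
    """Hypothetical model of the tab-width rule, not Docutils' own code."""
    if tab_width < 0:
        # Any negative value means "keep TABs as TABs".
        return raw_text
    # Otherwise expand tabs to stops every `tab_width` columns.
    return raw_text.expandtabs(tab_width)


# A Makefile recipe line must start with a real TAB character.
makefile_rule = "all:\n\tcc -o demo demo.c\n"

# -1, -4 and -8 are interchangeable in this model: the text is unchanged.
for width in (-1, -4, -8):
    assert include_literal_text(makefile_rule, width) == makefile_rule

# A non-negative width replaces the TAB with spaces.
assert "\t" not in include_literal_text(makefile_rule, 8)
```

In this model only the sign matters, so ``:tab-width: -1`` and ``:tab-width: -8`` would behave identically.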

gmilde avatar Dec 02 '25 11:12 gmilde