The ``code-block`` directive replaces all (raw) horizontal tabulation characters with spaces, hence the displayed code can differ from the source, even ignoring highlighting
Describe the bug
Consider this source, which uses raw horizontal tabulation characters (ASCII 09, the TAB character):

.. code-block::

   "23456781234567812345678123456781234567812345678"
   mystring = r"a ab abc abcd abcde bar"

(Note that GitHub Markdown renders tab stops every 4 columns; this source was written in an editor using 8 columns per tab.)
The generated HTML then contains spaces, not TABs, inside ``<pre>...</pre>``:

.. code-block:: html

   <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="s2">"23456781234567812345678123456781234567812345678"</span>
   <span class="n">mystring</span> <span class="o">=</span> <span class="sa">r</span><span class="s2">"a ab abc abcd abcde bar"</span>
   </pre></div>

This output contains no TAB characters, so copying and pasting it cannot recover the original source code.
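To check whether a built page still contains TABs, one can scan the text of its ``<pre>`` elements. Below is a minimal sketch using only Python's standard ``html.parser``. The class and function names are hypothetical helpers and not part of Sphinx. In practice the input would be a generated file such as the ``index.html`` from ``make html``; the inline strings here are made-up samples.

```python
from html.parser import HTMLParser


class PreTextCollector(HTMLParser):
    """Collect the text content of every <pre> element (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self._depth = 0  # greater than 0 while inside a <pre> element
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            if self._depth == 0:
                self.blocks.append("")  # start a new <pre> block
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "pre" and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        # Text inside nested tags such as <span> still belongs to the <pre>.
        if self._depth > 0:
            self.blocks[-1] += data


def pre_blocks_with_tabs(html: str) -> list[bool]:
    """Return, for each <pre> block, whether it still contains a TAB."""
    parser = PreTextCollector()
    parser.feed(html)
    parser.close()
    return ["\t" in block for block in parser.blocks]


expanded = '<pre><span class="n">x</span> = "a       ab"</pre>'
with_tab = '<pre><span class="n">x</span> = "a\tab"</pre>'
print(pre_blocks_with_tabs(expanded))  # [False]
print(pre_blocks_with_tabs(with_tab))  # [True]
```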
How to Reproduce
Put the above in ``index.rst`` and execute ``make html``.
Environment Information
::

   Platform: darwin; (macOS-15.7.2-arm64-arm-64bit-Mach-O)
   Python version: 3.13.3 (v3.13.3:6280bb54784, Apr 8 2025, 10:47:54) [Clang 15.0.0 (clang-1500.3.9.4)])
   Python implementation: CPython
   Sphinx version: 8.3.0+/5d0ad1686
   Docutils version: 0.22.4b1.dev
   Jinja2 version: 3.1.6
   Pygments version: 2.19.1

Sphinx extensions
Additional context
By contrast, ``literalinclude`` preserves horizontal tab characters.
Originally from this comment.
I presume this is well known; it simply illustrates the usefulness of attaching the true source code.
This is a known and documented limitation of the reStructuredText format.

   Tabs will be converted to spaces. Tab stops are at every 8th column (processing systems may make this value configurable, Docutils uses the tab_width configuration setting).

   -- https://docutils.sourceforge.io/docs/ref/rst/restructuredtext.html#whitespace

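The tab-stop rule quoted above can be worked through in a short sketch. A TAB at 0-based column ``c`` advances to the next multiple of the tab width, which is column ``(c // 8 + 1) * 8`` for the default width of 8. The ``expand_tabs`` function below is a hypothetical re-implementation for illustration, not Docutils' own code. It handles a single line only and assumes a positive tab width, and it is checked against Python's built-in ``str.expandtabs``. The example also shows why the conversion cannot be undone: two different sources expand to the same text.

```python
def expand_tabs(line: str, tab_width: int = 8) -> str:
    """Replace each TAB with spaces up to the next tab stop.

    Illustrative only: assumes a single line and tab_width > 0.
    """
    out: list[str] = []
    column = 0  # 0-based column of the next character to emit
    for char in line:
        if char == "\t":
            next_stop = (column // tab_width + 1) * tab_width
            out.append(" " * (next_stop - column))
            column = next_stop
        else:
            out.append(char)
            column += 1
    return "".join(out)


source_with_tab = "a\tab"
source_with_spaces = "a" + " " * 7 + "ab"

# Matches the built-in str.expandtabs for this single-line input.
assert expand_tabs(source_with_tab) == source_with_tab.expandtabs(8)

# Two different sources give identical output, so the TAB is unrecoverable.
print(expand_tabs(source_with_tab) == expand_tabs(source_with_spaces))  # True
```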
The workaround for formal languages that mandate the use of TABs is to use an external file for the code samples and include it with the ``include`` directive using the ``literal`` or ``code`` option together with the ``tab-width`` option set to a negative value, e.g.:

.. include:: Makefile
   :code: make
   :tab-width: -8
   :start-line: 24
   :end-line: 42

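The ``:start-line:`` and ``:end-line:`` options in this example appear to follow Python slice semantics, based on our reading of the Docutils ``include`` option descriptions: 0-based indices, with the end line excluded. Under that assumption, ``:start-line: 24`` and ``:end-line: 42`` would select ``lines[24:42]``, which is 18 lines (the 25th through the 42nd, counting from 1). The ``select_lines`` helper and the generated Makefile text below are hypothetical illustrations, not Docutils code.

```python
def select_lines(text: str, start_line: int, end_line: int) -> list[str]:
    """Model :start-line:/:end-line: as a Python slice (0-based, end excluded).

    Hypothetical helper; the slice semantics are an assumption based on
    the Docutils option descriptions.
    """
    return text.splitlines()[start_line:end_line]


# A hypothetical 50-line Makefile: each target is followed by a TAB-indented recipe.
makefile = "\n".join(f"target{i}:\n\techo {i}" for i in range(25))

selected = select_lines(makefile, 24, 42)
print(len(selected))   # 18
print(selected[0])     # target12:
```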
There is a related Docutils feature request, as well as a thread in the mailing-list archive.
It turned out to be a complex problem to get this right for the general case of indentation with mixed TABs and SPACEs. The proposed patch set works in some well-behaved cases but fails in other valid use cases.
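One way to see why mixed indentation is hard: reStructuredText derives structure (block quotes, directive content, list bodies) from relative indentation. When TABs are kept, the indentation of a TAB-indented line relative to a space-indented line depends on the assumed tab width. The sketch below only illustrates this effect and does not reproduce the Docutils analysis of the proposed patches. ``indent_width`` is a hypothetical helper.

```python
def indent_width(line: str, tab_width: int) -> int:
    """Column of the first non-blank character after expanding TABs."""
    expanded = line.expandtabs(tab_width)
    return len(expanded) - len(expanded.lstrip(" "))


line_a = "\tfoo"    # indented with one TAB
line_b = "    bar"  # indented with four spaces

for width in (8, 4):
    a, b = indent_width(line_a, width), indent_width(line_b, width)
    if a > b:
        relation = "deeper than"
    elif a == b:
        relation = "at the same depth as"
    else:
        relation = "shallower than"
    print(f"tab_width={width}: line_a is {relation} line_b ({a} vs {b})")
# tab_width=8: line_a is deeper than line_b (8 vs 4)
# tab_width=4: line_a is at the same depth as line_b (4 vs 4)
```

With a width of 8 the TAB line would open a nested block. With a width of 4 it would continue the same block. A parser that keeps TABs verbatim still has to commit to one interpretation.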

   The workaround for formal languages that mandate the use of TABs is to use an external file for the code samples and include it with the `include <https://docutils.sourceforge.io/docs/ref/rst/directives.html#include>`__ directive using the ``literal`` or ``code`` option together with the `tab-width option <https://docutils.sourceforge.io/docs/ref/rst/directives.html#include-options>`__ set to a negative value, e.g.:

I read the linked-to documentation and I can't tell if using -1, -4 or -8 has any significance?
@gmilde Thanks for the links to pre-existing discussion at Docutils.
The present problem thus applies to all builders.
Regarding the PDF format, there appear to be serious difficulties with rendering TABs (and spaces). I am currently unaware of any working method to produce PDFs via LaTeX from which one can copy code listings so that the horizontal tabulations are recovered as ASCII 09 characters when pasted into a text editor. I experimented a bit with the ``accsupp`` LaTeX package but never found a working method (and even wasted time sorting through an AI's false claims), even when using Adobe Acrobat (I did not try the Pro version, which I don't own).
One can "attach" files to a PDF. I have some experience with the LaTeX package ``attachfile``, but in the past I ran into various limitations with it, mainly related to what PDF viewers actually display.
Nevertheless, "attaching" the source code to the PDF looks like the only way to make copy-paste work. I guess these things are well known, but they are nevertheless a severe limitation of PDF, which current LaTeX "accessibility" efforts do not yet seem to solve.

   I read the linked-to documentation and I can't tell if using -1, -4 or -8 has any significance?

No, all negative values just keep the TAB as TAB.
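The answer above can be modelled in a few lines. The ``apply_tab_width`` function is a hypothetical model of the behavior described in this thread, not Docutils code. It assumes that a non-negative width expands TABs to that width and that any negative width leaves TAB characters untouched, so ``-1``, ``-4`` and ``-8`` are interchangeable.

```python
def apply_tab_width(text: str, tab_width: int) -> str:
    """Model of the include directive's :tab-width: as described in this thread.

    Hypothetical helper: a non-negative width expands TABs,
    any negative width keeps each TAB as a TAB.
    """
    if tab_width < 0:
        return text
    return text.expandtabs(tab_width)


recipe = "all:\n\t$(CC) -o app main.c\n"
results = {w: apply_tab_width(recipe, w) for w in (-1, -4, -8)}
print(all(r == recipe for r in results.values()))  # True
print("\t" in apply_tab_width(recipe, 8))          # False
```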