Latex file is not updated when using the option numfig = True
Describe the bug
Latex file is not updated when using the option numfig = True.
How to Reproduce
The bug can be reproduced with the following bash code:

```bash
# generate a default project
sphinx-quickstart --no-sep -p "myproject" -a test -r "" -l en
# add the numfig = True option to conf.py
echo -e "\nnumfig=True" >> conf.py
# build latex & print the main latex file's modification time
make latex && ls _build/latex/myproject.tex --full-time
# now add a change to index.rst
echo -e "\n*new change*" >> index.rst
# build latex & print the main latex file's modification time again
make latex && ls _build/latex/myproject.tex --full-time
```
In the code above, after updating `index.rst`, running `make latex && ls _build/latex/myproject.tex --full-time` again shows that the file `myproject.tex` is not updated.
The terminal output of `make latex` looks normal (I see no difference compared to running without `numfig = True` set, except that the tex file is not updated).
The workaround for this is running `sphinx-build` with the `-E` option (don't use a saved environment): `sphinx-build -E -b latex . _build/latex/`
Environment Information
Platform: linux; (Linux-6.2.14-300.fc38.x86_64-x86_64-with-glibc2.37)
Python version: 3.11.3 (main, Apr 5 2023, 00:00:00) [GCC 13.0.1 20230401 (Red Hat 13.0.1-0)]
Python implementation: CPython
Sphinx version: 7.0.1
Docutils version: 0.19
Jinja2 version: 3.1.2
Pygments version: 2.14.0
Sphinx extensions
No response
Additional context
No response
We have noticed the same problem when upgrading from Sphinx 3.2.1 (where the problem was not present) to 7.0.0.
Since the steps to reproduce are exact and there is a working/non-working pair of releases, it should be possible to use `git bisect` or something similar to narrow down the change where the problem was introduced. From there, it is usually easy to resolve.
I ran `git bisect` as suggested by @zoranbosnjak, using v5.0.0 as the good commit and master as the bad one. I used the following script to run it automatically:
```bash
#!/bin/bash
# set up a virtual environment and install sphinx
python -m venv .venv
source .venv/bin/activate
pip install -e .
mkdir .venv/test
cd .venv/test
# generate a default project
sphinx-quickstart --no-sep -p "myproject" -a test -r "" -l en
# add the numfig = True option to conf.py
echo -e "\nnumfig=True" >> conf.py
# build latex & record the latex file's modification time
make latex
file_modification_time_1="$(stat _build/latex/myproject.tex --printf=%y)"
# add a change to index.rst
echo -e "\n*new change*" >> index.rst
# build latex & record the modification time again
make latex
file_modification_time_2="$(stat _build/latex/myproject.tex --printf=%y)"
# clean up the virtual environment
deactivate
cd ../..
rm -rf .venv
# report the result to git bisect
if [ "$file_modification_time_1" == "$file_modification_time_2" ]; then
    exit 1 # bad: the latex file was not rebuilt
else
    exit 0 # good: the latex file was rebuilt
fi
```
The result was:

```
a9b0f2708b67a355c5f4969ccbbe9bd695518805 is the first bad commit
commit a9b0f2708b67a355c5f4969ccbbe9bd695518805
Author: Adam Turner <[email protected]>
Date:   Wed Jan 4 04:22:20 2023 +0000

    Cache doctrees more efficiently

 sphinx/environment/__init__.py         | 23 ++++++++++++++---------
 sphinx/environment/adapters/toctree.py |  2 +-
 sphinx/testing/util.py                 |  4 ++++
 tests/test_build_gettext.py            |  3 ++-
 tests/test_build_html.py               |  2 +-
 tests/test_build_latex.py              |  2 +-
 tests/test_ext_math.py                 |  4 ++--
 7 files changed, 25 insertions(+), 15 deletions(-)
bisect found first bad commit
```
That commit introduced some changes related to reading the toctree from the cache, but it is not clear to me where exactly the bug is.
@AA-Turner, since commit a9b0f2708b67a355c5f4969ccbbe9bd695518805, `make latex` does not detect changes in the `.rst` input files (when `numfig = True` is set in the configuration), so the documentation project is not recompiled correctly. Could you please check?
I also suggest adding or extending a test case to cover this scenario (`make latex`, change a file, `make latex` again, then check by timestamp whether the output is updated).
Hi, I dug into this and I think I have uncovered what's causing the issue:

After all documents have been read, the `Builder.build` method invokes `BuildEnvironment.check_dependents`. This in turn emits the `env-get-updated` event, which causes `EnvironmentCollector.get_updated_docs` to be called. The implementation of this method in `TocTreeCollector` then calls the `assign_figure_numbers` method, which does nothing unless `numfig = True` is set.
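The shape of this guard can be sketched as a hypothetical miniature; the names and signatures below are simplified stand-ins, not the actual Sphinx API:

```python
# Hypothetical miniature of the behaviour described above: figure-number
# assignment only walks the doctrees when numfig is enabled.
class Config:
    """Stand-in for the Sphinx config, holding just the relevant option."""
    def __init__(self, numfig: bool):
        self.numfig = numfig

def assign_figure_numbers(config, docnames, get_doctree):
    """Walk every document's doctree, but only when numfig is enabled."""
    if not config.numfig:
        return []  # numfig = False: nothing to do, no doctrees are read
    walked = []
    for docname in docnames:
        get_doctree(docname)  # this read is what populates the cache
        walked.append(docname)
    return walked
```

With `numfig = False` the function returns immediately and no doctree is ever read here, which is why the problem only appears once the option is enabled.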
Then `_walk_doc` calls `BuildEnvironment.get_doctree` on EVERY file in the worktree, so every single doctree is read into a bytes object and stored in the `_pickled_doctree_cache`.
The real issue is that all of this happens before the environment is pickled and written to disk for subsequent incremental builds: the environment is written out while it still contains the data of all doctrees in the `_pickled_doctree_cache` map. (This is actually what put me on the right track here, as my `environment.pickle` file was far too large.)
Now during the next incremental build, the environment is unpickled while still holding cached copies of all the existing doctree files in the worktree! Since `_pickled_doctree_cache` is not meant to be pickled and reloaded, this stale cache essentially breaks the incremental build.
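A minimal, self-contained sketch of this failure mode (hypothetical names, not the real Sphinx classes) shows how a cache that survives pickling serves stale content after reload:

```python
import pickle

class Env:
    """Stand-in for BuildEnvironment with a doctree-bytes cache."""
    def __init__(self):
        self.cache = {}  # plays the role of _pickled_doctree_cache

    def get_doctree(self, docname, read_source):
        data = self.cache.get(docname)
        if data is None:
            # cache miss: actually read the source and store its bytes
            data = pickle.dumps(read_source(docname))
            self.cache[docname] = data
        return pickle.loads(data)

env = Env()
env.get_doctree('index', lambda name: 'old content')  # first build fills the cache
# the whole environment, including the cache, survives the pickle round-trip ...
restored = pickle.loads(pickle.dumps(env))
# ... so the next build never re-reads the changed source:
doc = restored.get_doctree('index', lambda name: 'NEW content')
# doc is still 'old content'
```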
An easy fix is to prevent the cache from being pickled, in the `BuildEnvironment.__getstate__` method.
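A hedged sketch of that kind of fix (simplified, not the actual Sphinx patch) drops the cache in `__getstate__` so a reloaded environment starts empty:

```python
import pickle

class BuildEnvironment:
    """Simplified stand-in for sphinx.environment.BuildEnvironment."""
    def __init__(self):
        self.all_docs = {}                # data that should persist
        self._pickled_doctree_cache = {}  # transient, in-memory only

    def __getstate__(self):
        state = self.__dict__.copy()
        # never persist the doctree cache: it must be rebuilt from the
        # .doctree files on the next (incremental) build
        state['_pickled_doctree_cache'] = {}
        return state

env = BuildEnvironment()
env.all_docs['index'] = 'mtime-1'
env._pickled_doctree_cache['index'] = b'doctree bytes'
restored = pickle.loads(pickle.dumps(env))
# restored.all_docs is kept; restored._pickled_doctree_cache is empty
```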
On top of that, the issue is even harder to spot because it can be obscured by the `_write_doc_doctree_cache`, which is consulted later, in the write stage of the build, before falling back to the `_pickled_doctree_cache`. The `_write_doc_doctree_cache` is only filled when the source files are read serially, which is always the case when the number of changed source files is <= 5. So the bug only really shows up if parallel builds are enabled AND more than 5 of your source files have changed.
@AA-Turner Can you verify this analysis? :)
Oh, I indeed think that the `_pickled_doctree_cache` attribute should not be present in the serialized environment, since whenever we query `get_doctree` we expect to get a fresh doctree. What actually happens is that we never load the 'latest' pickled file (if you print what is being deserialized just after `check_dependents`, you'll see that the documents are not those you'd expect).