sphinx icon indicating copy to clipboard operation
sphinx copied to clipboard

Latex file is not updated when using the option numfig = True

Open tombolano opened this issue 1 year ago • 5 comments

Describe the bug

Latex file is not updated when using the option numfig = True.

How to Reproduce

The bug can be reproduced with the following bash code:

# generate a default project
sphinx-quickstart --no-sep -p "myproject" -a test -r "" -l en
# add numfig = True option to conf.py
echo -e "\nnumfig=True" >> conf.py
# build latex & print main latex file modification time
make latex && ls _build/latex/myproject.tex --full-time
# now add a change to index.rst
echo -e "\n*new change*" >> index.rst
# build latex & print main latex file modification time again
make latex && ls _build/latex/myproject.tex --full-time

In the code above, after updating index.rst, when running make latex && ls _build/latex/myproject.tex --full-time the file myproject.tex is not updated.

The output of make latex in the terminal seems normal (I see no difference with respect to running without setting numfig = True, just that the tex file is not updated),

The workaround for this is running sphinx-build the -E option (don't use a saved environment): sphinx-build -E -b latex . _build/latex/

Environment Information

Platform:              linux; (Linux-6.2.14-300.fc38.x86_64-x86_64-with-glibc2.37)
Python version:        3.11.3 (main, Apr  5 2023, 00:00:00) [GCC 13.0.1 20230401 (Red Hat 13.0.1-0)])
Python implementation: CPython
Sphinx version:        7.0.1
Docutils version:      0.19
Jinja2 version:        3.1.2
Pygments version:      2.14.0

Sphinx extensions

No response

Additional context

No response

tombolano avatar Jun 28 '23 09:06 tombolano

We have noticed the same problem when upgrading from sphinx 3.2.1 (the problem was not present) to 7.0.0.

Since the steps to reproduce are exact and there exists working/non-working pair of releases, it should be possible to use git bisect or something alike to narrow down to the change, where the problem was introduced. From there, it is usually easy to resolve it.

zoranbosnjak avatar Jul 06 '23 10:07 zoranbosnjak

I ran git bisect as suggested by @zoranbosnjak, I used master as the good commit and v5.0.0 as the bad. I used the following code to run it automatically:

#!/bin/bash

# setup virtual environment and install sphinx
python -m venv .venv
source .venv/bin/activate
pip install -e .

mkdir .venv/test
cd .venv/test

# generate a default project
sphinx-quickstart --no-sep -p "myproject" -a test -r "" -l en

# add numfig = True option to conf.py
echo -e "\nnumfig=True" >> conf.py

# build latex & get latex file modification time
make latex
file_modification_time_1="$(stat _build/latex/myproject.tex --printf=%y)"

# add a change to index.rst
echo -e "\n*new change*" >> index.rst

# build latex & get main latex file modification time again
make latex
file_modification_time_2="$(stat _build/latex/myproject.tex --printf=%y)"

# clean virtual environment
deactivate
cd ../..
rm -rf .venv

# print result
if [ "$file_modification_time_1" == "$file_modification_time_2" ]; then
	exit 1 # bad
else
	exit 0 # good
fi

The result was:

a9b0f2708b67a355c5f4969ccbbe9bd695518805 is the first bad commit
commit a9b0f2708b67a355c5f4969ccbbe9bd695518805
Author: Adam Turner <[email protected]>
Date:   Wed Jan 4 04:22:20 2023 +0000

    Cache doctrees more efficiently

 sphinx/environment/__init__.py         | 23 ++++++++++++++---------
 sphinx/environment/adapters/toctree.py |  2 +-
 sphinx/testing/util.py                 |  4 ++++
 tests/test_build_gettext.py            |  3 ++-
 tests/test_build_html.py               |  2 +-
 tests/test_build_latex.py              |  2 +-
 tests/test_ext_math.py                 |  4 ++--
 7 files changed, 25 insertions(+), 15 deletions(-)
bisect found first bad commit

That commit introduced some changes related to the reading of the toctree from cache, but it is not clear to me where exactly is the bug.

tombolano avatar Jul 08 '23 16:07 tombolano

@AA-Turner, since commit a9b0f2708b67a355c5f4969ccbbe9bd695518805, the make latex does not detect change in .rst input files (when the numfig = True in the configuration) and so the documentation project is not recompiled correctly. Could you please check?

Also suggesting to add/extend a testcase, to cover this scenario (make latex, change, make latex again... check by the timestamp if the output is updated).

zoranbosnjak avatar Jul 09 '23 07:07 zoranbosnjak

Hi, I dug into this and actually uncovered what's causing the issue (at least I think so):

After all documents have been read, the Builder.build method is invoking the BuildEnvironment.check_dependents methods. image

This in turn emits the env-get-updated event that causes the EnvironmentCollector.get_updated_docs to be called. The implementation of this method in TocTreeCollector is then calling a assign_figure_numbers method which does nothing except when setting numfig = True: image

Then _walk_doc is calling BuildEnvironment.get_doctree on EVERY file in the worktree. So every single doctree is read into a bytes array and stored in the _pickled_doctree_cache. image

The real issue now is that all of this happens before the environment is pickled and written to disk for further incremental builds: image

So the environment gets written to a file while it contains the data of all doctrees in the _pickled_doctree_cache map. (this was actually what brought me onto track here, as my environment.pickle file was far too huge). Now during the next incremental build, the environment gets unpickled while still having cached all the existing doctree files in the worktree! As _pickled_doctree_cache is not meant to be pickled and reloaded, this essentially breaks the update build.

An easy fix is to prevent it from being pickled in the BuildEnvironment.__getstate__ method: image

On top of that, the issue is even harder to spot as it can be obscured by the _write_doc_doctree_cache which is used later in the write stage of the build before falling back to the _pickled_doctree_cache. The _write_doc_doctree_cache is only filled if the reading of the source files is done in serial. This is always the case if the number of changed source files <= 5. So the bug can only be really spotted if you've enabled parallel build AND more then 5 of your source files have changes.

@AA-Turner Can you verify this analysis? :)

ban2st avatar Jan 29 '24 15:01 ban2st

Oh, I indeed think that the _pickled_doctree_cache attribute should not be present in the serialized environment since whenever we are querying get_doctree we are getting a fresh doctree. Actually, what happens is that we are never loading the 'latest' pickled file (if you print what is being deserialized you'll just after check_dependents, you'll see that the documents are not those you'd expect).

picnixz avatar Feb 04 '24 13:02 picnixz