sphinx
sphinx copied to clipboard
Parallel post transform including write_doc_serialized
Subject: Enable parallel post transformations including doctree-resolved
and write_doc_serialized
.
This is an extended version of PR https://github.com/sphinx-doc/sphinx/pull/11733. It has basically the same purpose. Sorry for opening a second PR, however I need to run the CI and also want to ask for maintainers' opinion on the work. Maybe there is a show stopper that I do not see yet.
Feature or Bugfix
- Feature
Purpose
The current parallel writing approach is quite slow for compute intensive extensions in doctree-resolved
.
The event is part of the post transformation call chain. This PR shifts that workload to the parallel subprocesses.
To correctly deal with updated doctrees in post-transform, also the step write_doc_serialized
runs in parallel.
Detail
For extensions like Sphinx-Needs this speeds up the build time by at least a factor of 2 for big projects. The new implementation also leads to more processes running in parallel as the serial post transformation of chunks in the main process blocks their parallel execution, so a bottleneck gets resolved.
All processes defined by --jobs
(minus 1, for the main process) now start at once when writing parallel.
The main difference between https://github.com/sphinx-doc/sphinx/pull/11733 and this PR is how write_doc_serialized
is called in case all extensions support parallel post transforming. Here is a summary for master
branch, the other PR and this PR:
-
master
- main process: post-transform including doctree-resolved
- main process: write_doc_serialized
- sub processes: write_doc
-
https://github.com/sphinx-doc/sphinx/pull/11733
- main process: write_doc_serialized
- sub process: post-transform including doctree-resolved
- sub process: write_doc
After my investigation, I now think this approach is broken as it changes the logic. write_doc_serialized does 2 things for the html builder: image handling and building the search index. Both need information generated in post-transform.
-
this PR
- sub process: post-transform including doctree-resolved
- sub process: write_doc_serialized
- sub processes: write_doc
- sub process: return relevant information from write_doc_serialized
- main process: merge back relevant information from write_doc_serialized
I think this is the correct approach. The function name write_doc_serialized is not optimal anymore, as it may run in parallel. On the long run I think Sphinx should change the contract with extensions so all of them must support parallel reading and writing.
All Sphinx test cases are green on my local machine. When activating parallel builds for all test cases, only 2 fail:
FAILED tests/test_domain_std.py::test_productionlist - AssertionError: assert 1 == 2
FAILED tests/test_ext_inheritance_diagram.py::test_inheritance_diagram - KeyError: 'basic_diagram'
But they also fail on master
, so I think the problems existed before.
I also ran all test cases of a new Sphinx-Needs branch that supports the latest Sphinx version against this PR. All are successful, also when run in parallel mode.
Performance
In terms of performance, this PR improves build times for bigger Sphinx projects that have expensive doctree-resolved
functions significantly by at least a factor of 2.
For projects that have cheap doctree-resolved
functions, the performance maybe a bit lower than master
branch, due to back-merging of artifacts of write_doc_serialized
. I measured a runtime increase of 2% for a 1000 pages project.
Memory
I haven't done a larger analysis besides my local XFCE task manager. Due to the fact that more processes run in parallel at a certain point in time, the memory consumption with the same core count can be higher than for the master
branch.
Risk evaluation
The new parallel post-transformation is activated using a new global configuration variable enable_parallel_post_transform
which is False
by default. The feature is also marked as experimental.
Only projects/builds that know all extensions support parallel post transformation for all extensions should activate it.
As the behavior for enable_parallel_post_transform=False
is not changed, the risk of this PR is low.
Conclusion As a conclusion, I think this PR is a candidate for merging as it behaves correctly in terms of application logic while https://github.com/sphinx-doc/sphinx/pull/11733 should be closed unmerged after confirmation of my assessment.
Any feedback is very welcome!
Quick comment (still busy I am):
- I would prefer to have this feature disabled by default and possibly enabled in Sphinx 9.x (have at least 1 major version for upgrading).
- With the feature disabled by default, the PR could be smaller. And there would be less risks (I clearly don't want to risk breaking projects with the latest Sphinx version).
I have no time to contribute to Sphinx until February so I can only comment things here and there.
I think it's a good approach though. However we need to carefully review what happens and mark the feature as experimental imo.
I like the PR and also understand the concerns of @picnixz.
For me, Sphinx should get a configuration variable, like parallel_post_transform
.
If this is True
the parallel execution of this PR gets activated.
By default it is currently False
, with the plan to set it to True
with version 9.x
With Sphinx 8.x
maybe a Deprecation warning can be shown, that with version 9.x
an extension needs to provide parallel_post_transform_safe
. If this is not the case, it will be set to True
for the specific version (from 9.x
on.)
But I would keep the already provided changes for the internal extensions, as @ubmarco has already tested all of them and he knows that setting parallel_post_transform_safe
to True
is safe for them.
Please keep in mind that 54 of the 61 files just got this line as change 'parallel_post_transform_safe': True,
.
So the PR touches 2-3 files with more complex stuff only.
For me, Sphinx should get a configuration variable, like parallel_post_transform. If this is True the parallel execution of this PR gets activated. By default it is currently False, with the plan to set it to True with version 9.x
That's a good idea.
has already tested all of them and he knows that setting parallel_post_transform_safe to True is safe for them.
While tests show indeed that the built-in extensions are safe, I don't think we have a 100% coverage so there might be corner cases where we could be surprised. Don't get me wrong here, I think it's fine if we enable this for the default extensions. However, I cannot assure you that we won't break any existing projects that we are not aware of (especially those that automatically update to the latest version of their dependencies).
With the configuration value, we would also reduce the lines that this PR touches (although it's not much).
Relevant issues: https://github.com/sphinx-doc/sphinx/issues/10779 and https://github.com/sphinx-doc/sphinx/issues/11448
@picnixz I refactored the PR as discussed.
- There is a new global configuration parameter
enable_parallel_post_transform
which isFalse
by default - The touched files went down
- Docs were added with the new feature marked as experimental
- CI is still happy
The only thing missing is a test case from my point of view.
Please ensure that this does not make race-conditions worse - see issue #2946
#2946 is for me not fulfilling the definition of "race -conditions" because Sphinx gets called twice from external, with the same config of doctree-location. Even MS Word and others have the same problem if the tools get called twice for the same file. Instance "A" is normally not informed by instance "B" about a documentation change. And "A" would overwrite the results from "B", in most cases.
For me, the ticket asks more for a database behavior, so to support concurrent data-changes. Nothing that Sphinx needs to provide (imho).
Also, the workaround is simple, without losing performance: Let the first builder use -j auto
to get the build done on 100% CPU usage. Then invoke the second builder, which will reuse the doctress from the first run and nearly calculates the new output format only.
Internal race-conditions should not be a thing in Sphinx, as long as the code dealing with one file is called serial. What is still the case here. Only merging data back from concurrent doctree-handlers to the main ENV object is something you need to care about and avoid "race-conditions", but this is also handled by Sphinx and some extra small voodoo in this PR.
However, in my understanding, this PR does not make it worse, as it is already. :)
There are also issues with sphinx-build -j auto
(e.g. in qemu, kernel-docs)
I refactored the PR as discussed.
Thanks for that. Now I'm only available for a few days before going offline until mid February. Today I cleaned up what I needed to do but there are a lot of ongoing issues that we need to address (those related to test failures for instance).
Also, on the organisational side we are doing something internally so we won't be doing anything until everything is arranged and done (probably early January). As such, you'll need to wait again (sorry for that).
@ubmarco I don't seem to be able to push to this branch, would you be able to check please?
https://github.com/AA-Turner/sphinx/tree/mh-parallel-post-transform-parallel-write-doc-serial
A
@ubmarco I don't seem to be able to push to this branch, would you be able to check please?
https://github.com/AA-Turner/sphinx/tree/mh-parallel-post-transform-parallel-write-doc-serial
A
I think the problem is that I created the fork on our organization and not on my user. According to the docs allowing edits for maintainers only works for forks in personal accounts. Will keep that in mind for the future.
I rebased my branch on current master (prefer this for cleaner history) and added your style
commit.
I've added a new commit (https://github.com/AA-Turner/sphinx/commit/14899132a113be09c166acde7440a1ef4c69888a) in https://github.com/AA-Turner/sphinx/tree/mh-parallel-post-transform-parallel-write-doc-serial.
I'm hesitant about adding new public API (merge_builder_post_transform
/ post_transform_merge_attr
), as we'll need to support it for the long term. But I don't have any proposals on an alternate design at the moment, so it might be the best option.
A
I integrated your Refactoring
commit.
I've tried the sphinx version of this PR in large projects (3000 - 30.000 rst files), and it really speeds up things. I'd be happy to see this in the next sphinx release!
@arwedus Could you also give me the runtimes when
- this PR + #11939 are enabled,
- this PR is enabled,
- #11939 is enabled,
- none of the two PRs are enabled?
I don't have a huge project in production to test against and thus I'd like some feedback from projects as yours.
Oh my, these latest changes are not supposed to be pushed to this PR, it's for my local testing. Sorry, will revert them.
I reverted the unintended changes of my last commit and rebased the PR on latest master.
@arwedus
Unfortunately, I can't test this right now, because if I install sphinx with
pip install git+https://github.com/useblocks/sphinx.git@69a77c7
I get this error by sphinx:
sphinx.errors.SphinxWarning: Failed to convert typing.Any to a set or tuple
I've tried with verbose but the log is not super useful:
File "/home/.../.sphinx-venv/lib/python3.10/site-packages/sphinx/config.py", line 514, in _validate_valid_types
valid_types = tuple(valid_types)
File "/usr/lib/python3.10/typing.py", line 312, in inner
return func(*args, **kwds)
File "/usr/lib/python3.10/typing.py", line 403, in __getitem__
return self._getitem(self, parameters)
File "/usr/lib/python3.10/typing.py", line 425, in Any
raise TypeError(f"{self} is not subscriptable")
TypeError: typing.Any is not subscriptable
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/.../.sphinx-venv/lib/python3.10/site-packages/sphinx/cmd/build.py", line 337, in build_main
app = Sphinx(args.sourcedir, args.confdir, args.outputdir,
File "/home/.../.sphinx-venv/lib/python3.10/site-packages/sphinx/application.py", line 229, in __init__
self.setup_extension(extension)
File "/home/.../.sphinx-venv/lib/python3.10/site-packages/sphinx/application.py", line 402, in setup_extension
self.registry.load_extension(self, extname)
File "/home/.../.sphinx-venv/lib/python3.10/site-packages/sphinx/registry.py", line 466, in load_extension
metadata = setup(app)
File "/home/.../.sphinx-venv/lib/python3.10/site-packages/myst_parser/__init__.py", line 12, in setup
setup_sphinx(app, load_parser=True)
File "/home/.../.sphinx-venv/lib/python3.10/site-packages/myst_parser/sphinx_ext/main.py", line 59, in setup_sphinx
app.add_config_value(f"myst_{name}", default, "env", types=Any)
File "/home/.../.sphinx-venv/lib/python3.10/site-packages/sphinx/application.py", line 541, in add_config_value
self.config.add(name, default, rebuild, types)
File "/home/.../.sphinx-venv/lib/python3.10/site-packages/sphinx/config.py", line 434, in add
valid_types = _validate_valid_types(types)
File "/home/.../.sphinx-venv/lib/python3.10/site-packages/sphinx/config.py", line 516, in _validate_valid_types
logger.warning(__('Failed to convert %r to a set or tuple'), valid_types)
File "/usr/lib/python3.10/logging/__init__.py", line 1847, in warning
self.log(WARNING, msg, *args, **kwargs)
File "/home/.../.sphinx-venv/lib/python3.10/site-packages/sphinx/util/logging.py", line 131, in log
super().log(level, msg, *args, **kwargs)
File "/usr/lib/python3.10/logging/__init__.py", line 1879, in log
self.logger.log(level, msg, *args, **kwargs)
File "/usr/lib/python3.10/logging/__init__.py", line 1547, in log
self._log(level, msg, args, **kwargs)
File "/usr/lib/python3.10/logging/__init__.py", line 1624, in _log
self.handle(record)
File "/usr/lib/python3.10/logging/__init__.py", line 1634, in handle
self.callHandlers(record)
File "/usr/lib/python3.10/logging/__init__.py", line 1696, in callHandlers
hdlr.handle(record)
File "/usr/lib/python3.10/logging/__init__.py", line 964, in handle
rv = self.filter(record)
File "/usr/lib/python3.10/logging/__init__.py", line 821, in filter
result = f.filter(record)
File "/home/.../.sphinx-venv/lib/python3.10/site-packages/sphinx/util/logging.py", line 432, in filter
raise exc
sphinx.errors.SphinxWarning: Failed to convert typing.Any to a set or tuple
@chrisjsewell might be interested in this, because it seems to be caused by myst-parser 2.0
.
Breathe is also an extension I had to deactivate. But if I deactivate myst-parser
, I don't have a very large project to test any more 😆
@picnixz : With my patches to both myst-parser and breathe, I managed to run the build on my rather big and complicated sphinx project.
It stands at 721 source files, contains lots of sphinx-needs, intersphinx links, plantUML, dynamic functions, sphinx-autoapi and a dozen other extensions.
Build times for clean build (deleted build director):
sphinx 7.3, parallel_post_transform = true, python 3.10:
real 4m53,530s user 21m12,846s sys 1m15,931s
sphinx 7.2.6, python 3.10:
real 7m54,836s user 14m34,006s sys 1m20,964s
I tested on a beefy machine, where any improvement in parallelism should bring down the wall time: 36-core Intel(R) Core(TM) i9-9980XE CPU @ 3.00GHz, 128 GB RAM.
Consecutive runs gave similar results. The build with sphinx 7.2.6 threw some unexplainable errors, so the numbers for user time might be a bit off. The results clearly show improved parallelism. I wonder about the increase in total CPU time ("user"), might be due to issues with the sphinx 7.2.6 build.
@arwedus Thank you very much for the timings. If possible, could you also time your project without parallel_post_transform
(i.e., HEAD vs 7.2.6 release) for an incremental build time (an perhaps give me the numbers for a fresh build as well). I would like to know the direct consequences of https://github.com/sphinx-doc/sphinx/pull/11939 (if possible, could you report them on https://github.com/sphinx-doc/sphinx/pull/11939 instead of this PR?)