Incremental/Distributed sphinx-build
This is a half-baked feature request, but I'm pretty sure it is too big for StackOverflow, and I've done enough poking around to be relatively confident that this isn't possible (or at least not well supported) today. If it is possible, I apologize and would appreciate any pointers to documentation.
Is your feature request related to a problem? Please describe.
The primary problem is that sphinx is its own build system, which makes it difficult to integrate into other build systems in a way that is performant, idiomatic, and scalable. Of particular interest is the ability to configure a documentation build in bazel. A naive configuration would have a single build step calling sphinx-build, which takes as input all the source files and produces as output some artifact representing the generated documentation site (e.g. an output directory, or a tarfile to keep things tidy). However, this is problematic for the following reasons:
- sandboxing will subvert sphinx's caching (undeclared outputs are discarded, and build steps must be hermetic: they may not take their own outputs as input)
- sphinx's intermediate files will not be cached via bazel's local or remote caching
- bazel manages parallelization of jobs so it can be problematic if the job itself is also parallelized (e.g. with -j)
- a monolithic job is somewhat unfriendly to bazel remote execution since a large number of input artifacts need to be transferred to the remote executor
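For concreteness, the naive single-step configuration amounts to one command like the one assembled below. This is only a sketch: the helper name and the directory paths are hypothetical placeholders. The `-d` option is used to put the doctree cache in a location the build rule could declare, but under sandboxing that cache would still start empty on every run.

```python
from typing import List


def monolithic_sphinx_command(source_dir: str, output_dir: str,
                              doctree_dir: str, jobs: int = 1) -> List[str]:
    """Assemble one sphinx-build invocation covering the whole site.

    Hypothetical helper: all paths are placeholders chosen by the build rule.
    """
    return [
        "sphinx-build",
        "-b", "html",        # builder to run
        "-d", doctree_dir,   # directory where sphinx pickles its doctrees
        "-j", str(jobs),     # sphinx-internal parallelism; 1 leaves scheduling to bazel
        source_dir,
        output_dir,
    ]


if __name__ == "__main__":
    print(" ".join(monolithic_sphinx_command("docs", "out/html", "out/doctrees")))
```

Every source file is an input to this one step, so any change to any file reruns the whole build.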
Describe the solution you'd like
Ideally, it would be possible to execute sphinx in "stages" according to the different phases that it works through. Additionally, the build system integrator/author would need to know (deterministically) what the inputs and outputs are for each step.
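To illustrate what "stages with known inputs and outputs" could look like from the build system's side, here is a minimal sketch. It assumes a hypothetical three-stage split (read each source, merge the environment, write each page); sphinx does not expose such an interface today, and the `Step` type, the stage names, and the `doctrees/` and `html/` layout are all illustrative. Only the `.doctree` and `environment.pickle` file names mirror what sphinx writes into its doctree directory.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import List, Tuple


@dataclass(frozen=True)
class Step:
    """One build action with fully declared inputs and outputs (hypothetical)."""
    name: str
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]


def plan_stages(sources: List[str]) -> List[Step]:
    """Derive a deterministic step list from the source file names alone."""
    ordered = sorted(sources)
    # Document name = source path without its extension.
    docnames = [str(PurePosixPath(s).with_suffix("")) for s in ordered]

    steps: List[Step] = []
    # Stage 1: parse each source independently into a pickled doctree.
    for src, doc in zip(ordered, docnames):
        steps.append(Step(f"read:{doc}", (src,), (f"doctrees/{doc}.doctree",)))

    # Stage 2: merge per-document results into one shared environment.
    doctrees = tuple(f"doctrees/{doc}.doctree" for doc in docnames)
    steps.append(Step("merge-env", doctrees, ("doctrees/environment.pickle",)))

    # Stage 3: render each page from its own doctree plus the environment.
    for doc in docnames:
        steps.append(Step(
            f"write:{doc}",
            (f"doctrees/{doc}.doctree", "doctrees/environment.pickle"),
            (f"html/{doc}.html",),
        ))
    return steps


if __name__ == "__main__":
    for step in plan_stages(["index.rst", "guide/intro.rst"]):
        print(step.name, step.inputs, "->", step.outputs)
```

With a plan like this, the per-document read steps could be cached and scheduled individually by bazel; the single merge step is still a point where every document meets.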
Describe alternatives you've considered
I think the alternative is the "monolithic" build step outlined in the problem description.
Additional context
In addition to exposing the different steps of the build, I think a big challenge in designing a performant build system will be managing the indexes and other shared state. As I have understood it by poking around in the code, a lot of stuff gets pulled together into the environment during events 9-12 in the linked document. This creates a pretty significant build bottleneck because all input .rst files become a dependency of all output files. Ideally there would be some mechanism for distributed indexes, or a distributed environment, that would allow the build system to avoid this bottleneck. That said, some features (e.g. global section numbering) would not be possible without a single shared environment that knows about the entire document structure. The best-case scenario is that each output document transitively depends only on the input documents that it must (e.g. the documents that create that page directly or the documents that define the references which are used in that output document). This is probably all out of scope for this request, but I wanted to note it here. If the build system could at least separate out the different phases of the build, then bazel could cache sphinx's intermediate files and better manage parallelism.
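The best-case dependency rule can be sketched as a transitive closure over a per-document reference graph. This is a minimal illustration, not something sphinx provides: the `refs` mapping (document name to the documents it includes or references) and the document names are made up, and how such a graph would be extracted from sphinx is left open.

```python
from typing import Dict, Set


def transitive_inputs(doc: str, refs: Dict[str, Set[str]]) -> Set[str]:
    """Return doc plus every document reachable through its references."""
    seen: Set[str] = set()
    stack = [doc]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        # Follow only references that have not been visited yet.
        stack.extend(refs.get(current, set()) - seen)
    return seen


if __name__ == "__main__":
    # Hypothetical reference graph for a five-page site.
    refs = {
        "index": {"guide", "api"},
        "guide": {"glossary"},
        "api": set(),
        "glossary": set(),
        "changelog": set(),
    }
    print(sorted(transitive_inputs("guide", refs)))  # ['glossary', 'guide']
```

In this example, editing `changelog` would not invalidate the `guide` page, whereas with a single shared environment every page depends on every source.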