lark
HTML documentation is not built reproducibly (Sphinx)
The present Sphinx configuration leads to sources being included in the HTML build inside the _sources directory (because html_copy_source is set to True by default). Inside this directory, the file _sources/examples/index.rst.txt contains the build path, e.g. for one of my builds:
[...]
:download:`Download all examples in Python source code: examples_python.zip <//build/python-lark-utTVdH/python-lark-0.10.0/docs/examples/examples_python.zip>`
[...]
:download:`Download all examples in Jupyter notebooks: examples_jupyter.zip <//build/python-lark-utTVdH/python-lark-0.10.0/docs/examples/examples_jupyter.zip>`
[...]
This renders the HTML documentation build process unreproducible (see https://reproducible-builds.org).
Is there a particular reason why you include the sources in the HTML build? If the only reason is that this behaviour is the default, one possible way to fix this would be setting html_copy_source = False in docs/conf.py.
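A minimal sketch of that change, assuming the rest of docs/conf.py stays untouched (the comment is mine, not from the existing file):

```python
# docs/conf.py (fragment)

# Do not copy the reST sources into the HTML output. With this set to False,
# Sphinx no longer writes the _sources/ directory, so
# _sources/examples/index.rst.txt (which embeds the build path) is not shipped.
html_copy_source = False
```

This only avoids shipping the affected file; it does not stop the build path from being written into the generated reST in the first place.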
Why does it make it unreproducible? (And the website you linked is not that relevant for this situation from what I see)
And also, considering that Sphinx has a warning about deactivating it, I am not sure that this by itself is a good enough reason to change it.
Why does it make it unreproducible?
Because of the included build path, the file contents differ for each build if the build is done in a schroot session, which is common practice for packaging purposes to ensure a clean build environment.
The HTML documentation would also differ if you build it in your home directory and if I do it in my home directory (provided our home directories use different paths).
More information on this type of issue is available on https://tests.reproducible-builds.org/debian/issues/unstable/captures_build_path_issue.html.
Debian checks all packages (including those containing documentation) for reproducibility (see e. g. https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/python-lark.html).
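To check a local build for this kind of leak, something like the following could be used. It is only a sketch: the helper name files_containing and the output directory in the usage note are assumptions, not part of lark or Sphinx.

```python
from pathlib import Path


def files_containing(root, needle):
    """Return the files below `root` (as POSIX-style relative paths)
    whose raw contents contain the string `needle`."""
    needle_bytes = needle.encode("utf-8")
    hits = []
    for path in sorted(Path(root).rglob("*")):
        # Compare bytes so that binary files (images, zips) are handled too.
        if path.is_file() and needle_bytes in path.read_bytes():
            hits.append(path.relative_to(root).as_posix())
    return hits
```

For example, after building the docs, `files_containing("docs/_build/html", os.getcwd())` would list every output file that captured the current build path (assuming the HTML ends up in docs/_build/html and `os` is imported).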
And also, considering that Sphinx has a warning about deactivating it, I am not sure that this by itself is a good enough reason to change it.
Thanks for the pointer. I was not aware of this and it seems that this warning is gone for the most recent version of Sphinx: https://www.sphinx-doc.org/en/master/usage/configuration.html#confval-html_copy_source
Unfortunately I have no idea why this is the case (has it become irrelevant for the most recent release?).
I would honestly talk to sphinx and look at what they have to say about this. Or did you find somewhere that the correct solution to this would be to deactivate this option? This is not the first problem in sphinx in this regard, so I would suggest asking them.
And it might be worth mentioning @chsasank, as he is the one who contributed most (all?) of the sphinx setup.
Or did you find somewhere that the correct solution to this would be to deactivate this option?
I consider my suggestion a hack which avoids the problem rather than a proper solution. It would definitely be helpful to get some input on this by a sphinx expert.
This is sphinx-gallery's 'problem', not Sphinx directly. Lemme know if I get this correct: this particular random string, python-lark-utTVdH, in the generated rst causes reproducibility issues.
If you want, you can just disable those two lines by doing this: https://sphinx-gallery.github.io/stable/configuration.html#disabling-download-button-of-all-scripts
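As a rough sketch of what that page describes, assuming the option goes into the existing sphinx_gallery_conf dict in docs/conf.py (the other entries of that dict are not shown here and would stay as they are):

```python
# docs/conf.py (fragment)
sphinx_gallery_conf = {
    # ... existing sphinx-gallery options stay as they are ...

    # Do not generate the "Download all examples" links
    # (examples_python.zip / examples_jupyter.zip) in the gallery index.
    "download_all_examples": False,
}
```

I have not checked that this removes every occurrence of the build path from the generated files, so the output should be checked after a rebuild.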
Lemme know if I get this correct: this particular random string - python-lark-utTVdH in the generated rst causes reproducability issues.
It is not the random string alone that causes the issue but the fact that the build path is included at all. If a user bob builds the docs in his home dir, the build path might be /home/bob/lark. If a user alice builds it, the path might be /home/alice/build. Even though there is no randomness involved in these examples, the built docs will differ.
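To make this concrete, here is a small illustration. It does not call Sphinx or sphinx-gallery; render_download_line is a made-up helper that just mimics the :download: line quoted above, and the two paths are the hypothetical ones from this comment.

```python
import hashlib


def render_download_line(zip_path):
    # Mimics the line found in _sources/examples/index.rst.txt.
    return (
        ":download:`Download all examples in Python source code: "
        f"examples_python.zip <{zip_path}>`"
    )


def digest(text):
    # Hash of the generated text, as a reproducibility check would compare it.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


bob = render_download_line("/home/bob/lark/docs/examples/examples_python.zip")
alice = render_download_line("/home/alice/build/docs/examples/examples_python.zip")

# Same sources, same tools, different build directory: the outputs differ.
print(digest(bob) == digest(alice))  # False
```

The only input that changed is the build directory, which is exactly what a reproducible build must not depend on.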
You can dockerize the build so that it's completely reproducible.
You can dockerize the build so that it's completely reproducible.
@chsasank: Could you please elaborate why you think this solves the issue?