HTML documentation is not built reproducibly (Sphinx)

Open wiene opened this issue 3 years ago • 10 comments

The current Sphinx configuration copies the reST sources into the HTML build, under the _sources directory (because html_copy_source defaults to True). One of the copied files, _sources/examples/index.rst.txt, contains the build path, e.g. for one of my builds:

[...]
    :download:`Download all examples in Python source code: examples_python.zip <//build/python-lark-utTVdH/python-lark-0.10.0/docs/examples/examples_python.zip>`
[...]
    :download:`Download all examples in Jupyter notebooks: examples_jupyter.zip <//build/python-lark-utTVdH/python-lark-0.10.0/docs/examples/examples_jupyter.zip>`
[...]

This makes the HTML documentation build unreproducible (see https://reproducible-builds.org).

Is there a particular reason why you include the sources in the HTML build? If the only reason is that this behaviour is the default, one possible fix would be to set

html_copy_source = False

in docs/conf.py.

wiene, Oct 16 '20 17:10

Why does it make it unreproducible? (And the website you linked is not that relevant for this situation from what I see)

MegaIng, Oct 16 '20 19:10

Also, considering that Sphinx has a warning about deactivating it, I am not sure that this by itself is a good enough reason to change it.

MegaIng, Oct 16 '20 19:10

Why does it make it unreproducible?

Because the build path is included, the file contents differ from build to build whenever the build runs in a schroot session, which is common practice in packaging to ensure a clean build environment.

The HTML documentation would also differ between a build in your home directory and a build in mine (provided our home directories have different paths).

More information on this type of issue is available on https://tests.reproducible-builds.org/debian/issues/unstable/captures_build_path_issue.html.

Debian checks all packages (including those containing documentation) for reproducibility (see e.g. https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/python-lark.html).
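For a rough local check of the same kind of leak, something like the following could be run against the finished HTML output. This is only a sketch: the function name `find_build_path_leaks` and the directory arguments are hypothetical, and it simply searches every output file for the build root as a byte string, which is far less thorough than what the Debian infrastructure does.

```python
from pathlib import Path


def find_build_path_leaks(html_dir, build_root):
    """Return relative paths of files under html_dir that contain build_root.

    Hypothetical helper, not part of Sphinx or lark: it does a plain
    byte-string search for the build root in every file of the output.
    """
    needle = str(build_root).encode("utf-8")
    root = Path(html_dir)
    leaks = []
    for path in root.rglob("*"):
        # Only regular files can embed the path; skip directories.
        if path.is_file() and needle in path.read_bytes():
            leaks.append(path.relative_to(root).as_posix())
    return sorted(leaks)
```

Called with the HTML output directory and the directory the build ran in, it would be expected to list _sources/examples/index.rst.txt for the build described above.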

Also, considering that Sphinx has a warning about deactivating it, I am not sure that this by itself is a good enough reason to change it.

Thanks for the pointer. I was not aware of this, and it seems that this warning is gone in the most recent version of Sphinx: https://www.sphinx-doc.org/en/master/usage/configuration.html#confval-html_copy_source

Unfortunately I have no idea why this is the case (has it become irrelevant for the most recent release?).

wiene, Oct 16 '20 20:10

I would honestly talk to the Sphinx developers and see what they have to say about this. Or did you find somewhere that the correct solution to this would be to deactivate this option? This is not the first problem in Sphinx in this regard, so I would suggest asking them.

MegaIng, Oct 16 '20 22:10

And it might be worth mentioning @chsasank, as he is the one who contributed most (all?) of the Sphinx setup.

MegaIng, Oct 16 '20 22:10

Or did you find somewhere that the correct solution to this would be to deactivate this option?

I consider my suggestion a hack that avoids the problem rather than a proper solution. It would definitely be helpful to get some input on this from a Sphinx expert.

wiene, Oct 17 '20 20:10

This is sphinx-gallery's 'problem', not Sphinx's directly. Lemme know if I get this correct: this particular random string, python-lark-utTVdH, in the generated rst causes reproducibility issues.

If you want, you can just disable those two lines by doing this: https://sphinx-gallery.github.io/stable/configuration.html#disabling-download-button-of-all-scripts
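In docs/conf.py that would look roughly like the fragment below. `download_all_examples` is the option described on the linked page; the two directory values are assumptions for illustration and should stay whatever the project's existing `sphinx_gallery_conf` already uses.

```python
# docs/conf.py (fragment): a sketch, to be merged into the existing dict.
sphinx_gallery_conf = {
    # Assumed values for illustration only; keep the project's own settings.
    "examples_dirs": ["../examples"],
    "gallery_dirs": ["examples"],
    # Drop the two "Download all examples ..." links from the gallery index,
    # which are the lines that carry the build path.
    "download_all_examples": False,
}
```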

chsasank, Jan 13 '21 03:01

Lemme know if I get this correct: this particular random string, python-lark-utTVdH, in the generated rst causes reproducibility issues.

It is not the random string alone that causes the issue but the fact that the build path is included at all. If a user bob builds the docs in his home dir, the build path might be /home/bob/lark. If a user alice builds it, the path might be /home/alice/build. Even though there is no randomness involved in these examples, the built docs will differ.
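To make this concrete, here is a small sketch. The `TEMPLATE` string only mimics the shape of the generated download line and is not the actual sphinx-gallery output; `render` and `digest` are hypothetical helpers. The same template rendered under two build directories gives different bytes, and therefore different checksums, with no randomness involved.

```python
import hashlib

# Hypothetical stand-in for the generated :download: line; the real one
# is written by sphinx-gallery and embeds the absolute build directory.
TEMPLATE = (
    ":download:`Download all examples in Python source code: "
    "examples_python.zip <{build_dir}/docs/examples/examples_python.zip>`"
)


def render(build_dir):
    """Fill the build directory into the template."""
    return TEMPLATE.format(build_dir=build_dir)


def digest(text):
    """SHA-256 of the rendered text, as a reproducibility check would see it."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


bob = render("/home/bob/lark")
alice = render("/home/alice/build")

print(digest(bob) == digest(alice))  # False
```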

wiene, Jan 13 '21 07:01

You can dockerize the build so that it's completely reproducible.

chsasank, Jan 20 '21 05:01

You can dockerize the build so that it's completely reproducible.

@chsasank: Could you please elaborate on why you think this solves the issue?

wiene, Jan 23 '21 10:01