numpydoc
Numpydoc does not play nice with 'latex' builder wrt bibliographic references
The way in which numpydoc manages bibliographic references is causing me some problems in conjunction with the latex sphinx builder.
- Whenever I have an `automodule` directive, all the docstrings of all the functions in the module seem to be parsed, even if the directive does not have the option to document its members set. The reference counter is therefore incremented every time a bibliographic entry is found, even if that entry is never used. The result is bibliographic references with 'holes' in their numeric sequence. For instance, one may have [R5] and [R9] but not [R6], [R7], [R8]. This may go unnoticed in html output, but not in latex output, which collects all the references together.
- Because the source files are read in arbitrary order by sphinx, the reference counter is also incremented in arbitrary order. This means that in the final document [R5] can come before [R1]. While this may go almost unnoticed in html output, it is extremely bad in the latex output, which collects all the references together.
- If I have a 'References' section in the documentation of a function, that section gets emptied when using the latex builder, because the latex builder collects all the references in a reference section at the end of the document. This leaves empty reference sections.
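To make the first two points concrete, here is a toy model of the behaviour described above. It is only a sketch and not numpydoc's actual code: the function `number_references` and the names `func_a`, `func_b`, `func_c` are made up for illustration. It assumes a single global counter that is advanced for every bibliographic entry in every parsed docstring, whether or not that docstring ends up in the output.

```python
import itertools


def number_references(docstrings, documented):
    """Toy model: label bibliographic entries with one global counter.

    docstrings : list of (name, number_of_entries), in parse order
    documented : set of names that actually appear in the output
    """
    counter = itertools.count(1)
    emitted = []
    for name, n_entries in docstrings:
        # The counter advances for every parsed docstring...
        labels = [f"R{next(counter)}" for _ in range(n_entries)]
        # ...but only documented objects contribute visible labels.
        if name in documented:
            emitted.extend(labels)
    return emitted


parsed = [("func_a", 1), ("func_b", 3), ("func_c", 1)]
labels = number_references(parsed, documented={"func_a", "func_c"})
print(labels)  # ['R1', 'R5']
```

In this model the three entries of the undocumented `func_b` consume R2 to R4, which gives the kind of hole described in the first point. Changing the order of `parsed` changes which object gets which number, as in the second point.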
Thank you for clarifying these issues, Sergio. Are you interested in working together on a PR to fix this?
As a matter of fact, this is my first time using sphinx, so I do not know exactly how the PR process works. More importantly, I do not feel very qualified, since my understanding of the internal workings of sphinx is still modest (I am in the phase where I am much better at asking questions than at answering them!). In any case, if I can be of any help, I would be glad to.
Thinking about the matter again, I believe that what numpydoc actually does is not that bad, since it makes the references unique, which is already an important thing.
What I think would be useful is the following (for clarity, I will call the expanded bibliographic entries 'bibitems' and the references to them 'citations'):
- A way to configure sphinx to 'remove' and 'capture' into the environment all the bibitems it finds. In fact, in large documents, one typically wants all the bibitems to go into a single ad hoc section at the end of the document, but the very idea of autodoc is that bibitems may come dispersed into docstrings. Furthermore, removing and capturing the bibitems is in any case needed for the latex builder.
With respect to point 1, note that a docstring can have a section dedicated to containing the bibitems. The numpy guide to docstrings recommends using the "References" section header for them. If the bibitems are removed, the whole section should be removed. Currently, the 'latex' builder can leave empty "References" sections when using autodoc.
- An `autobibliography` directive should be offered. This should be capable of creating a Bibliography section with all the captured bibitems and should work with all the builders, including the latex one. The directive should cause sphinx to emit all the captured bibitems, possibly sorting them according to the citation order.
With respect to point 2, the following things should be considered. Since bibitems may come from docstrings, there can be duplications. For instance, the same bibitem may appear in the documentation of function func1 and function func2. The autobibliography directive should recognize duplicated bibitems and emit them only once.
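The deduplication and sorting I have in mind could look roughly like the sketch below. This is plain Python with no sphinx API involved. The function `collect_bibitems`, its arguments, and the placeholder entries are hypothetical, and treating two bibitems as duplicates when their whitespace-normalised, case-folded text matches is only one possible rule.

```python
def collect_bibitems(captured, citation_order):
    """Deduplicate captured bibitems and sort them by first citation.

    captured       : list of (label, text) pairs found in docstrings
    citation_order : list of labels, in the order they are cited
    Returns (items, alias): the kept (label, text) pairs, and a map
    from every captured label to the label of the entry that is kept.
    """
    canonical = {}  # normalised text -> first (label, text) seen
    alias = {}      # captured label -> label of the kept entry
    for label, text in captured:
        key = " ".join(text.split()).casefold()
        kept_label, _ = canonical.setdefault(key, (label, text))
        alias[label] = kept_label

    # Rank each kept entry by the first citation of any of its aliases.
    rank = {}
    for position, label in enumerate(citation_order):
        rank.setdefault(alias.get(label, label), position)

    # Uncited entries go last; sorted() is stable, so they keep capture order.
    items = sorted(canonical.values(),
                   key=lambda item: rank.get(item[0], len(citation_order)))
    return items, alias


captured = [
    ("R1", "Author A. Title A."),   # placeholder entries, not real references
    ("R2", "Author B. Title B."),
    ("R3", "Author A.  Title A."),  # duplicate of R1, e.g. from func2
]
items, alias = collect_bibitems(captured, citation_order=["R2", "R3", "R1"])
print([label for label, _ in items])  # ['R2', 'R1']
print(alias["R3"])                    # R1
```

The `alias` map is what would let citations of the dropped duplicate be redirected to the entry that is kept.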
I was thinking of building this behavior into an ad hoc extension. However, my understanding of sphinx internals is still a little limited for that. Unfortunately, there aren't that many docstrings for a project that includes an autodoc extension :-). In particular, what I find problematic is that I do not see standard 'events' in builders that signal important phases of the build process and that an extension could connect to.
The only thing numpydoc can easily do is rewrite the text of the docstrings.
Within this constraint, you can try to see whether a compromise solution is possible. You don't need to know how to program Sphinx extensions to try this: just write a *.py module containing some functions with docstrings, use `.. autofunction::` directives to include the text in your Sphinx document, and then try to find a way to write references sections in the docstrings so that the output from Sphinx is sensible.
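A minimal test module for this kind of experiment might look like the following sketch. The function names and the bibliographic entry are placeholders made up for illustration, and the docstring layout just follows the usual numpy docstring conventions.

```python
"""Hypothetical test module for experimenting with references sections."""


def func1(x):
    """Return twice the input.

    Parameters
    ----------
    x : float
        Value to double.

    Returns
    -------
    float
        The doubled value.

    Notes
    -----
    The approach is described in [1]_.

    References
    ----------
    .. [1] A. Author, "Placeholder title", Placeholder Journal, 2000.
    """
    return 2 * x


def func2(x):
    """Return three times the input.

    Notes
    -----
    Cites the same (placeholder) entry as `func1`, see [1]_.

    References
    ----------
    .. [1] A. Author, "Placeholder title", Placeholder Journal, 2000.
    """
    return 3 * x
```

Saved as, say, `mymodule.py` (the name is arbitrary), each function can be pulled in with a `.. autofunction:: mymodule.func1` line in an .rst file. Building with both the html and latex builders then shows what happens to the two identical references sections.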
As to your original points:
(1) Probably not fixable, except maybe if it's just the signature mangling that gets invoked, or it's due to something in numpydoc parsing all sub-object docstrings.
(2) Might not be fixable, except maybe via some natbib sort option that sorts according to index key.
(3) Currently the produced references section in latex is not empty, but contains (only) links to the bibliography; this works at least for Scipy docs.