pdoc
Module-level docstring section name disrupts parsing of class code
Expected Behavior
When dealing with docstrings at the module level and the class level, pdoc3 should include the valid source code for the class in the rendered HTML, regardless of what is in module-level docstrings.
Actual Behavior
Including a section name in the module-level docstring that matches the name of a class to be documented causes pdoc3 to fail to parse the source and as a result fail to include it in the rendered HTML.
Steps to Reproduce
- Create file min_working_example.py:
"""Module docstring.
Some additional text.
class ABC
---------
This header creates the problem.
class `DEF`
-----------
This header does not create a problem.
"""
from __future__ import absolute_import
from __future__ import unicode_literals
__all__ = ["ABC", "DEF"]
class ABC:
    """ABC class docstring."""

    def __init__(self):
        """Construct object."""
        self.name = "ABC"


class DEF:
    """DEF class docstring."""

    def __init__(self):
        """Construct object."""
        self.name = "DEF"
- Run pdoc3 --html min_working_example, which yields the following error:
~/.local/lib/python3.6/site-packages/pdoc/__init__.py:227: UserWarning: Couldn't get/parse source of '<Class 'min_working_example.ABC'>'
warn("Couldn't get/parse source of '{!r}'".format(doc_obj))
Additional info
- pdoc version: 0.7.2
- ~~May be related to #134 and #106~~
So, I did a bit of additional digging and I have narrowed down the point at which this error occurs to the extraction of attributes from the module.
abc_class_obj = getattr(self.obj, "ABC")
def_class_obj = getattr(self.obj, "DEF")
both give the correct class objects (correct in the sense that I can use them to instantiate ABC and DEF objects, and those have the right member values and such). However, after this extraction of attributes, inspect.getsource(abc_class_obj) gives the incorrect source:
class ABC
(presumably somehow extracted from the module-level docstring), while getting the source of def_class_obj seems to work fine.
Basically, to my eyes, it looks like there is a problem with either the inspect library or getattr (the former seems more likely). Any additional debugging ideas would certainly be appreciated though.
One more update: the problem lies in the implementation of inspect.getsourcelines, which is called by inspect.getsource. Specifically, it employs some heuristics to quickly return the source lines. In our example, the heuristic gives the wrong result because it treats the class ABC line in the docstring as the top-level definition of the class, even though that line sits inside a string. I may open an issue on the Python issue tracker, but I suspect it would take a while for maintainers to get to it (this issue with the inspect library has been open for 4 years).
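To illustrate why the docstring header wins, here is a simplified re-creation of that kind of heuristic. It assumes the lookup is a regex scan over the module's lines that returns the first unindented match; find_class_line and SOURCE are hypothetical names used only for this sketch, and this is not the actual inspect code.

```python
import re

# Trimmed-down copy of the example module, kept as a string for the sketch.
SOURCE = '''"""Module docstring.

class ABC
---------
This header creates the problem.

class `DEF`
-----------
This header does not create a problem.
"""

class ABC:
    """ABC class docstring."""

class DEF:
    """DEF class docstring."""
'''


def find_class_line(lines, name):
    """Return the index of the line a naive regex scan takes for the class.

    Hypothetical helper: it only mimics a line-based heuristic and has no
    idea whether a matching line is code or part of a string.
    """
    pat = re.compile(r"^(\s*)class\s*" + re.escape(name) + r"\b")
    candidates = []
    for i, line in enumerate(lines):
        match = pat.match(line)
        if match:
            # An unindented match is treated as the best possible one.
            if line[0] == "c":
                return i
            candidates.append((match.group(1), i))
    # Otherwise prefer the least-indented, earliest match.
    return min(candidates)[1] if candidates else None


lines = SOURCE.splitlines()
abc_idx = find_class_line(lines, "ABC")
def_idx = find_class_line(lines, "DEF")
print(repr(lines[abc_idx]))  # the docstring header, not the class statement
print(repr(lines[def_idx]))  # the real class statement
```

The backticks in the DEF header keep the pattern from matching there, which would explain why only ABC is affected.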
As for potential workarounds, I briefly looked at ast to see if it offers something one could use to do this better. For top-level classes, parsing the module and finding the class in the list of AST children and using lineno and the next element's lineno would probably work to find the source. This works for Python 3.5+ (3.8 offers the convenient member end_lineno, but this is probably easy to implement manually). I don't know how this works for nested classes or basically anything else.
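A minimal sketch of that ast idea, for top-level classes only. The helper name top_level_class_source is hypothetical, and the fallback for interpreters without end_lineno is an assumption based on the description above (take everything up to the line before the next top-level node). Nested classes are not handled.

```python
import ast

# Same example module as in the report, kept as a string for the sketch.
SOURCE = '''"""Module docstring.

class ABC
---------
This header creates the problem.
"""

class ABC:
    """ABC class docstring."""

    def __init__(self):
        """Construct object."""
        self.name = "ABC"


class DEF:
    """DEF class docstring."""

    def __init__(self):
        """Construct object."""
        self.name = "DEF"
'''


def top_level_class_source(source, name):
    """Return the source text of a top-level class, located via the AST.

    Hypothetical helper: only direct children of the module are searched.
    """
    lines = source.splitlines()
    body = ast.parse(source).body
    for pos, node in enumerate(body):
        if isinstance(node, ast.ClassDef) and node.name == name:
            start = node.lineno  # 1-based line of the class statement
            end = getattr(node, "end_lineno", None)  # available on 3.8+
            if end is None:
                # Fallback: stop right before the next top-level node,
                # or at the end of the file for the last node.
                if pos + 1 < len(body):
                    end = body[pos + 1].lineno - 1
                else:
                    end = len(lines)
            return "\n".join(lines[start - 1:end]).rstrip()
    raise LookupError("no top-level class named {!r}".format(name))


abc_src = top_level_class_source(SOURCE, "ABC")
print(abc_src)
```

Because the module docstring is parsed as a string expression rather than a class definition, the header line in it is never considered.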
Great detective work! A long turnaround time for bugs in mature projects shouldn't deter you from filing them; at the least, they give clues when broader rewrites come up.
And CPython does accept PRs on GitHub ...
I don't know how to work around this in pdoc3, though. Bypassing inspect is not an appealing option.