docstring_parser Support attribute docstrings

The approved PEP 257 mentions the so called "attribute docstrings", which are string literals in the line after where an attribute is defined. These kind of docstrings are supported by a several packages, for example sphinx's autodoc extension. It would be nice if docstring_parser also supports them. Having the description right next to each attribute leads to better looking code in particular when there are many attributes defined. Take for instance the following dataclass:

from dataclasses import dataclass

@dataclass
class MyDataClass:
    """Short description for MyDataClass"""

    attr_one: int = 1
    """Description for attr_one"""

    attr_two: bool = False
    """Description for attr_two"""

    attr_three: str = ''
    ...

The support for this could be implemented by extending https://github.com/rr-/docstring_parser/blob/51c2b6a7d347b42c8564986d89208af9251d5c5f/docstring_parser/parser.py#L21

Instead of only accepting text: str as input it could accept text_or_component: Union[str, type, types.ModuleType] (type for giving a class as input). When the input is a class or a module it would first look for __doc__ and then could try to parse the attribute docstrings.

Jul 26 '22 20:07 mauvilsa

@rr- is this something that would be accepted in docstring_parser? Should the implementation be different in some way?

Jul 29 '22 06:07 mauvilsa

Why not pass t.__doc__ to parse directly?

Jul 29 '22 08:07 rr-

Why not pass t.__doc__ to parse directly?

Attribute docstrings don't have a __doc__. To parse the MyDataClass example would require to get the short description from the class' __doc__ and then get the attribute descriptions by parsing the source code. This could be by using the ast module.

Jul 29 '22 09:07 mauvilsa

I think I'd need to see a MVP to comment whether we should support this. (probably yes)

My preliminary intuition is that I wouldn't want to extend parse's contract by having it support non-strings, and instead offer a new method that uses parse() underneath. But this should clarify with a MVP.

Jul 29 '22 10:07 rr-

My preliminary intuition is that I wouldn't want to extend parse's contract by having it support non-strings, and instead offer a new method that uses parse() underneath. But this should clarify with a MVP.

How about a new function like:

def parse_from_component(component: Union[type, types.ModuleType], style: DocstringStyle = DocstringStyle.AUTO) -> Docstring:

I am not sure what you mean by an MVP. I would only invest time in implementing if I known that it will be accepted in this project. I suppose this can be decided without implementation. The main point is that for:

@dataclass
class MyDataClass1:
    """Short description for MyDataClass

    Args:
        attr_one: Description for attr_one
        attr_two: Description for attr_two
    """

    attr_one: int = 1
    attr_two: bool = False


@dataclass
class MyDataClass2:
    """Short description for MyDataClass"""

    attr_one: int = 1
    """Description for attr_one"""

    attr_two: bool = False
    """Description for attr_two"""

the following parsings would all give the same result:

parse(MyDataClass1.__doc__)
parse_from_component(MyDataClass1)
parse_from_component(MyDataClass2)

You can find more context in https://github.com/omni-us/jsonargparse/issues/150.

Jul 30 '22 06:07 mauvilsa

Fair, but how would you extract the docstring for attr_one and attr_two?

Jul 30 '22 07:07 rr-

My initial idea would be to use inspect.getsource, then ast.parse and implement an ast.NodeVisitor to collect all attributes that have docstrings.

Jul 30 '22 11:07 mauvilsa

The following is a possible implementation to extract docstrings for class attributes:

import ast
import inspect

def ast_is_literal_str(node):
    return isinstance(node, ast.Expr) and isinstance(node.value, ast.Constant) and isinstance(node.value.value, str)

def ast_get_attribute_name(node):
    if isinstance(node, (ast.Assign, ast.AnnAssign)):
        target = node.targets[0] if isinstance(node, ast.Assign) else node.target
        if isinstance(target, ast.Name):
            return target.id

class AttributeDocstrings(ast.NodeVisitor):
    def visit(self, node):
        if self.prev_attr_name and ast_is_literal_str(node):
            self.attr_docs[self.prev_attr_name] = node.value.value
        self.prev_attr_name = ast_get_attribute_name(node)
        if isinstance(node, ast.ClassDef):
            self.generic_visit(node)

    def get_attr_docs(self, component: type):
        self.attr_docs = {}
        self.prev_attr_name = None
        source = inspect.getsource(component)
        tree = ast.parse(source)
        if isinstance(tree, ast.Module) and isinstance(tree.body[0], ast.ClassDef):
            self.visit(tree.body[0])
        return self.attr_docs


visitor = AttributeDocstrings()
attr_docs = visitor.get_attr_docs(MyDataClass2)
print(attr_docs)

Aug 03 '22 04:08 mauvilsa

@rr- how does it look?

What would be a good way to introduce this in the package? I would propose to add a parse_from_component function to parser.py and create a new py file for source code (ast) parsing.

Aug 08 '22 07:08 mauvilsa

It looks good. Regarding how to introduce this feature, I don't have a better suggestion than this – introduce a top level function similar to parse that does the thing showcased above. Since this feels to be very loosely compatible with the current API, I'd mention in the documentation of the new function that it should be considered unstable and may be a subject to changes in future versions. I'd put the implementation details in util.py and offer a facade for it in parser.py to keep that file small. I'd create tests for it in test_util.py. Should similar requirements arise in the future (perhaps being able to change the attribute docstrings and serialize the AST back to a string?) I'd wage refactoring this interface, but not now.

Aug 08 '22 08:08 rr-

I created a pull request #72.

I'd put the implementation details in util.py and offer a facade for it in parser.py to keep that file small.

Implementation details in util.py is not possible because it imports from parser.py. The details are in attrdoc.py.

Since this feels to be very loosely compatible with the current API, I'd mention in the documentation of the new function that it should be considered unstable and may be a subject to changes in future versions.

What changes are you thinking of? Marking it as unstable dissuades from using it until it is stable. I think it is better to not rush in merging and only making it part of a release when it is clear that it is stable API. After merging, future changes to the API should be backward compatible.

Aug 23 '22 05:08 mauvilsa

docstring_parser docstring_parser copied to clipboard

Support attribute docstrings

docstring_parser
docstring_parser copied to clipboard