docstring_parser
docstring_parser copied to clipboard
Support attribute docstrings
The approved PEP 257 mentions the so called "attribute docstrings", which are string literals in the line after where an attribute is defined. These kind of docstrings are supported by a several packages, for example sphinx's autodoc extension. It would be nice if docstring_parser also supports them. Having the description right next to each attribute leads to better looking code in particular when there are many attributes defined. Take for instance the following dataclass:
from dataclasses import dataclass
@dataclass
class MyDataClass:
"""Short description for MyDataClass"""
attr_one: int = 1
"""Description for attr_one"""
attr_two: bool = False
"""Description for attr_two"""
attr_three: str = ''
...
The support for this could be implemented by extending https://github.com/rr-/docstring_parser/blob/51c2b6a7d347b42c8564986d89208af9251d5c5f/docstring_parser/parser.py#L21
Instead of only accepting text: str as input it could accept text_or_component: Union[str, type, types.ModuleType] (type for giving a class as input). When the input is a class or a module it would first look for __doc__ and then could try to parse the attribute docstrings.
@rr- is this something that would be accepted in docstring_parser? Should the implementation be different in some way?
Why not pass t.__doc__ to parse directly?
Why not pass
t.__doc__toparsedirectly?
Attribute docstrings don't have a __doc__. To parse the MyDataClass example would require to get the short description from the class' __doc__ and then get the attribute descriptions by parsing the source code. This could be by using the ast module.
I think I'd need to see a MVP to comment whether we should support this. (probably yes)
My preliminary intuition is that I wouldn't want to extend parse's contract by having it support non-strings, and instead offer a new method that uses parse() underneath. But this should clarify with a MVP.
My preliminary intuition is that I wouldn't want to extend
parse's contract by having it support non-strings, and instead offer a new method that usesparse()underneath. But this should clarify with a MVP.
How about a new function like:
def parse_from_component(component: Union[type, types.ModuleType], style: DocstringStyle = DocstringStyle.AUTO) -> Docstring:
I am not sure what you mean by an MVP. I would only invest time in implementing if I known that it will be accepted in this project. I suppose this can be decided without implementation. The main point is that for:
@dataclass
class MyDataClass1:
"""Short description for MyDataClass
Args:
attr_one: Description for attr_one
attr_two: Description for attr_two
"""
attr_one: int = 1
attr_two: bool = False
@dataclass
class MyDataClass2:
"""Short description for MyDataClass"""
attr_one: int = 1
"""Description for attr_one"""
attr_two: bool = False
"""Description for attr_two"""
the following parsings would all give the same result:
parse(MyDataClass1.__doc__)
parse_from_component(MyDataClass1)
parse_from_component(MyDataClass2)
You can find more context in https://github.com/omni-us/jsonargparse/issues/150.
Fair, but how would you extract the docstring for attr_one and attr_two?
My initial idea would be to use inspect.getsource, then ast.parse and implement an ast.NodeVisitor to collect all attributes that have docstrings.
The following is a possible implementation to extract docstrings for class attributes:
import ast
import inspect
def ast_is_literal_str(node):
return isinstance(node, ast.Expr) and isinstance(node.value, ast.Constant) and isinstance(node.value.value, str)
def ast_get_attribute_name(node):
if isinstance(node, (ast.Assign, ast.AnnAssign)):
target = node.targets[0] if isinstance(node, ast.Assign) else node.target
if isinstance(target, ast.Name):
return target.id
class AttributeDocstrings(ast.NodeVisitor):
def visit(self, node):
if self.prev_attr_name and ast_is_literal_str(node):
self.attr_docs[self.prev_attr_name] = node.value.value
self.prev_attr_name = ast_get_attribute_name(node)
if isinstance(node, ast.ClassDef):
self.generic_visit(node)
def get_attr_docs(self, component: type):
self.attr_docs = {}
self.prev_attr_name = None
source = inspect.getsource(component)
tree = ast.parse(source)
if isinstance(tree, ast.Module) and isinstance(tree.body[0], ast.ClassDef):
self.visit(tree.body[0])
return self.attr_docs
visitor = AttributeDocstrings()
attr_docs = visitor.get_attr_docs(MyDataClass2)
print(attr_docs)
@rr- how does it look?
What would be a good way to introduce this in the package? I would propose to add a parse_from_component function to parser.py and create a new py file for source code (ast) parsing.
It looks good. Regarding how to introduce this feature, I don't have a better suggestion than this – introduce a top level function similar to parse that does the thing showcased above. Since this feels to be very loosely compatible with the current API, I'd mention in the documentation of the new function that it should be considered unstable and may be a subject to changes in future versions. I'd put the implementation details in util.py and offer a facade for it in parser.py to keep that file small. I'd create tests for it in test_util.py. Should similar requirements arise in the future (perhaps being able to change the attribute docstrings and serialize the AST back to a string?) I'd wage refactoring this interface, but not now.
I created a pull request #72.
I'd put the implementation details in
util.pyand offer a facade for it inparser.pyto keep that file small.
Implementation details in util.py is not possible because it imports from parser.py. The details are in attrdoc.py.
Since this feels to be very loosely compatible with the current API, I'd mention in the documentation of the new function that it should be considered unstable and may be a subject to changes in future versions.
What changes are you thinking of? Marking it as unstable dissuades from using it until it is stable. I think it is better to not rush in merging and only making it part of a release when it is clear that it is stable API. After merging, future changes to the API should be backward compatible.