Style guide diverges from the "Single-Source-of-Truth" principle regarding function parameter type documentation
In section https://numpydoc.readthedocs.io/en/latest/format.html#parameters (this also applies to Returns, ...), the style guide specifies that a type should be added to each parameter description.
Now that Python has a well-designed typing system, documenting parameter types in docstrings is not only outdated but also lets code and documentation diverge, which violates the key principle of Single-Source-of-Truth.
For example, the following function:

```python
def my_shiny_method(num_a: float, count: int) -> list[float]:
    """
    Create list containing some dummy numbers.

    Parameters
    ----------
    num_a
        Base number to use for filling list.
    count
        Target list length.

    Returns
    -------
    List of length `count` filled with floats.
    """
    return [num_a / count for i in range(count)]
```

is fully defined, documented and typed, using the typing syntax as the Single-Source-of-Truth.
Requiring type definitions in the docstring as well, e.g.:

```python
    """
    Create list containing some dummy numbers.

    Parameters
    ----------
    num_a : float
        Base number to use for filling list.
    count : int
        Target list length.

    Returns
    -------
    list[float]
        List of length `count` filled with floats.
    """
```

is not only superfluous but, in my experience, also dangerous: functions get updated over time, and their interfaces change, even in production code, more often than one would like. Most developers, especially in production environments, are pressed for time and tend to forget to update the seemingly irrelevant part (the duplicated types in the docstring), resulting in something like the following:

```python
def my_shiny_method(num_a: int, count: int) -> list[float]:
    """
    Create list containing some dummy numbers.

    Parameters
    ----------
    num_a : float
        Base number to use for filling list.
    count : int
        Target list length.

    Returns
    -------
    list[float]
        List of length `count` filled with floats.
    """
    return tuple(num_a / count for i in range(count))
```

Now the type information diverges. Most users look at `help(my_shiny_method)` to see what a function expects, and depending on whether they read the signature or the docstring, they get misleading information.
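Divergence like this can at least be detected mechanically. The following is a rough sketch, not a numpydoc feature: the helpers `annotation_to_str`, `docstring_param_types` and `find_mismatches` are hypothetical, the parser assumes the standard numpydoc layout (one unindented `name : type` line per parameter, section headers underlined with dashes), and types are compared as plain strings, so equivalent spellings such as `List[int]` and `list[int]` would be reported as mismatches.

```python
import inspect
import typing


def annotation_to_str(annotation) -> str:
    """Render an annotation roughly the way it is written in source."""
    # Plain, non-generic classes (int, float, ...) -> bare name
    if isinstance(annotation, type) and not typing.get_args(annotation):
        return annotation.__qualname__
    # Generic aliases and typing constructs, e.g. list[float] -> "list[float]"
    return repr(annotation).replace("typing.", "")


def docstring_param_types(func) -> dict:
    """Collect {name: type} from `name : type` lines in the Parameters section."""
    lines = (inspect.getdoc(func) or "").splitlines()
    if "Parameters" not in lines:
        return {}
    types = {}
    # Skip the "Parameters" header and its dashed underline
    for i in range(lines.index("Parameters") + 2, len(lines)):
        line = lines[i]
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        if nxt and set(nxt) == {"-"}:
            break  # reached the next section header, e.g. "Returns"
        if line and not line[0].isspace():
            name, sep, type_str = line.partition(":")
            if sep:  # parameter lines without a type are ignored
                types[name.strip()] = type_str.strip()
    return types


def find_mismatches(func) -> dict:
    """Map parameter name -> (docstring type, annotated type) where they differ."""
    hints = typing.get_type_hints(func)
    mismatches = {}
    for name, doc_type in docstring_param_types(func).items():
        if name not in hints:
            continue
        ann_type = annotation_to_str(hints[name])
        if doc_type != ann_type:
            mismatches[name] = (doc_type, ann_type)
    return mismatches


# The diverged example: the annotation says int, the docstring still says float
def my_shiny_method(num_a: int, count: int) -> list[float]:
    """
    Create list containing some dummy numbers.

    Parameters
    ----------
    num_a : float
        Base number to use for filling list.
    count : int
        Target list length.

    Returns
    -------
    list[float]
        List of length `count` filled with floats.
    """
    return [num_a / count for _ in range(count)]


print(find_mismatches(my_shiny_method))  # {'num_a': ('float', 'int')}
```

The changed return value (`tuple` instead of `list[float]`) is a different kind of divergence, between annotation and implementation; a type checker such as mypy reports that one, not this docstring check.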
This becomes even more problematic for things like `Literal`s, which in modern Python should be typed as:

```python
from typing import Literal

def foo(order: Literal["C", "F", "A"]) -> None: ...
```

whereas the style guide recommends a completely different syntax (a set is not a `Literal`, so this is faulty):

```
order : {'C', 'F', 'A'}
    Description of `order`.
```
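Both notations carry the same values, so if a project does want this information in the docstring, the numpydoc form could be generated from the annotation instead of being written by hand. A minimal sketch, assuming a hypothetical helper `literal_to_numpydoc` (not part of numpydoc or `typing`), built only on `typing.get_origin` and `typing.get_args`:

```python
from typing import Literal, get_args, get_origin


def literal_to_numpydoc(annotation) -> str:
    """Render a Literal[...] annotation in numpydoc's {...} notation."""
    if get_origin(annotation) is not Literal:
        raise TypeError(f"expected a Literal[...] annotation, got {annotation!r}")
    # get_args() returns the literal values in declaration order
    return "{" + ", ".join(repr(value) for value in get_args(annotation)) + "}"


def foo(order: Literal["C", "F", "A"]) -> None: ...


order_type = foo.__annotations__["order"]
print(literal_to_numpydoc(order_type))  # {'C', 'F', 'A'}
```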
Proposal
- Recommend using only the Python typing system
- If types are specifically required in docstrings, align their syntax with the core Python typing syntax
Advantages
- Single-Source-of-Truth!!!
- No risk of diverging type descriptions
- All type definitions can easily be checked by mypy or similar tools
- All type definitions can be style-checked and reformatted by tools like ruff, black, etc. Thus, type information is less prone to syntax errors such as `param: type` instead of `param : type`
- Types from namespaces are resolved correctly
- One and only one typing syntax
- Doc generators like Sphinx nowadays focus on Python typing anyway: no more ambiguous option handling
- Less typing
- Cleaner and simpler docstrings
Challenge
How to format Returns and Yields sections?
tl;dr: I'm -1 on removing parameter type descriptions from the standard. Projects that do not want to provide parameter type descriptions are already free to omit them.
IMO type annotations and numpydoc parameter type descriptions fulfill different roles. Succinctly: type annotations are designed for machines, and parameter type descriptions are for humans. Formalizing typing information into a structured schema is not always the best way to convey information to humans. For simple cases (e.g. List[int]) there's no issue, but in practice the lack of a formal, structured typing schema is an advantage when conveying information to documentation readers. For example:

```
img : `array_like` with shape ``(W, H)`` or ``(W, H, C)``, where ``C`` is 1 or 3
```

Setting aside API design considerations, this parameter type description (arguably) conveys a lot of information in a relatively straightforward way. It would be possible to express this in terms of formal type annotations, but the result would arguably be less (human) readable and introduce indirection in the description (e.g. defining type(s) to explicitly capture what array_like means and the various shapes & channel interpretations).
So to summarize - IMO at least, documentation parameter type descriptions fulfill a different role than structured type annotations, so supporting both separately is worthwhile.
Other considerations
Now, let's say you disagree with the above take. The good news is that it'd be very straightforward to take the formal type annotations of your project and inject them into the parameter type descriptions. Nothing prevents parameter type descriptions from being valid annotations. Even simpler: there's no requirement to provide a parameter type description at all! In other words, the numpydoc standard does not force users to provide parameter type descriptions, so if for your individual project you are concerned about annotation/doc synchronization and don't want to do any additional documenting/engineering, just go ahead and leave the parameter type descriptions blank!
Injecting existing type annotations directly into docstrings is straightforward; the reverse, i.e. going from unstructured type descriptions to formal type annotations, is less so. Fortunately, there is tooling for those interested in pursuing this. Some IDEs, most notably PyCharm, have had this capability (i.e. performing static type analysis using numpydoc parameter type descriptions) for well over a decade (even predating PEP 484). I'm not sure whether the code/grammars they use are open-source, though. There's also docstub, which was developed to help address this exact issue.
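As a rough illustration of that injection (a sketch, not a numpydoc or Sphinx feature), the hypothetical decorator `inject_annotation_types` below fills in untyped entries of the Parameters section from `typing.get_type_hints()` and leaves hand-written type descriptions untouched. It assumes the standard numpydoc section layout and renders annotations with a simple helper, so unusual annotations may come out spelled differently than in the source.

```python
import inspect
import typing


def _format_annotation(annotation) -> str:
    # Plain, non-generic classes -> bare name; everything else -> repr
    if isinstance(annotation, type) and not typing.get_args(annotation):
        return annotation.__qualname__
    return repr(annotation).replace("typing.", "")


def inject_annotation_types(func):
    """Fill untyped entries of the Parameters section from the annotations.

    Entries that already have a hand-written type (`name : ...`) are left as-is.
    """
    hints = typing.get_type_hints(func)
    hints.pop("return", None)  # only parameters are handled here
    lines = (inspect.getdoc(func) or "").splitlines()
    if not lines:
        return func
    section = None
    for i, line in enumerate(lines):
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        if nxt and set(nxt) == {"-"}:
            section = line.strip()  # a header such as "Parameters" or "Returns"
        elif section == "Parameters" and line in hints:
            # A bare, unindented parameter name: append its annotated type
            lines[i] = f"{line} : {_format_annotation(hints[line])}"
    # Note: __doc__ is replaced by the dedented text from inspect.getdoc()
    func.__doc__ = "\n".join(lines)
    return func


@inject_annotation_types
def my_shiny_method(num_a: float, count: int) -> list[float]:
    """
    Create list containing some dummy numbers.

    Parameters
    ----------
    num_a
        Base number to use for filling list.
    count
        Target list length.

    Returns
    -------
    List of length `count` filled with floats.
    """
    return [num_a / count for i in range(count)]


print(inspect.getdoc(my_shiny_method))
```

Because the annotations remain the only place where types are written, the docstring shown by `help()` cannot drift from the signature; the decorator runs once at import time.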
Thanks for your reply @rossbar. I get your point and I fully understand the need for a human readable type description.
I think the style guide should at least highlight that the type information in docstrings is purely optional and is meant as human-readable type information (not adhering to any syntax) for cases where the type annotations become too complex for a user to understand. That is, for simple types it would be recommended to omit the type description, whereas for complex types or large unions a human-readable type description is encouraged.
Currently, the guide is slightly misleading and ambiguous, even after having worked with it for more than 5 years, and that's also what I see/hear from my dev team.
I personally go straight to the docstring Parameters/Returns/Yields as the single source of truth. Type annotations give context for what the parameter is supposed to do.
Imho it's quite hard to say what a parameter is supposed to do from looking at the annotations alone, except for control arguments like boolean flags or axis definitions. As soon as data is involved, say a list, an array or a pd.DataFrame, we can only tell what is supposed to be done with a parameter from the full combination of information: function name, parameter names, parameter types and docstring.
Also, if we have both, there is no single source of truth, because we have two definitions, unless there is a strict delineation between "docstring: human-readable description" and "annotations: types supported by the code", with the two not being interchangeable (obvious for docstring -> annotations, but not vice versa).
Again, the parameter type descriptions in the docstring are for humans; they can be whatever the docstring author thinks is most relevant for readers. Even in the case of simple types, it's still valuable to have the information contained in the docstring to provide context for the parameter description, at least IMO. This is obviously subjective, but I find

```
normalize : bool
    Normalize output if `True`.
```

easier to digest than:

```
normalize :
    Normalize output if `True`.
```

The type description provides context for the parameter description. Obviously YMMV, and if you find this not to be the case, then again, individual projects can choose to leave the parameter type description blank.
While I definitely hear the "single source of truth" argument, I don't think it's an issue in practice. NumPy has had annotations for several years (and has always had numpydoc-style docstrings) and I don't recall any issues about inconsistencies between docs and annotations; at least certainly not a high volume of issues.
Also, bear in mind that many projects don't make use of inline annotations in the signature. For example, numpy stores typing information in stub files (which is fully supported). Other projects like networkx rely on typeshed to host/maintain type annotations entirely independent of the project itself - so not only is the typing information not in the signature, it's not maintained with the source code! Again - this makes sense in the context of separating concerns: the type annotations are primarily for tooling, the docstrings are for humans. Having typing information in the docstring continues to be useful independent of type annotations.