Type annotations
I rely pretty heavily on type annotations, and I envisage using ro-crate-py a lot in the near future. Would you be open to me submitting a PR that adds type annotations to the library?
If this is okay, could I bump the minimum Python version from 3.7 to 3.9? 3.7 has been EOL for a while, and 3.8 will be EOL in just over a week: https://devguide.python.org/versions/. The reason this matters is that 3.9 unlocks better type syntax such as list[str].
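To illustrate the syntax difference, here is a minimal sketch. The function names are made up for this example and are not part of ro-crate-py:

```python
from typing import List


# Python 3.7/3.8: generic container types have to be imported from typing.
def entity_ids_old(ids: List[str]) -> List[str]:
    return sorted(ids)


# Python 3.9+ (PEP 585): the built-in list can be subscripted directly,
# so no typing import is needed for this annotation.
def entity_ids_new(ids: list[str]) -> list[str]:
    return sorted(ids)


print(entity_ids_new(["b", "a"]))
```

Both functions behave identically at runtime; only the annotation style differs.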
Would you be open to me submitting a PR that adds type annotations to the library?
OK, thank you!
could I bump the minimum Python version from 3.7 to 3.9
Already done that in #200
@multimeric are you still/yet working on this?
If not: I put some effort into getting type annotations to work for ro-crate-py. I will probably keep working on it from time to time: https://github.com/dnlbauer/ro-crate-py/tree/type_annotations
No, I'm not. I made a start, which you can feel free to look at, use, or copy here: https://github.com/ResearchObject/ro-crate-py/compare/master...WEHI-SODA-Hub:ro-crate-py:typing.
However I ended up developing alternative approaches to this problem, which is why I never finished this.
https://github.com/dnlbauer/ro-crate-py/tree/type_annotations has working type annotations (and even checks for them in CI) for the library.
However, there are several issues I ran into while implementing and using this:
- Inconsistent use of types in the library itself: it is often unclear how the library uses function inputs and what types it expects. For example, metadata.py/find_root_entity_id has an argument named entities, which actually works on JSON-LD dictionaries. In other cases, a function implicitly accepts a myriad of different types (e.g. a path, string, file, or stream) but is not transparent about how they are used, or the type information is simply lost because everything is stored in dicts.
- A lot of functions are written in a way that makes inferring the type impossible. For example, methods accessing attributes can return any mix of primitives, JSON-LD dicts, lists thereof, or Entity subclasses. Type annotations on these methods therefore have to be so permissive that they end up providing little real benefit. As a library user, you would have to cast the return values manually to what you expect, since the type checker will never be able to infer it.
Example:
from typing import cast

# You might retrieve the author of a dataset like this:
author = rocrate["author"]
# Judging from the underlying function signature, author could now be
# anything: a primitive, dict, list, Entity, ...
# The type checker marks this as an error, because you cannot index
# into a boolean value or a string like that:
print(author["name"])
# You would need to cast it:
print(cast(Entity, author)["name"])
Therefore, while the branch shows how type annotations could be implemented, I didn't see much practical benefit while exploring this idea. The mix of raw dicts encoding JSON-LD and typed Entities comes with a lot of flexibility when using the Entity API, but it also means types bring little to no benefit for a user.
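To make the problem concrete, here is a self-contained sketch. The Entity class and the PropertyValue alias below are simplified, hypothetical stand-ins written for this example, not ro-crate-py's actual classes or signatures. It shows that with a permissive union return type, the caller has to narrow the value (here with isinstance, as an alternative to cast) before the type checker accepts any further access:

```python
from __future__ import annotations

from typing import Any, Optional, Union


class Entity:
    """Hypothetical stand-in for a typed entity backed by a raw dict."""

    def __init__(self, properties: dict[str, Any]) -> None:
        self._properties = properties

    def __getitem__(self, key: str) -> PropertyValue:
        # The stored value can be almost anything, so the return
        # annotation has to be a very wide union.
        return self._properties[key]


# Assumed alias: roughly the set of things a property could hold.
PropertyValue = Union[str, int, float, bool, dict, list, Entity, None]


def author_name(dataset: Entity) -> Optional[str]:
    author = dataset["author"]
    # Without narrowing, a type checker rejects author["name"],
    # because author might be a bool, an int, None, and so on.
    if isinstance(author, Entity):
        name = author["name"]
        return name if isinstance(name, str) else None
    if isinstance(author, str):
        # The author was given as a plain string rather than an entity.
        return author
    return None


dataset = Entity({"author": Entity({"name": "Alice"})})
print(author_name(dataset))
```

Every call site needs this kind of narrowing or a cast, which is why the annotations add little for a user in practice.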