Type annotations

Open multimeric opened this issue 1 year ago • 4 comments

I rely pretty heavily on type annotations, and I envisage using ro-crate-py a lot in the near future. Would you be open to me submitting a PR that adds type annotations to the library?

If this is okay, could I bump the minimum Python version from 3.7 to 3.9? 3.7 has been EOL for a while, and 3.8 will be EOL in just over a week: https://devguide.python.org/versions/. The reason this matters is that it unlocks better type syntax such as list[str].
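For illustration, the difference in annotation syntax looks roughly like this. The function names are made up for the example and are not part of ro-crate-py.

```python
from typing import Dict, List


# Python 3.7/3.8 style: generic aliases have to be imported from typing.
def entity_ids_old(entities: List[Dict[str, str]]) -> List[str]:
    return [e["@id"] for e in entities]


# Python 3.9+ style (PEP 585): built-in collections can be subscripted directly.
def entity_ids_new(entities: list[dict[str, str]]) -> list[str]:
    return [e["@id"] for e in entities]
```

Both functions behave identically at runtime; only the annotation syntax differs.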

multimeric avatar Sep 23 '24 02:09 multimeric

Would you be open to me submitting a PR that adds type annotations to the library?

OK, thank you!

could I bump the minimum Python version from 3.7 to 3.9

Already done in #200

simleo avatar Sep 24 '24 15:09 simleo

@multimeric are you still working on this?

If not: I put some effort into getting type annotations to work for ro-crate-py. I will probably keep working on it from time to time: https://github.com/dnlbauer/ro-crate-py/tree/type_annotations

dnlbauer avatar Feb 03 '25 13:02 dnlbauer

No, I'm not. I made a start, which you can feel free to look at, use or copy, here: https://github.com/ResearchObject/ro-crate-py/compare/master...WEHI-SODA-Hub:ro-crate-py:typing.

However I ended up developing alternative approaches to this problem, which is why I never finished this.

multimeric avatar Feb 04 '25 04:02 multimeric

https://github.com/dnlbauer/ro-crate-py/tree/type_annotations has working type annotations (and even checks for them in CI) for the library.

However, there are several issues I ran into when trying to implement and use this:

  1. Inconsistent use of types in the library itself: it is often unclear how the library uses the inputs of its functions and what types it expects. For example, find_root_entity_id in metadata.py has an argument named entities, which actually works on JSON-LD dictionaries. In other cases, a function implicitly accepts a myriad of different types (e.g. a path, string, file, or stream) but is not transparent about how they are used, or the type information is simply lost because everything is stored in dicts.

  2. A lot of functions are written in a way that makes inferring the type impossible. For example, methods accessing attributes can return any mix of primitives, JSON-LD dicts, lists thereof, or Entity subclasses. Type annotations on these methods therefore have to be so permissive that they end up providing little real benefit. As a library user, you would have to cast the return values manually to what you expect, since the type checker will never be able to infer it.

Example:

from typing import cast

# You might retrieve the author of a dataset like this:
author = rocrate["author"]
# From the underlying function signature, author could now be
# anything from a primitive, dict, list, Entity, ...

# Marked as an error, because you cannot index into a boolean value or a string like that
print(author["name"])

# You would need to cast it
print(cast(Entity, author)["name"])
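
An alternative to cast is to narrow the value with isinstance checks, which a type checker does understand. The sketch below is self-contained: it uses a stand-in Entity class and a hypothetical AttributeValue alias rather than the real ro-crate-py types, so it only illustrates the shape of the problem.

```python
from typing import Any, Union


class Entity:
    """Stand-in for an entity class; NOT the real ro-crate-py Entity."""

    def __init__(self, properties: dict[str, Any]) -> None:
        self._properties = properties

    def __getitem__(self, key: str) -> Any:
        return self._properties[key]


# Hypothetical alias for "anything an attribute lookup might return".
AttributeValue = Union[str, int, float, bool, dict[str, Any], Entity, list[Any], None]


def author_name(author: AttributeValue) -> str:
    """Return a display name, narrowing the permissive type step by step."""
    if isinstance(author, Entity):
        return str(author["name"])
    if isinstance(author, dict):
        # Raw JSON-LD dict: may carry a name, or only an @id reference.
        return str(author.get("name", author.get("@id", "")))
    if isinstance(author, list):
        return ", ".join(author_name(a) for a in author)
    return "" if author is None else str(author)
```

This type-checks without cast, but every caller still has to handle every branch, so the ergonomic cost is much the same.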

Therefore, while the branch shows how type annotations could be implemented, I don't see much practical benefit after exploring this idea. The mix of raw dicts encoding JSON-LD and typed Entities comes with a lot of flexibility when using the Entity API, but it also means types bring little to no benefit for a user.

dnlbauer avatar May 22 '25 08:05 dnlbauer