earthaccess
earthaccess copied to clipboard
Incorrect types passed by end-users at runtime are not caught until our code tries to do incompatible things with those objects (e.g. `.upper()` on an `Auth` instance)
In testing for implementing cloud read-in in icepyx using CryoCloud, @rwegener2 and I discovered something changed between 0.6.2 and 0.8.2 that causes a failure where code ran successfully previously. In particular, the line: s3 = earthaccess.get_s3fs_session(daac="NSIDC", provider=auth), where auth is an Auth instance, fails with an AttributeError when it tries to run the string operation upper() on the auth object (see full traceback here). I haven't dug into recent changes but wanted to report here in case it's already known/being resolved.
Thanks for the report! provider expects a string value, not an Auth instance. I'm not sure why that worked OK in earlier versions of earthaccess, but it probably shouldn't have! https://earthaccess.readthedocs.io/en/latest/user-reference/api/api/#earthaccess.api.get_s3fs_session
I think we need to think more about how this (and other) APIs in earthaccess could catch runtime type errors before they become confusing messages later. Failing early with "Unexpected type (Auth) received for provider. Expected str." would be a big improvement to user experience. Is there like a library we can use where we can throw a decorator on a type-annotated function to also get runtime checking? We do have the type annotations, so users can also use a typechecker to discover such bugs, which is better than nothing :tada: :)
For reference, this is the change that introduced the new call to .upper() under the valid (but in this specific case, incorrect) assumption that it was operating on a string.
Does this new description of the problem work for everyone?
provider expects a string value, not an Auth instance. I'm not sure why that worked OK in earlier versions of earthaccess, but it probably shouldn't have!
We were supplying a valid daac, so given the logic it never tried to use our invalid provider input (then lines 213+, now lines 251+, in store.py):
elif daac is not None:
s3_credentials = self.auth.get_s3_credentials(daac=daac)
elif provider is not None:
s3_credentials = self.auth.get_s3_credentials(provider=provider)
Thanks for linking to the docs page - since we're already providing a daac, we could just remove the provider.
That said, I'll keep my eye on this for catching runtime type errors. : )
Ahh, that makes sense. Thanks! :)
Thanks to you both for investigating this error!
This feels like an issue of Python as a language not enforcing types. I see earthaccess already provides type hints (🎉). One library I've used on past projects to enforce the hints is pydantic, which worked well and has some nice features like coercing types. A brief google search shows this isn't the only option (Ex. typeguard). These could be ways to automatically raise errors when arguments don't match the specified types.
I love Pydantic!
Haven't use typeguard before though. I love the simplicity of adding @typechecked to a function with annotations and having the runtime type checks added magically :man_cook: :pinched_fingers: I'm also reading about beartype whose main selling point seems to be speed. Both have had recent releases. beartype seems to default to checking everything, no decorator required. That's kind of nice.
Here's the pydantic feature I think we'd use: https://docs.pydantic.dev/latest/api/validate_call/
This validation could help with https://github.com/nsidc/earthaccess/issues/279 as well! Especially if our runtime validation tool can also coerce types.
Looks like Pydantic will do this. We may only want this at the API level, because I feel it could be confusing to debug the code with multiple levels of validation.
https://docs.pydantic.dev/latest/concepts/validation_decorator/#argument-types
As with the rest of Pydantic, types can be coerced by the decorator before they're passed to the actual function
Given pydantic's massive size, I'd caution against pulling it into the mix, especially if you use it only for the @validate_call decorator. If you simply want runtime type validation of function arguments, then I recommend either typeguard or beartype (as mentioned in previous comments), as these are much smaller libraries, since they have much narrower functionality than pydantic.
If you do want to leverage other capabilities of pydantic, then I'd suggest instead looking at msgspec, which is much more performant and much smaller than pydantic. It doesn't validate function arguments though, so you would still need to use typeguard or beartype for such validation.
Given the major speed advantage of beartype over typeguard, beartype might be the better choice. It also appears to be the most actively developed and most PEP-compliant of other choices. See https://beartype.readthedocs.io/en/latest/moar/