earthaccess icon indicating copy to clipboard operation
earthaccess copied to clipboard

Incorrect types passed by end-users at runtime are not caught until our code tries to do incompatible things with those objects (e.g. `.upper()` on an `Auth` instance)

Open JessicaS11 opened this issue 1 year ago • 11 comments

In testing for implementing cloud read-in in icepyx using CryoCloud, @rwegener2 and I discovered something changed between 0.6.2 and 0.8.2 that causes a failure where code ran successfully previously. In particular, the line: s3 = earthaccess.get_s3fs_session(daac="NSIDC", provider=auth), where auth is an Auth instance, fails with an AttributeError when it tries to run the string operation upper() on the auth object (see full traceback here). I haven't dug into recent changes but wanted to report here in case it's already known/being resolved.

JessicaS11 avatar Dec 22 '23 22:12 JessicaS11

Thanks for the report! provider expects a string value, not an Auth instance. I'm not sure why that worked OK in earlier versions of earthaccess, but it probably shouldn't have! https://earthaccess.readthedocs.io/en/latest/user-reference/api/api/#earthaccess.api.get_s3fs_session

I think we need to think more about how this (and other) APIs in earthaccess could catch runtime type errors before they become confusing messages later. Failing early with "Unexpected type (Auth) received for provider. Expected str." would be a big improvement to user experience. Is there like a library we can use where we can throw a decorator on a type-annotated function to also get runtime checking? We do have the type annotations, so users can also use a typechecker to discover such bugs, which is better than nothing :tada: :)

For reference, this is the change that introduced the new call to .upper() under the valid (but in this specific case, incorrect) assumption that it was operating on a string.

mfisher87 avatar Dec 23 '23 00:12 mfisher87

Does this new description of the problem work for everyone?

mfisher87 avatar Dec 23 '23 00:12 mfisher87

provider expects a string value, not an Auth instance. I'm not sure why that worked OK in earlier versions of earthaccess, but it probably shouldn't have!

We were supplying a valid daac, so given the logic it never tried to use our invalid provider input (then lines 213+, now lines 251+, in store.py):

            elif daac is not None:
                s3_credentials = self.auth.get_s3_credentials(daac=daac)
            elif provider is not None:
                s3_credentials = self.auth.get_s3_credentials(provider=provider)

Thanks for linking to the docs page - since we're already providing a daac, we could just remove the provider.

That said, I'll keep my eye on this for catching runtime type errors. : )

JessicaS11 avatar Jan 02 '24 19:01 JessicaS11

Ahh, that makes sense. Thanks! :)

mfisher87 avatar Jan 02 '24 20:01 mfisher87

Thanks to you both for investigating this error!

This feels like an issue of Python as a language not enforcing types. I see earthaccess already provides type hints (🎉). One library I've used on past projects to enforce the hints is pydantic, which worked well and has some nice features like coercing types. A brief google search shows this isn't the only option (Ex. typeguard). These could be ways to automatically raise errors when arguments don't match the specified types.

rwegener2 avatar Jan 03 '24 02:01 rwegener2

I love Pydantic!

Haven't use typeguard before though. I love the simplicity of adding @typechecked to a function with annotations and having the runtime type checks added magically :man_cook: :pinched_fingers: I'm also reading about beartype whose main selling point seems to be speed. Both have had recent releases. beartype seems to default to checking everything, no decorator required. That's kind of nice.

mfisher87 avatar Jan 03 '24 16:01 mfisher87

Here's the pydantic feature I think we'd use: https://docs.pydantic.dev/latest/api/validate_call/

mfisher87 avatar Jan 08 '24 23:01 mfisher87

This validation could help with https://github.com/nsidc/earthaccess/issues/279 as well! Especially if our runtime validation tool can also coerce types.

mfisher87 avatar Feb 06 '24 19:02 mfisher87

Looks like Pydantic will do this. We may only want this at the API level, because I feel it could be confusing to debug the code with multiple levels of validation.

https://docs.pydantic.dev/latest/concepts/validation_decorator/#argument-types

As with the rest of Pydantic, types can be coerced by the decorator before they're passed to the actual function

mfisher87 avatar Feb 06 '24 21:02 mfisher87

Given pydantic's massive size, I'd caution against pulling it into the mix, especially if you use it only for the @validate_call decorator. If you simply want runtime type validation of function arguments, then I recommend either typeguard or beartype (as mentioned in previous comments), as these are much smaller libraries, since they have much narrower functionality than pydantic.

If you do want to leverage other capabilities of pydantic, then I'd suggest instead looking at msgspec, which is much more performant and much smaller than pydantic. It doesn't validate function arguments though, so you would still need to use typeguard or beartype for such validation.

chuckwondo avatar Feb 27 '24 00:02 chuckwondo

Given the major speed advantage of beartype over typeguard, beartype might be the better choice. It also appears to be the most actively developed and most PEP-compliant of other choices. See https://beartype.readthedocs.io/en/latest/moar/

chuckwondo avatar Feb 27 '24 00:02 chuckwondo