
Runtime type checking with beartype

Open jklaise opened this issue 3 years ago • 27 comments

Partially addresses #488 by using beartype to do runtime type checking.

I've currently annotated part of the public API (specifically, the explainers in the alibi.explainers subpackage and their __init__, fit and explain methods). The intention is to see if tests pass, but there are quite a few todos:

Essential:

  • [x] Provide simple runtime type-checking for the public interface of alibi.explainers. Note that beartype's checking is statistical - it assumes containers are homogeneous, so this is not 100% correct or exhaustive type checking.
  • [ ] Relax types to allow configuration-driven initialization of explainers. Specific example - relax tuple types to Sequence[int] so we can support things like #493.
    • ~Caveat: Sequence type is generic only from Python 3.9, so Sequence[int] fails for Python < 3.9. Unless we could use Sequence from typing_extensions (need to check this would work).~ I was wrong about this, the restriction only applies to collections.abc.Sequence.
    • Relaxing the Python API may result in some (hopefully minor) user confusion as shape: Tuple[int] can be a lot clearer than shape: Sequence[int]. pydantic actually has the advantage here because using their decorator @validate_arguments we could keep shape: Tuple[int] and lists would be parsed to tuples automatically.
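A minimal stdlib sketch of the relaxation described above (function and parameter names hypothetical): accept any Sequence[int] at the public boundary, so configuration files can pass lists, and normalize to a tuple internally.

```python
from typing import Sequence, Tuple

def init_explainer(shape: Sequence[int]) -> Tuple[int, ...]:
    # Accept any sequence of ints (a JSON config yields lists) but
    # normalize to a tuple so internals keep their tuple semantics.
    if not all(isinstance(dim, int) for dim in shape):
        raise TypeError(f"shape must be a sequence of ints, got {shape!r}")
    return tuple(shape)
```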

Nice to have:

  • [ ] Provide extensive/deep runtime type-checking for the public interface of alibi.explainers. beartype may gain this functionality in the future. Alternatively, we might want custom class methods doing this.
  • [ ] Narrow down types where relevant and use beartype validator functionality, e.g. define custom types like NDArray2D to signify only 2-d numpy arrays are acceptable for AnchorTabular
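A sketch of the predicate such a hypothetical NDArray2D type would express. The commented-out lines show roughly how it could be declared with beartype's validator API; the helper below inlines the same check using numpy only.

```python
import numpy as np

# With beartype installed, this could be declared as a beartype validator,
# roughly:
#   from typing import Annotated
#   from beartype.vale import Is
#   NDArray2D = Annotated[np.ndarray, Is[lambda arr: arr.ndim == 2]]
# Here the same check is inlined as a plain helper.
def is_ndarray_2d(obj) -> bool:
    return isinstance(obj, np.ndarray) and obj.ndim == 2
```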

To think about and/or out of scope:

  • [ ] Validation beyond types - of course we also need custom logic to decide on invalid state on combinations of user arguments which cannot be achieved by simple runtime type-checking even with validation extensions. We should standardize how this is done and what kinds of exceptions are raised (custom exceptions hierarchy?).
  • [ ] Return types are also checked/validated by beartype. We could narrow down these types to have better control of the return API as well as catch implementation bugs. However, it's potentially not as useful as using pydantic for return types because there is another use case for Explanation objects - namely, a persisted Explanation loaded back into Python for further inspection. This would need a "model" of the Explanation and some parsing logic (e.g. lists of lists get converted back into numpy). pydantic is the natural choice here (see earlier attempt https://github.com/SeldonIO/alibi/pull/342).
  • [ ] It's possible pydantic is a decent choice over beartype anyway, especially if we end up wanting it for return type schemas. Technically, @validate_arguments can do runtime checks and attempt parsing arguments to their expected types, but it wouldn't extend to more advanced validation like custom types with custom validators. That being said, we could always delegate custom validation to internal logic rather than rely on specific custom types like NDArray2D, though specific types (as long as it's clear what they are) are nice for documenting the expectations of the public API.
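One possible shape for the custom exception hierarchy floated above (all names hypothetical, purely a sketch of the idea): a common base class lets callers catch all alibi-specific validation failures uniformly, while cross-argument checks live beyond what runtime type-checking covers.

```python
class AlibiError(Exception):
    """Hypothetical base class for alibi-specific errors."""

class AlibiArgumentError(AlibiError, ValueError):
    """Raised when a combination of well-typed arguments is invalid."""

def validate(threshold: float, n_bins: int) -> None:
    # Example of validation beyond types: a cross-argument constraint
    # no per-parameter type hint can express.
    if threshold > 0 and n_bins <= 0:
        raise AlibiArgumentError("n_bins must be positive when threshold > 0")
```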

jklaise avatar Oct 22 '21 16:10 jklaise

There appears to be some undesirable interaction with sphinx autodoc and beartype raising an error: beartype.roar.BeartypeDecorHintNonpepException: @beartyped CEM.__init__() parameter "predict" wrapper parameter tensorflow.compat.v1.keras.Model not class.

I think the culprit here is mocking modules such as tensorflow as autodoc is run which then results in these runtime exceptions by beartype as the objects have been intercepted and the types no longer correspond to the expected ones. This is annoying to say the least. A potential solution to this may be to switch fully to sphinx-apidoc (see #471) which is a static doc builder that doesn't rely on importing modules to be documented, although it would be a reasonably large switch and it remains to be tested whether it is mature enough.

jklaise avatar Oct 25 '21 12:10 jklaise

Not related to beartype, but on the subject of autodoc, mocked modules, decorators etc., I also had trouble documenting torch.jit.script-decorated functions in alibi-detect. Lines 319-332 in conf.py were required to fix it. I can't see this helping with the above issue, but this is more of a comment to say that I wonder if sphinx-apidoc would help with issues like this one too...

ascillitoe avatar Oct 25 '21 12:10 ascillitoe

Thanks so much for investigating @beartype as your potential runtime type-checker. Supporting alibi would be quite exciting for us – so I'll do everything within my sad grab-bag of skills to satisfy your use case.

Provide extensive/deep runtime type-checking for the public interface of alibi.explainers. beartype may gain this functionality in the future. Alternatively, we might want custom class methods doing this.

Yup. beartype currently:

  • Shallowly type-checks almost everything that's PEP-compliant.
  • Deeply type-checks many things that are PEP-compliant – but significant work remains to be done. We're tentatively targeting beartype 1.0.0 (to be presumably released late next year) as the first version to deeply type-check all PEP 484- and 585-compliant type hints (e.g., typing.Dict[str, int], dict[str, int]).
  • Provides the beartype validator API, enabling you to preempt my slothful dev habits by implementing your own hand-rolled O(1) or O(n) deep type-checking type hints first. Admittedly, you shouldn't have to do that; that's the bare minimum that beartype should implicitly do for you.

We all pay the price for my negligence.

I think the culprit here is mocking modules such as tensorflow as autodoc is run which then results in these runtime exceptions by beartype as the objects have been intercepted and the types no longer correspond to the expected ones.

Yikes. That's some fearsomely subtle interaction right there.

As suggested over here, one (possibly) sane solution might be for beartype to conditionally disable itself when autodoc is running. After all, is there a valid use case for runtime type-checking to even be performed during documentation building? According to this StackOverflow post, this should be trivial via a snippet like this:

# If "sphinx-build" is currently running, Sphinx's "autodoc" extension
# is almost certainly importing "beartype" as an unintended side effect
# of documenting a downstream codebase via runtime introspection.
# Since this typically results in disaster, reduce @beartype to a noop.
from pathlib import Path
from sys import argv

if Path(argv[0]).name in ("sphinx-build", "build.py"):
    beartype = lambda func: func

It would be nice for us to unit test that. Let's reflect on how that might happen.

Additionally, it should be feasible for alibi to manually kludge this on your end by monkey-patching beartype to the identity decorator from within your top-level conf.py Sphinx configuration: e.g.,

# *VERY* early in your "conf.py" before you import from "alibi":
import beartype
beartype.beartype = lambda func: func  # <-- make it so, bro

leycec avatar Oct 25 '21 23:10 leycec

@jklaise: You probably already know this, but all of the beartype.roar.BeartypeCallHintPepParamException: @beartyped CounterfactualProto.__init__() parameter gamma="100" violates type hint <class 'float'>, as "100" not instance of float. exceptions should be trivially resolvable by just replacing float with numbers.Real from Python's numeric tower.
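A stdlib-only illustration of that fix (function name hypothetical): numbers.Real admits the whole numeric tower, mirroring the isinstance check a strict runtime checker performs for the hint.

```python
from numbers import Real

def set_gamma(gamma: Real) -> float:
    # numbers.Real admits int, float, Fraction, etc., so gamma=100
    # passes where a bare `float` hint would reject it; "100" still fails.
    if not isinstance(gamma, Real):
        raise TypeError(f"gamma={gamma!r} violates type hint Real")
    return float(gamma)
```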

~~Relatedly, would you happen to have a handy traceback for that unreadable beartype.roar.BeartypeDecorHintNonpepException: @beartyped CEM.__init__() parameter "predict" wrapper parameter tensorflow.compat.v1.keras.Model not class. exception? The most recent pytest log output doesn't yield that. Sadness ensues.~~ :crying_cat_face:

~~Commit 5e23081e on our end resolves the erroneous "wrapper parameter" substring embedded in that exception, but the resulting exception is still unreadable.~~ I think my eye's gonna start twitching again. EDIT: Pretty sure I got it! That last sentence still applies, though.

leycec avatar Oct 26 '21 08:10 leycec

@jklaise: You probably already know this, but all of the beartype.roar.BeartypeCallHintPepParamException: @beartyped CounterfactualProto.__init__() parameter gamma="100" violates type hint <class 'float'>, as "100" not instance of float. exceptions should be trivially resolvable by just replacing float with numbers.Real from Python's numeric tower.

Nice catch! I think this is an example where the parameter really should've been a float, beartype paying dividends already.

jklaise avatar Oct 26 '21 08:10 jklaise

@leycec here's a stack trace from a local docs build (I realize it's not on the CI job as beartype is not a dependency of the job):

WARNING: autodoc: failed to import module 'alibi'; the following exception was raised:                                                                                                                               
Traceback (most recent call last):                                                                                                                                                                                   
  File "/home/janis/.conda/envs/py38/lib/python3.8/site-packages/sphinx/ext/autodoc/importer.py", line 70, in import_module                                                                                          
    return importlib.import_module(modname)                                                                                                                                                                          
  File "/home/janis/.conda/envs/py38/lib/python3.8/importlib/__init__.py", line 127, in import_module                                                                                                                
    return _bootstrap._gcd_import(name[level:], package, level)                                                                                                                                                      
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import                                                                                                                                                    
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load                                                                                                                                                  
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked                                                                                                                                         
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked                                       
  File "<frozen importlib._bootstrap_external>", line 843, in exec_module                                 
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed                                                                                                                                       
  File "/home/janis/PycharmProjects/public-fork-alibi/alibi/__init__.py", line 1, in <module>                                                                                                                        
    from . import confidence, datasets, explainers, utils                                                 
  File "/home/janis/PycharmProjects/public-fork-alibi/alibi/explainers/__init__.py", line 9, in <module>                                                                                                             
    from .cem import CEM                             
  File "/home/janis/PycharmProjects/public-fork-alibi/alibi/explainers/cem.py", line 14, in <module>                                                                                                                 
    class CEM(Explainer, FitMixin):                  
  File "/home/janis/PycharmProjects/public-fork-alibi/alibi/explainers/cem.py", line 17, in CEM                                                                                                                      
    def __init__(self,                               
  File "/home/janis/.conda/envs/py38/lib/python3.8/site-packages/beartype/_decor/main.py", line 159, in beartype                                                                                                     
    func_wrapper_code = generate_code(func_data)                                                          
  File "/home/janis/.conda/envs/py38/lib/python3.8/site-packages/beartype/_decor/_code/codemain.py", line 166, in generate_code                                                                                      
    code_check_params = _code_check_params(data)                                                          
  File "/home/janis/.conda/envs/py38/lib/python3.8/site-packages/beartype/_decor/_code/codemain.py", line 373, in _code_check_params                                                                                 
    func_wrapper_code_param = pep_code_check_param(                                                       
  File "/home/janis/.conda/envs/py38/lib/python3.8/site-packages/beartype/_decor/_code/_pep/pepcode.py", line 149, in pep_code_check_param                                                                           
    reraise_exception_cached(                        
  File "/home/janis/.conda/envs/py38/lib/python3.8/site-packages/beartype/_util/cache/utilcacheerror.py", line 200, in reraise_exception_cached                                                                      
    raise exception.with_traceback(exception.__traceback__)                                               
  File "/home/janis/.conda/envs/py38/lib/python3.8/site-packages/beartype/_decor/_code/_pep/pepcode.py", line 132, in pep_code_check_param                                                                           
    ) = pep_code_check_hint(hint)                    
  File "/home/janis/.conda/envs/py38/lib/python3.8/site-packages/beartype/_util/cache/utilcachecall.py", line 351, in _callable_cached                                                                               
    raise exception                                  
  File "/home/janis/.conda/envs/py38/lib/python3.8/site-packages/beartype/_util/cache/utilcachecall.py", line 343, in _callable_cached                                                                               
    return_value = params_flat_to_return_value[params_flat] = func(                                       
  File "/home/janis/.conda/envs/py38/lib/python3.8/site-packages/beartype/_decor/_code/_pep/_pephint.py", line 1300, in pep_code_check_hint                                                                          
    hint_curr_expr=add_func_scope_types(                                                                  
  File "/home/janis/.conda/envs/py38/lib/python3.8/site-packages/beartype/_decor/_code/_pep/_pepscope.py", line 343, in add_func_scope_types                                                                         
    die_unless_hint_nonpep_type(hint=cls, exception_prefix=exception_prefix)                                                                                                                                         
  File "/home/janis/.conda/envs/py38/lib/python3.8/site-packages/beartype/_util/hint/nonpep/utilnonpeptest.py", line 276, in die_unless_hint_nonpep_type
    die_unless_type_isinstanceable(                  
  File "/home/janis/.conda/envs/py38/lib/python3.8/site-packages/beartype/_util/cls/pep/utilpep3119.py", line 112, in die_unless_type_isinstanceable                                                                 
    die_unless_type(                                 
  File "/home/janis/.conda/envs/py38/lib/python3.8/site-packages/beartype/_util/cls/utilclstest.py", line 59, in die_unless_type                                                                                     
    raise exception_cls(f'{exception_prefix}{repr(cls)} not class.')                                      
beartype.roar.BeartypeDecorHintNonpepException: @beartyped CEM.__init__() parameter "predict"  wrapper parameter tensorflow.compat.v1.keras.Model not class.

jklaise avatar Oct 26 '21 08:10 jklaise

@leycec I can't really think of a use case for running beartype whilst building docs either and even if we wanted to take advantage of the importing side-effect that autodoc does, we would have to deal with these arbitrary mocks which makes it seem even more pointless (I could be wrong though).

The conditional disabling of beartype during sphinx run sounds like a good solution to me provided we're sure that we never want to run it during a sphinx run. I'm not sure I can offer any advice on unit-testing it on beartype side though.

jklaise avatar Oct 26 '21 08:10 jklaise

beartype paying dividends already.

You are a gentleman and a scholar and I wish you all the academic citations.

here's a stack trace from a local docs build

zomg

Well, that escalated quickly. @beartype once generated simple tracebacks. Now even I cannot say where this codebase will go next.

how did it come to this

The conditional disabling of beartype during sphinx run sounds like a good solution to me provided we're sure that we never want to run it during a sphinx run.

Yes. That is the proviso, isn't it? In theory, beartype could additionally attempt to detect whether autodoc is currently running and only reduce to a noop in that specific case. In practice, that sounds scary and fragile. I briefly glimpsed into the special hell that is the Sphinx Events API at sphinx.events. Five minutes later, I fled with the tattered remnants of my sanity and promised myself never to do that again.

Perhaps we'll just brute-force it for now with the crude two-liner shipped above. If and when alibi (or someone else) justifiably complains, we'll then rethink our stance on dangerous monkey-patching and invent something more robust and less breaky.

Would that satisfy short-term integration concerns – or should I think much harder about this than my sleep deprivation currently warrants? I think when I must, Janis. But I prefer not to. :wink:

leycec avatar Oct 27 '21 08:10 leycec

@leycec That sounds like a plan! I'm not averse to monkey-patching in client code, but handling this upstream seems like the best solution as I imagine the overlap between projects wanting to use beartype and using autodoc is significant.

jklaise avatar Oct 27 '21 08:10 jklaise

Super-duper! We'll force this face-melting workaround into our upcoming beartype 0.9.1 point release to be released in just a week or two.

I'll also submit a descriptive StackOverflow question querying the community for a better, stabler, stronger long-term solution that selectively targets only autodoc rather than the entire Sphinx stack. Until then, thanks again for all the phenomenal enthusiasm for well-typed Python – and don't hesitate to bang on our issue tracker if @beartype ever starts holding alibi back from achieving its projected world ML domination.

leycec avatar Oct 27 '21 22:10 leycec

Sunny Sunday update: in reflection, none of the available approaches for detecting whether Sphinx and autodoc are currently running are satisfactory. They all invite non-portable, error-prone, and fragile edge cases that will blow up the @beartype codebase. We already have enough people shaking their clenched fists at us. Right? You know we do.

I've opened a Sphinx feature request kindly requesting a new utility function trivializing this work for us and everyone else. Come leap onto the friendly dogpile! :dog2:

leycec avatar Nov 01 '21 02:11 leycec

We've now resolved beartype's prior frustration with Sphinx's autodoc extension. Oh, yes; it's happening. @beartype now conditionally detects when it thinks autodoc is running and silently reduces itself to a noop. This has been extensively tested on our end with a surprisingly robust functional test that passes everywhere except Python ≥ 3.10, where Sphinx's continued ill-advised usage of distutils raises deprecation warnings. ...so not our problem

I'll be officially releasing this tomorrow with beartype 0.9.1. This is critical stuff. It is time.

Thanks yet again for your generous consideration, @jklaise. Whether alibi elects to continue hardening itself with beartype or opt instead for its crass older brother pydantic, you've helped us patch up a critical chink in our breastplate. I wish everyone all the best with @SeldonIO... and beyond!

leycec avatar Nov 05 '21 08:11 leycec

beartype 0.9.1 has landed and with it autodoc integration. Praise be to alibi and @jklaise.

leycec avatar Nov 06 '21 06:11 leycec

@ascillitoe @arnaudvl the discussion regarding float support needs some more careful consideration.

Currently, Python treats int as an acceptable substitute for float, hence tools like mypy don't complain. This is not the case for e.g. beartype, which is strict for good reason: https://github.com/beartype/beartype/issues/66 (WARNING: rabbit-hole). See also the test-case failures on this PR where we pass int to float-hinted functions en masse.

So we're faced with a choice:

  1. Keep implicitly supporting int arguments to float-hinted functions/methods
  2. Be strict and only support float arguments to float-hinted functions/methods

Some discussion on pros/cons:

  1. Keep implicitly supporting int arguments to float-hinted functions/methods
     • If going with a strict checker like beartype, we would need to update type hints to something like Union[int, float] or numbers.Real (or, if going down the above-mentioned rabbit-hole, some other type monstrosity)
     • There are some considerations when attempting to mingle ints with tensorflow/pytorch computation graphs that expect float (in fact, they would expect float32). I believe at this point in time most such operations implicitly cast the user input (e.g. int->float32 or even float->float32) so that there are no errors, but there may be edge cases?
     • If the above point is no longer an issue and similar libraries implicitly accept ints for floats, then this approach seems most sensible
  2. Be strict and only support float arguments to float-hinted functions/methods
     • This has less potential for ambiguity and is perhaps less of a concern for us in dealing with implicit/explicit casting of int to float
     • This is a breaking change in that every script/config file happily using int where float is expected would break (see also the test cases failing on this PR).

Personally I'm torn between the two approaches, although option 1. has a slight advantage, primarily because we wouldn't be breaking any existing code and it also seems that accepting int for float is fairly standard in similar libraries.
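To make the two options concrete, a stdlib sketch (parameter names hypothetical) of the isinstance checks each implies; a strict runtime checker would generate these tests for us from the hints.

```python
from typing import Union

def set_gamma_lax(gamma: Union[int, float]) -> float:
    # Option 1: broaden the hint so existing int-passing code keeps
    # working, then cast once at the boundary for float-only internals.
    if not isinstance(gamma, (int, float)):
        raise TypeError(f"gamma={gamma!r} is neither int nor float")
    return float(gamma)

def set_gamma_strict(gamma: float) -> float:
    # Option 2: a bare float hint rejects ints under strict checking,
    # since int is not a subclass of float.
    if not isinstance(gamma, float):
        raise TypeError(f"gamma={gamma!r} not an instance of float")
    return gamma
```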

jklaise avatar Nov 08 '21 09:11 jklaise

My Fortran upbringing pushes me towards option 2, I still can't help writing a=42. etc instead of a=42! I take your point though that breaking changes aren't ideal...

ascillitoe avatar Nov 08 '21 10:11 ascillitoe

@ascillitoe I'm with you on that! However, I think I'm leaning towards Option 1 as it's more standard within Python (including scientific libraries!) and would also not break existing things or frustrate new users.

I had an idea which, it turns out, is actually very bad (so let's not do it), but for posterity: we could conditionally use beartype if it exists in the environment and even turn it on/off via an environment variable (https://github.com/posita/numerary does both). That way alibi users could turn it on and off (off by default) whereas mlserver would turn it on by default. It sounds good initially, but it's actually bad because someone developing with alibi wouldn't be able to deploy via mlserver without changing their code/config if they've used the more lax types.

jklaise avatar Nov 10 '21 11:11 jklaise

@jklaise agreed on all points above. As much as it would be nice to lead the way on being more strict on float vs int here, the "un-pythonic" nature of this would probably be overly frustrating for users. Perhaps going with option 1 and slightly more convoluted type hints isn't so bad... We already have complexity in type hints for other things anyway, for example Union[str, os.PathLike] for filepaths, and tensorflow vs pytorch model stuff etc.

ascillitoe avatar Nov 10 '21 11:11 ascillitoe

There's a slight issue with ForwardRefs here. I had to adjust type hints like encoder: "Union[tensorflow.keras.Model, torch.nn.Module]" to encoder: Union["tensorflow.keras.Model", "torch.nn.Module"], to make them beartype-friendly, however this messes up the API docs and these hints only show up as ForwardRef. This may be another reason for looking into #471.

jklaise avatar Nov 18 '21 14:11 jklaise

It appears that beartype also doesn't like non-strict optional handling, i.e. k: int = None. I guess it doesn't hurt to be explicit everywhere, i.e. k: Optional[int] = None, though it does add a fair bit of clutter.
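A minimal illustration (function name hypothetical) of the explicit form beartype requires:

```python
from typing import Optional

# `k: int = None` is implicitly optional and rejected by beartype;
# spelling out Optional[int] makes the hint explicit, as PEP 484 now
# recommends.
def top_k(k: Optional[int] = None) -> int:
    return 10 if k is None else k
```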

jklaise avatar Nov 18 '21 14:11 jklaise

There are more serious issues with handling test cases where we mock certain objects, e.g. see https://github.com/SeldonIO/alibi/blob/41d348e3b694fc29f72369f3bc8d91c035ffa3d4/alibi/explainers/tests/test_shap_wrappers.py#L1306-L1308. Currently I don't see how we can retain mocks in test code whilst making sure beartype runs too. There may be a way to do mocking that doesn't violate return types that I'm not aware of.

jklaise avatar Nov 18 '21 16:11 jklaise

Thought the fix might be as simple as passing return_type=Explanation to patch.object but it is not that simple as it turns out.

jklaise avatar Nov 18 '21 17:11 jklaise

Butterfingers, the correct kwarg is return_value not return_type (which unhelpfully gets swallowed). Now fixed via passing a dummy Explanation(dict(), dict()) object as a return value to the patched method.
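A minimal reproduction of that fix, with a stand-in Explanation class (its real constructor shape is assumed from the comment above): patching with return_value makes the mocked method return an object that still satisfies a runtime-checked return annotation, whereas a misspelled kwarg like return_type is silently absorbed as a mock attribute.

```python
from unittest.mock import patch

class Explanation:
    """Stand-in for alibi's Explanation (constructor shape assumed)."""
    def __init__(self, meta: dict, data: dict):
        self.meta, self.data = meta, data

class Explainer:
    def explain(self) -> Explanation:
        return Explanation({"name": "real"}, {"values": [1, 2, 3]})

# return_value (not return_type!) configures what the patched method
# returns, keeping the return type check satisfied.
with patch.object(Explainer, "explain", return_value=Explanation({}, {})):
    result = Explainer().explain()
```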

jklaise avatar Nov 18 '21 17:11 jklaise

Codecov Report

Merging #511 (3ac9c81) into master (649e211) will increase coverage by 0.07%. The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #511      +/-   ##
==========================================
+ Coverage   82.60%   82.67%   +0.07%     
==========================================
  Files          77       77              
  Lines       10507    10554      +47     
==========================================
+ Hits         8679     8726      +47     
  Misses       1828     1828              
Impacted Files Coverage Δ
alibi/explainers/ale.py 60.89% <100.00%> (+0.58%) :arrow_up:
alibi/explainers/anchor_image.py 93.29% <100.00%> (+0.12%) :arrow_up:
alibi/explainers/anchor_tabular.py 78.92% <100.00%> (+0.19%) :arrow_up:
alibi/explainers/anchor_text.py 87.11% <100.00%> (+0.09%) :arrow_up:
alibi/explainers/cem.py 79.48% <100.00%> (+0.23%) :arrow_up:
alibi/explainers/cfproto.py 75.68% <100.00%> (+0.14%) :arrow_up:
alibi/explainers/cfrl_base.py 88.21% <100.00%> (+0.12%) :arrow_up:
alibi/explainers/cfrl_tabular.py 60.60% <100.00%> (+1.23%) :arrow_up:
alibi/explainers/counterfactual.py 80.20% <100.00%> (+0.26%) :arrow_up:
alibi/explainers/integrated_gradients.py 89.13% <100.00%> (+0.08%) :arrow_up:
... and 3 more

codecov[bot] avatar Nov 19 '21 00:11 codecov[bot]

</phew>

Justifiably feels like I'm personally to blame for the migraine now debilitating Janis as we speak. I didn't actually realize that mypy permissively accepts:

  • PEP 563-style deferred annotations like "Union[tensorflow.keras.Model, torch.nn.Module]" in the absence of PEP 563 (e.g., from __future__ import annotations). That's... fascinating. And horrifying! But we can probably do that too, because we already have a perfect PEP 563 resolver. Where mypy goes, beartype reluctantly follows. That said, does mypy actually support that? I couldn't find a discussion of that behaviour in PEP 484 – but that doesn't necessarily mean anything. Mypy does many non-standard things. I've learned to accept that.
  • Implicitly optional hints (e.g., k: int = None rather than k: Optional[int] = None). We can definitely do that, too. Trivial, right? Sadly, Guido doesn't want us to:

A past version of this PEP allowed type checkers to assume an optional type when the default value is None, as in this code:

def handle_employee(e: Employee = None): ...

This would have been treated as equivalent to:

def handle_employee(e: Optional[Employee] = None) -> None: ...

This is no longer the recommended behavior. Type checkers should move towards requiring the optional type to be made explicit.

We're shameless Guido sycophants over at @beartype. So, we'd better do what sempai suggests.

Thanks so much for sticking with this gruelling transition, Janis. If we can do anything to ease your considerable suffering here, please pound on our issue tracker and it shall be done.

leycec avatar Nov 19 '21 07:11 leycec

The move to strict Optional checking as suggested by ex-BDFL himself seems sensible so I will prepare a commit (to be cherry-picked into master) that excises the strict_optional = False from our codebase.

jklaise avatar Nov 23 '21 11:11 jklaise

Now that the tests are all passing it's time to think about next steps. The following still needs to be addressed:

  • [ ] Relax Tuple types to Sequence - this is essential to support specifying parameters from a config
  • [ ] Decide if we want the type checking to be optional - I think we probably don't want this as it could lead to more headaches than necessary. The only reason we may consider it is to avoid breaking some existing application code, see next point
  • [ ] Type strictness (somewhat related to points 1. and 2.), especially to do with floats. Previously I said that moving to strict float is in contrast with general Python practices and has some potential to break existing application code. Since then my opinion has changed somewhat and now I think we should go ahead with the strict float requirement, for the main reason that it frees us from doing even more type casting (especially with tensorflow and pytorch sometimes being picky about int vs float). This would basically mean the following:
    • Any interaction with alibi public API would require strict float (either coming from Python or read from a config file)
    • Any other code using alibi would need to ensure the type conversion is done on their side (e.g. imagine some 3rd party processing function that returns integers which need to be interpreted as floats by alibi)

Keen to get your thoughts @ascillitoe @sakoush.

jklaise avatar Dec 06 '21 14:12 jklaise

To put a long story even longer, here's a summary of some internal discussions as we wrestle with what we want to actually achieve here.

Lay of the land

Fundamentally, the usage of a runtime type-checker boils down to the following question. Do we want alibi to a. Define strict type requirements for its public interface and only accept those types? b. Define type requirements, but cast provided arguments to the required types when possible?

As no surprise to anyone, beartype is perfect for (a) and pydantic (mostly) solves (b).

Now onto the considerations, and a toy example where we want num: float as an argument to the public interface.

Example 1: Using beartype and num: float typing. This is strict and non-permissive: if num is an int, that is not acceptable. The user/application developer is responsible for type correctness and for parsing to the type expected by alibi. This is an example of (a) above.

Example 2: Using pydantic and num: float typing. This is non-strict and permissive: it encourages float but will attempt to cast non-floats to the expected type, succeeding in many cases, e.g. if an int is passed. The user/application developer can be somewhat more lax about ensuring type correctness here (not behaviour we want to encourage, but it may be necessary in some scenarios, e.g. shape: Tuple is impossible to specify from within JSON as JSON has no concept of tuples, only lists, and to avoid an intermediate parsing layer in user space we would need to broaden the alibi type to shape: Sequence and parse internally via shape=tuple(shape)). This is an example of (b) above.

Example 3 This one is interesting. Here we want to use beartype but accept broader types to get something similar to Example 2, so we say num: numbers.Real. This is still strict and non-permissive, but the type is broader, accepting both int and float. The important part: if the private API actually, truly requires a float, then alibi needs to cast it manually if we allow broader types in the public API. beartype is not a parsing library, unlike pydantic, so the onus is on us to cast. (This suggests that there are two categories of "broadened" type signatures: ones that truly can be either type internally, and ones that are only broadened for convenience on the user/application developer's part, where we still need to manually cast inside.)
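To make the three examples concrete, here is a stdlib-only sketch. The function names are made up and the explicit isinstance checks merely mimic what beartype (strict checking) or pydantic (casting) would do under the hood; none of this is alibi API:

```python
import numbers

def crunch_strict(num: float) -> float:
    # Example 1 (beartype-style): strict -- an int is rejected outright.
    if not isinstance(num, float):
        raise TypeError(f"expected float, got {type(num).__name__}")
    return num * 2

def crunch_permissive(num: float) -> float:
    # Example 2 (pydantic-style): attempt to cast whatever arrives to float.
    num = float(num)
    return num * 2

def crunch_broad(num: numbers.Real) -> float:
    # Example 3: broader public type, still strictly checked; the cast to the
    # float the internals truly need is our job, since beartype never parses.
    if not isinstance(num, numbers.Real):
        raise TypeError(f"expected a real number, got {type(num).__name__}")
    return float(num) * 2
```

Note how in the Example 3 variant the manual `float(num)` cast is unavoidable whenever the internals genuinely require a float.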

Breaking existing user code now

It was already mentioned above, but just to reiterate: if we go with something like Example 1, there is some potential for breaking existing user flows, e.g. someone relying on passing ints when floats are expected due to the permissiveness of Python itself as well as our non-existent runtime type-checking. I've come around to thinking that this is not such a big deal™ as users should be using the types that are specified in the public API rather than rely on implicit type casting, even if it's sanctioned by Python itself (note that this last sentence basically also dismisses solution (b) and the pydantic approach to implicit casting...).

On changing types

It was raised that being overly strict with types might lead to breaking changes if we want to change types in the future. This is interesting because there are multiple ways one could change types:

  1. Broaden an existing type (e.g. num: float to num: numbers.Real). This is safe from breaking user code because we are just broadening a type. This also encourages a pattern of "start strict, broaden later" as it's much easier to broaden types than to narrow them.
  2. Change an existing type (e.g. num: float to num: int). This is no longer safe from breaking user code, although in effect we've changed the parameter's meaning without changing its name, which is probably bad form anyway. In any case, changing parameter names can be nicely deprecated without breakage by using some decorator magic; it remains to be seen if something similar can be achieved for changing types by some clever use of introspection. TODO

Other considerations

Other pydantic use cases

As already mentioned previously, pydantic is a nice tool for writing data models and parsing data that are guaranteed to satisfy them. We are very likely to use pydantic to formalize the return API of our explain methods in the future. Parsing is especially useful here because we might imagine a scenario where we read a serialized explanation json into its Python form, which would require, for example, parsing nested lists into numpy arrays, which pydantic can do.

This is not to say that we should avoid this on the principle of not introducing too many dependencies; I don't think that's a strong argument here, and there's no particular reason why we couldn't use both beartype and pydantic, each for different purposes.

Performance and statistical type checking

Some questions were asked about beartype's approach to statistical (read: non-deterministic) type checking for nested containers. As explained in their magnificent readme, this is due to the natural trade-off between deep runtime type-checking in a dynamic language like Python (potentially on the same object over and over again) and performance. That being said, maybe our arguments are never so complex, and it might be more important to validate these fully (this is also possible with beartype, albeit at the cost of switching to validators, which is not as nice and standard as relying on PEP 484 type hints; native support may come in the future). As a result, beartype prides itself on O(1) runtime type-checking.

On the other hand, pydantic would do (afaik) deep type-checking. Again, this may not actually be an issue for typical argument types we may have.
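The trade-off can be sketched with a toy stdlib-only example; random.choice here stands in for beartype's sampling strategy (the real thing is considerably more sophisticated), while the exhaustive loop stands in for pydantic-style deep checking:

```python
import random

def check_deep(seq, typ):
    # pydantic-style deep checking: every element is inspected -- O(n).
    return all(isinstance(x, typ) for x in seq)

def check_sampled(seq, typ):
    # beartype-style statistical checking: inspect one randomly chosen
    # element -- O(1), so a heterogeneous container may pass on any given call
    # but is increasingly likely to be caught across repeated calls.
    return (not seq) or isinstance(random.choice(seq), typ)
```

For the small, shallow argument types typical of an explainer's public API, the cost difference between the two strategies is unlikely to matter in practice.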

Handling forward-refs and validators

As mentioned previously, pydantic validate_arguments decorator currently supports neither.

More realistic types

The num: float example is good for illustrating the fundamental questions, but perhaps using more complex types from the public alibi API would reveal more. TODO

jklaise avatar Dec 07 '21 19:12 jklaise

@jklaise: Wow! My profuse apologies for missing the entirety of your vigorous debate. I blame only my own laziness. Video games are the death of productivity yet I continue to play them. This must be what a vicious circle looks like.

So... I have possibly momentous things to say. Since this issue has been justifiably closed, I say these things into the illimitable void of GitHub with no expectation of any coherent reply by anyone – just as it should be:

# Import the requisite machinery.
from beartype import beartype, BeartypeConf

# Dynamically create a new @beartowertype decorator enabling the implicit numeric tower.
beartowertype = beartype(conf=BeartypeConf(is_pep484_tower=True))

# Decorate with this decorator rather than @beartype everywhere.
@beartowertype
def crunch_numbers(numbers: list[float]) -> float:
    return sum(numbers)

# This is now fine.
crunch_numbers([3, 1, 4, 1, 5, 9])

# This is still fine, too.
crunch_numbers([3.1, 4.1, 5.9])
  • @beartype and Pydantic are BFFLs (Best Friends for Life)! Okay. We're not. We probably despise each other and fervently hope for an ignominious and flaming end to the other guy. Honestly, I'm more mystified by Pydantic than anything. I really liked where they were going, but the shift to Rust is an itchy head-scratcher. Nonetheless, @beartype and Pydantic are both PEP-compliant type checkers and thus mutually compatible. We happily coexist. We're so different from one another that we don't even have much to say to one another. In fact, this leads us directly to...
  • beartype.claw. In our upcoming @beartype 0.15.0 release, ...to be released shortly for certain definitions of "shortly", @beartype will become the first usable hybrid runtime-static type-checker (i.e., something altogether more powerful than traditional runtime type-checkers like Pydantic or traditional static type-checkers like mypy and pyright). The hype train that will surely disappoint everyone is boarding.

In short, our new beartype.claw API will enable everyone to subject one or more arbitrary packages (including literally all packages) to both runtime and static type-checking with a single line of code at runtime. If none of that makes sense, it shouldn't, because none of that meaningfully existed before @beartype 0.15.0. Consider this alibi-friendly example:

# In your "alibi.__init__" submodule:
from beartype import BeartypeConf
from beartype.claw import beartype_this_package

# This single line of code installs an import hook subjecting the entirety of alibi to
# both runtime and static type-checking at runtime. Like standard static type-checkers,
# this type-checking has no associated runtime costs. Unlike standard static type-checkers,
# this type-checking is enforced at runtime. Mypy and pyright are ignorable. This isn't.
beartype_this_package(conf=BeartypeConf(is_pep484_tower=True))
# In another arbitrary "alibi.some_module" submodule:
good_float: float = 30  # <-- this is fine, because you've enabled the implicit numeric tower
bad_int: int = "Pretty sure this isn't an integer."  # <-- this is definitely not fine

What will happen when the user imports the fake alibi.some_module submodule I just awkwardly made up? A runtime type-checking violation, because @beartype statically analyzed the value assigned to the bad_int global as violating its type hint int. @beartype does this at import time via the aforementioned import hook installed by the initial call to the beartype_this_package() function. Note also the distinct lack of any explicit calls to anything from @beartype other than the initial call to the beartype_this_package() function.

That's it. A single line of code just type-checked the entirety of alibi at runtime, which nobody can ignore.

All of everything above currently works as advertised. Admittedly, I should just release all that already with @beartype 0.15.0. The perfect became the enemy of the good and I can't stop hunting obscure bugs submitted by my favourite users. This must be what it feels like when bear-cats cry. :crying_cat_face:

leycec avatar Jul 19 '23 04:07 leycec

@leycec great to hear from you! I've been following @beartype developments pretty closely and am becoming ever more impressed by what you've accomplished. I was aware of the support of the PEP484 implicit numerical tower (much appreciated as the word of Guido and thus user expectations appear to be law). I'm also very excited about the upcoming import hook and have been following some of the implementation discussion, here's hoping 0.15 lands soon™.

Whilst I have your attention, something I've been meaning to ask on @beartype, do you have some thoughts on the philosophy of the application of @beartype in applications vs libraries and in public APIs vs private ones? I've been thinking something along these lines:

  • as an application developer things are quite straightforward, I can choose to implement @beartype gradually or all at once in my application to act as a sort of user input validation layer for public APIs and a sort of internal consistency layer for private APIs (if I'm feeling skittish, I might only implement @beartype in my test suite)
  • as a library developer things are a bit less clear to me:
    • once beartype_this_package comes out, it seems a no-brainer to run it in a test suite to catch both incorrect tests and reveal some deficiencies in the implementation (as I had discovered further up this thread already)
    • but shipping @beartype to users is a big step up - using it on public APIs would make it a de-facto validation layer for user input which is probably good in the wild west of runtime Python and means we don't have to maintain heaps of type validation code. But shipping it for private APIs too is less clear - if an error does occur, it would point to an implementation error that we missed in tests and for the user the error might not be clear enough coming from library internals rather than the public API
  • Suppose we ship @beartype to users as our de-facto validation layer for user inputs. Now although I have nothing against documenting that our public API raises BeartypeException for misformatted input, might it be possible to wrap these errors in library-specific exceptions, i.e. we might want to raise something like AlibiTypeError from our public API whilst the originator BeartypeException is lurking beyond the surface as an implementation detail.
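For illustration, a minimal stdlib sketch of that wrapping pattern; AlibiTypeError and wrap_type_errors are hypothetical names, and a plain TypeError stands in for the beartype violation class:

```python
import functools

class AlibiTypeError(TypeError):
    """Hypothetical alibi-specific exception for invalid user input."""

def wrap_type_errors(func):
    # Re-raise any type-checker violation as the library's own exception,
    # keeping the originator available via __cause__ for debugging. A plain
    # TypeError stands in here for beartype's BeartypeCallHintViolation.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except TypeError as err:
            raise AlibiTypeError(str(err)) from err
    return wrapper
```

Applied to the public API entry points, users would then only ever see AlibiTypeError, with the type-checker's exception preserved as an implementation detail on __cause__.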

jklaise avatar Jul 19 '23 10:07 jklaise

Aww! Thanks for such kind GitHub words and the good summertime vibes, @jklaise. You are following us closer than even I suspected. Unsurprisingly, you know everything already and are at least five steps ahead.

Let's see if I can answer anything to anyone's satisfaction:

...do you have some thoughts on the philosophy of the application of @beartype in applications vs libraries and in public APIs vs private ones?

Yes. Use @beartype everywhere. That is the philosophy. :smile:

Admittedly, that was a trash answer. Let's see if I clarify what @leycec would do in these diverse use cases that demand that something useful be done:

...as an application developer things are quite straightforward, I can choose to implement @beartype gradually...

</nods>

Absolutely. The gradual way would probably be to incrementally, laboriously, and manually continue decorating all callables and classes of interest with the @beartype decorator. I myself no longer have time, money, or patience for the gradual way, because "gradual" really does mean graaaaaaaaadual. Manual decoration across million-line codebases is truly a Sisyphean chore that cripples my shoulders and wrists with arthritic RSI. beartype.claw: save my body now.

Admittedly, I don't have money either way. But at least I'm more likely to have money if I devote scarce resources to tasks other than @beartype boilerplate. Maybe. Hopefully. That's the idea, anyway. Don't fail me now, capitalism!

...or all at once in my application.

</nods_nods>

Absolutely. This is the beartype.claw.beartype_all() way – an import hook that subjects an entire app stack (i.e., all packages imported under the active Python interpreter, including both standard modules and packages in Python's standard library and third-party modules and packages) to hybrid runtime-static type-checking by @beartype. Let's pretend that works as advertised.

once beartype_this_package comes out, it seems a no-brainer to run it in a test suite to catch both incorrect tests and reveal some deficiencies in the implementation

</nods_nods>

This is the way. When you're five steps ahead, you're five steps ahead.

but shipping @beartype to users is a big step up - using it on public APIs would make it a de-facto validation layer for user input which is probably good in the wild west of runtime Python and means we don't have to maintain heaps of type validation code. But shipping it for private APIs too is less clear - if an error does occur, it would point to an implementation error that we missed in tests and for the user the error might not be clear enough coming from library internals rather than the public API

...heh. You are not wrong, of course. I'm pretty sure you rarely are.

Thankfully, since @beartype acts entirely at runtime, you have the full array of Python's might at your disposal. One sly approach to dealing with type validation woes (both expected and unexpected) might be to expose a simple public configuration API at the alibi package level, enabling users to conditionally enable or disable runtime type-checking across your entire package as they see fit. If you're so inclined, a package-specific environment variable would do the trick:

# In your "alibi.__init__" submodule:
from os import environ as _environ

# If the ${ALIBI_IS_BEARTYPED} shell environment variable is truthy...
if _environ.get('ALIBI_IS_BEARTYPED', '1').lower() not in ('0', 'false', ''):  # <-- on by default for maximal truth; note environment values are strings
    # Try to @beartype yourself. Since @beartype would then probably be just an
    # optional dependency of alibi, you'd rather silently ignore the lack of
    # @beartype here (than raise a fatal exception that everyone would hate).
    try:
        from beartype import BeartypeConf
        from beartype.claw import beartype_this_package
        beartype_this_package(conf=BeartypeConf(is_pep484_tower=True))
    except ImportError:
        pass

Ever more elaborate schemes are also at your beck and call – from alibi-specific configuration file settings in pyproject.toml to a full-blown alibi plugin API implemented with a confusing and unmaintainable object-oriented hierarchy of mixins, metaclasses, and dangerously dynamic behaviour. Personally, I'm a fan.

Ultimately, the ultimate power is in your hands. Praise be to alibi.

...we might want to raise something like AlibiTypeError from our public API whilst the originator BeartypeException is lurking beyond the surface as an implementation detail.

Yes! So much yes! This is an excellent feature request. I've been intending to do that for a few years. Then someone submits a competing feature or stupefying issue, I'm suddenly off to the races, and I fail to do anything about this.

Ideally, @beartype should expose a new BeartypeConf.violation_cls setting enabling wonderful downstream consumers such as yourself to substitute your own package-specific exception class in lieu of our default beartype.roar.BeartypeCallHintViolation class: e.g.,

beartype_this_package(conf=BeartypeConf(
    is_pep484_tower=True,
    violation_cls=AlibiTypeError,
))

Gently prod me with a grubby stick if I fail to do something about this, which I will. I'm just that kinda GitHubber. :face_exhaling:

leycec avatar Jul 19 '23 16:07 leycec