pyjanitor
[ENH] Incorporate typeguard as a type checker into pyjanitor
I was always bothered by the fact that even though we add type hints to the function arguments, we still need to use that check function to validate the data.
I created a new package called annotation_validation that automatically validates input and output data based on the type hints.
Best of all, it's just a decorator that gets added to the functions.
It's essentially a fork of this blog post but I've added type checking for return values and Union types.
There's still more to add, but I think it would be good to start adding this to new functions and removing the check function (since having both leaves more room for error).
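For readers who want a concrete picture of the idea, here is a minimal sketch of a decorator that validates arguments and return values against type hints, including Union. It uses only the standard library and is not the actual code of annotation_validation or of the blog post; the names validate_annotations, _matches, _name and head are hypothetical and chosen for illustration. It only handles plain classes, Any and typing.Union, and leaves anything more elaborate unchecked.

```python
import functools
import inspect
import typing


def _matches(value, hint):
    """Return True if value is acceptable for hint (plain classes and Union only)."""
    if hint is typing.Any:
        return True
    origin = typing.get_origin(hint)
    if origin is typing.Union:
        # Union[X, Y] (and Optional[X]) passes if any member matches.
        return any(_matches(value, arg) for arg in typing.get_args(hint))
    if origin is None and isinstance(hint, type):
        return isinstance(value, hint)
    # Parameterized generics, Callable, etc. are not checked in this sketch.
    return True


def _name(hint):
    """Readable name for a hint, used only in error messages."""
    return getattr(hint, "__name__", None) or repr(hint)


def validate_annotations(func):
    """Hypothetical decorator: check arguments and return value against type hints."""
    hints = typing.get_type_hints(func)  # resolved once, at decoration time
    signature = inspect.signature(func)
    variadic = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            # *args / **kwargs are skipped to keep the sketch short.
            if signature.parameters[name].kind in variadic:
                continue
            if name in hints and not _matches(value, hints[name]):
                raise TypeError(
                    f'argument "{name}" must be {_name(hints[name])}; '
                    f"got {type(value).__name__} instead"
                )
        result = func(*args, **kwargs)
        if "return" in hints and not _matches(result, hints["return"]):
            raise TypeError(
                f"return value must be {_name(hints['return'])}; "
                f"got {type(result).__name__} instead"
            )
        return result

    return wrapper


@validate_annotations
def head(first: list, n: typing.Union[int, None] = None) -> list:
    """Illustrative function: return the first n items of a list."""
    return first[:n]
```

With this in place, head([1, 2, 3], 2) runs normally, while head((1, 2, 3)) raises a TypeError before the function body executes, so no separate check call is needed inside the function.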
Nice stuff, @szuckerman! That'd be great. Do you have a timeline for release and re-integration into pyjanitor?
I was running some benchmarks after I posted this yesterday, and there appears to be a bit too much overhead with how I'm comparing arguments and their types. I have some ideas for caching that will reduce the load.
In any event, I would like to try to integrate this in the next few weeks.
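The comment above does not say which caching approach was planned. One possible reading, sketched here purely as an assumption, is to memoize the decision "does this runtime type satisfy this hint?" so that repeated calls with the same argument types skip the comparison. The function name type_matches is hypothetical, and the sketch covers only plain classes, Any and typing.Union.

```python
import functools
import typing


@functools.lru_cache(maxsize=None)
def type_matches(value_type, hint):
    """Decide whether instances of value_type satisfy hint.

    Results are memoized per (value_type, hint) pair, so the comparison
    runs once for each distinct pair seen.
    """
    if hint is typing.Any:
        return True
    origin = typing.get_origin(hint)
    if origin is typing.Union:
        # Each Union member is looked up (and cached) individually.
        return any(type_matches(value_type, arg) for arg in typing.get_args(hint))
    if origin is None and isinstance(hint, type):
        return issubclass(value_type, hint)
    # Hints this sketch does not understand are treated as unchecked.
    return True
```

A checking decorator would then call type_matches(type(value), hint) for each argument. The trade-off is that the decision is keyed on the value's type rather than the value itself, which is fine for class and Union hints but would not work for hints that depend on contents, such as a list of ints.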
This is fantastic. I love type hinting & in some cases, I'm definitely interested in guaranteeing adherence to them in my other projects, as well!
So, I was doing some work to fix up my package (issues with Python 3.7) and found that someone already made this package anyway 🤷‍♂️
https://github.com/agronholm/typeguard
Haha, my dad once told me, if we have a good idea, someone else probably has already implemented it.
The comfort I took from that statement is that "it's a good idea!" :smile:
I guess I can change this issue title to "incorporate typeguard to perform type checking"? It's worth a test: the type checking we have here provides informative error messages. I wonder if typeguard can provide those error messages as well?
The error messages from typeguard look like this.
For a function that has an argument first that should be a list, but was passed a tuple instead:
TypeError: type of argument "first" must be list; got tuple instead
For return values:
TypeError: type of the return value must be int; got str instead
(won't matter much since everything's returning a DataFrame)