pandantic
on failure, optionally provide data on errors (by row & reason)
At the moment, data issues either raise an Exception or cause rows to be dropped, depending on the args used.
Having a way to get data back on where and why errors occurred across a whole DataFrame would be useful.
Thanks!
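To make the request concrete, here is a minimal sketch of what a per-row error report could look like. It is not pandantic's API: plain dicts stand in for DataFrame rows, and `RowError`, `SCHEMA`, and `collect_errors` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class RowError:
    row: int      # position of the failing row
    column: str   # column that failed
    reason: str   # human-readable explanation


# Hypothetical schema: column name -> expected Python type.
SCHEMA: dict[str, type] = {"id": int, "name": str}


def collect_errors(rows: list[dict[str, Any]], schema: dict[str, type]) -> list[RowError]:
    """Check every row and return all problems instead of stopping at the first."""
    errors: list[RowError] = []
    for i, row in enumerate(rows):
        for column, expected in schema.items():
            if column not in row:
                errors.append(RowError(i, column, "missing value"))
            elif not isinstance(row[column], expected):
                got = type(row[column]).__name__
                errors.append(RowError(i, column, f"expected {expected.__name__}, got {got}"))
    return errors


rows = [{"id": 1, "name": "a"}, {"id": "2", "name": "b"}, {"id": 3}]
report = collect_errors(rows, SCHEMA)
for err in report:
    print(f"row {err.row}, column {err.column!r}: {err.reason}")
```

A report like this could be returned alongside the valid data, or attached to the raised exception, so the caller sees every failing row at once.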
Hi @hottwaj,
Fully agree. A summary and better error log printing would be a logical next step to implement. I am currently looking for some spare time, but this is at the top of the to-do list.
Feel free to give it a try and open a PR if you find a nice solution, or wait until I find some time, haha.
@wesselhuising this could be a good (optional) feature as opposed to just skipping rows.
We can have a kwarg like errors="skip"/"raise"/"log" which controls whether to skip or raise. We can also have a logger kwarg, where the user's logger can be passed in.
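A rough sketch of how such an `errors` kwarg and optional `logger` kwarg might behave is below. It is an illustration only, not pandantic's implementation: `validate_rows` and `check_id` are hypothetical, rows are plain dicts rather than a DataFrame, and "log" is assumed here to drop the row after logging it, which is a choice rather than something settled in this thread.

```python
import logging
from typing import Any, Callable, Optional

Row = dict[str, Any]


def validate_rows(
    rows: list[Row],
    check: Callable[[Row], Optional[str]],
    errors: str = "raise",
    logger: Optional[logging.Logger] = None,
) -> list[Row]:
    """Return the valid rows; `check` returns None for a valid row, else a reason."""
    if errors not in ("raise", "skip", "log"):
        raise ValueError(f"unknown errors mode: {errors!r}")
    if logger is None:
        # Fall back to a module-level logger when the caller passes none.
        logger = logging.getLogger(__name__)

    valid: list[Row] = []
    for i, row in enumerate(rows):
        reason = check(row)
        if reason is None:
            valid.append(row)
        elif errors == "raise":
            raise ValueError(f"row {i}: {reason}")
        elif errors == "log":
            # Assumption: logged rows are dropped, like "skip" but not silent.
            logger.warning("row %d dropped: %s", i, reason)
        # errors == "skip": drop the row silently
    return valid


def check_id(row: Row) -> Optional[str]:
    # Hypothetical rule: "id" must be present and an int.
    return None if isinstance(row.get("id"), int) else "id must be an int"
```

With this shape, `validate_rows(rows, check_id, errors="log", logger=my_logger)` would route the per-row messages to whatever handlers the caller has configured.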
Interesting take on the logger kwarg. I think we should combine both pieces of functionality: the logger kwarg can be considered advanced, and for some users just the log option would be sufficient.
I made a start but stopped because we need to decide on one particular point: what should the default behavior be when log is selected as the errors method? Skip the rows but log them, or raise the error as the raise option would?
@wesselhuising I would say the default for validate() must be raise, as that is the general behavior of validation libraries.
Having skip/log as the optional behavior makes sense. Since there are just two options, a boolean argument should suffice.
But skipping would be handled by the future .filter() method, right? In that case I would use a boolean rather than the string option for error handling in the validate() method. I had a verbose option before, and I could bring it back. So: validate(verbose: bool = False) and filter(verbose: bool = False) to tackle this, while keeping skipping and raising separated as we discussed.
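As a sketch of that split, under the assumption that both methods share one failure-collecting step: `validate` always raises, `filter_valid` always drops, and `verbose` only controls whether the failures are printed. The function names and the `check` callback are hypothetical stand-ins (`filter_valid` is used instead of `filter` to avoid shadowing the Python builtin), and plain dicts stand in for DataFrame rows.

```python
from typing import Any, Callable, Optional

Row = dict[str, Any]
Check = Callable[[Row], Optional[str]]  # returns None if valid, else a reason


def _failures(rows: list[Row], check: Check, verbose: bool) -> list[tuple[int, str]]:
    """Collect (row index, reason) for every invalid row, printing them if verbose."""
    found = []
    for i, row in enumerate(rows):
        reason = check(row)
        if reason is not None:
            found.append((i, reason))
    if verbose:
        for i, reason in found:
            print(f"row {i}: {reason}")
    return found


def validate(rows: list[Row], check: Check, verbose: bool = False) -> list[Row]:
    """Raise if any row is invalid; otherwise return the rows unchanged."""
    failures = _failures(rows, check, verbose)
    if failures:
        raise ValueError(f"{len(failures)} invalid row(s)")
    return rows


def filter_valid(rows: list[Row], check: Check, verbose: bool = False) -> list[Row]:
    """Drop invalid rows and return the rest; never raises on bad data."""
    bad = {i for i, _ in _failures(rows, check, verbose)}
    return [row for i, row in enumerate(rows) if i not in bad]


def check_id(row: Row) -> Optional[str]:
    # Hypothetical rule: "id" must be present and an int.
    return None if isinstance(row.get("id"), int) else "id must be an int"
```

Keeping the two entry points separate means the caller's choice of method decides between raising and skipping, and the boolean only affects how much is reported.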