Understanding Pandas validation
Hello, apologies if this is the wrong place to ask this question.
I am stumped on how datatest's validation mechanism is passing the following example:
dt.validate(pd.DataFrame(), pd.DataFrame({"A": [1]}))
The documentation states:
For validation, DataFrame objects using the default index type are treated as sequences.
Shouldn't I be getting the same result as dt.validate([], [1])? What am I missing?
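To make the expectation concrete, here is a minimal pure-Python sketch of position-by-position sequence comparison, under the assumption that an empty sequence validated against [1] should report the required value as missing. The helper name sequence_differences and its ("missing"/"extra"/"deviation", index, value) output are hypothetical illustrations, not datatest's API or its actual difference objects.

```python
from itertools import zip_longest

# Sentinel marking a position that exists in only one of the two sequences.
_ABSENT = object()


def sequence_differences(data, requirement):
    """Hypothetical helper: compare two sequences position by position.

    Returns a list of (kind, index, value) tuples where kind is
    "missing" (required but absent from data), "extra" (present in
    data but not required), or "deviation" (both present but unequal).
    """
    diffs = []
    pairs = zip_longest(data, requirement, fillvalue=_ABSENT)
    for i, (actual, expected) in enumerate(pairs):
        if actual is _ABSENT:
            diffs.append(("missing", i, expected))
        elif expected is _ABSENT:
            diffs.append(("extra", i, actual))
        elif actual != expected:
            diffs.append(("deviation", i, (actual, expected)))
    return diffs


print(sequence_differences([], [1]))  # [('missing', 0, 1)]
```

Under this model, an empty DataFrame treated as a sequence should fail against a one-row requirement in the same way.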
Ah, thanks for posting this. Your confusion is entirely warranted: datatest should be raising an error in this case.
I will look to get a fix pushed out in the next couple of days. There are some logical corner cases that arise when comparing against empty containers (where it's not always obvious what error or difference should be raised), but this is clearly undesirable behavior.
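As a general illustration of why empty containers are awkward (not a description of datatest's internals or the cause of this bug): an element-wise check built on Python's built-in all() returns True for an empty iterable, so empty data passes any such check vacuously. The helper all_elements_pass below is hypothetical.

```python
def all_elements_pass(data, predicate):
    """Hypothetical element-wise check: True if every element satisfies predicate."""
    return all(predicate(x) for x in data)


# With no elements there is nothing to fail, so the check passes vacuously.
print(all_elements_pass([], lambda x: x == 1))   # True
print(all_elements_pass([2], lambda x: x == 1))  # False
```

A requirement that expects specific values, such as [1], needs an explicit comparison of what is present (one that can report a missing value) rather than a per-element test alone.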
In the meantime, if you need an immediate fix, you can add a preceding check on the column names. See below:
import pandas as pd
import datatest as dt
data = pd.DataFrame()
requirement = pd.DataFrame({"A": [1]})
dt.validate(data.columns, requirement.columns)  # <- Add this: check column names before the full comparison.
dt.validate(data, requirement)