
Understanding Pandas validation

Open · schlich opened this issue 2 years ago · 1 comment

Hello, apologies if this is the wrong place to ask this question.

I am stumped on how datatest's validation mechanism is passing the following example:

dt.validate(pd.DataFrame(), pd.DataFrame({"A": [1]}))

The documentation states:

For validation, DataFrame objects using the default index type are treated as sequences.

Shouldn't I be getting the same result as dt.validate([], [1])? What am I missing?

schlich, Jul 12 '22 17:07

Ah, thanks for posting this. Your confusion is entirely warranted: datatest should be raising an error in this case.

I will try to push out a fix in the next couple of days. Comparing against empty containers raises some logical corner cases (it's not always obvious which error or difference should be raised), but this is clearly undesirable behavior.

In the meantime, if you are using datatest for something and need an immediate workaround, you can add a preceding check on the column names. See below:

import pandas as pd
import datatest as dt

data = pd.DataFrame()
requirement = pd.DataFrame({"A": [1]})

dt.validate(data.columns, requirement.columns)  # <- Add this.
dt.validate(data, requirement)
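
If it helps to see what the added column check guards against, the same idea can be sketched in plain Python without datatest. This is only an illustration: `missing_columns` and `check_columns` are hypothetical helper names, not part of datatest or pandas, and the sketch assumes column labels are hashable. It only reports required columns that are absent; it does not compare values.

```python
def missing_columns(data_columns, required_columns):
    """Return the required column names that are absent from data_columns.

    Hypothetical helper (not a datatest or pandas function). Order follows
    required_columns.
    """
    present = set(data_columns)
    return [name for name in required_columns if name not in present]


def check_columns(data_columns, required_columns):
    """Raise ValueError if any required column is missing (hypothetical helper)."""
    missing = missing_columns(data_columns, required_columns)
    if missing:
        raise ValueError(f"missing required columns: {missing}")


# With pandas objects you would pass data.columns and requirement.columns;
# plain lists stand in for them here so the sketch has no dependencies.
print(missing_columns([], ["A"]))  # ['A']
```

Calling `check_columns(data.columns, requirement.columns)` before the main validation would stop early on an empty DataFrame like the one in the question.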

shawnbrown, Jul 13 '22 03:07