dataprep

validate_date() classifies numeric values as dates

Open brandonlockhart opened this issue 3 years ago • 1 comment

Describe the bug
validate_date() seems to classify all numbers as dates.

To Reproduce
Running the following function on this dataset gives the output shown below:

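import pandas as pd
from dataprep.clean import validate_date

def detect_data_types(df, n, t):
    # Note: t is accepted but not used in this snippet.
    ls = []
    for col in df.columns:
        values = df[col].dropna()
        # Sample at most n non-null values so short columns do not raise.
        date_count = validate_date(values.sample(min(n, len(values)))).sum()
        ls.append((col, date_count))
    return pd.DataFrame(ls, columns=["Column Name", "Count of values identified as dates"])

detect_data_types(df, n=1000, t=0.85)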

[Screenshot: Screen Shot 2021-03-09 at 3 59 15 PM]

Many numeric columns, such as latitude and longitude, are identified as dates.

Is this expected behaviour, @qidanrui? Would it be possible to make numeric values like latitude and longitude validate to False?
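
In the meantime, one possible workaround is to skip numeric columns before running the validator at all. The sketch below is only an illustration, not dataprep's behaviour: `count_dates_skipping_numeric` is a hypothetical helper, and `validator` is assumed to be any callable that takes a Series and returns a boolean Series (as validate_date() does in the snippet above).

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def count_dates_skipping_numeric(df, validator, n=1000):
    """Count values flagged as dates per column, treating numeric columns as non-dates.

    `validator` is an assumed callable: Series -> boolean Series.
    """
    rows = []
    for col in df.columns:
        values = df[col].dropna()
        if is_numeric_dtype(values):
            # Numeric columns (e.g. latitude, longitude) are never sent to the validator.
            rows.append((col, 0))
            continue
        # Sample at most n non-null values; random_state keeps the sketch reproducible.
        sample = values.sample(min(n, len(values)), random_state=0)
        rows.append((col, int(validator(sample).sum())))
    return pd.DataFrame(
        rows, columns=["Column Name", "Count of values identified as dates"]
    )
```

With dataprep installed, passing `validate_date` as `validator` should give the same counts as the function above for non-numeric columns and 0 for numeric ones. This hides the symptom at the call site only; it does not change what validate_date() itself returns for numbers.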

brandonlockhart, Mar 10 '21 00:03

Let me check for that.

qidanrui, Mar 16 '21 03:03