dataprep
validate_date() classifies numeric values as dates
Describe the bug
validate_date() appears to classify all numeric values as dates.
To Reproduce
From this dataset, this function gives the following output:
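import pandas as pd
from dataprep.clean import validate_date

def detect_data_types(df, n, t):
    # Note: t is not used in this reproduction.
    ls = []
    for col in df.columns:
        # Count how many of n sampled non-null values validate as dates.
        date_count = validate_date(df[col].dropna().sample(n)).sum()
        ls.append((col, date_count))
    return pd.DataFrame(ls, columns=["Column Name", "Count of values identified as dates"])

detect_data_types(df, n=1000, t=0.85)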
As a result, many numeric columns, such as latitude and longitude, are identified as dates.
Is this expected behaviour, @qidanrui? Would it be possible to make numeric values like latitude and longitude validate to False?
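In the meantime, one possible workaround on the caller's side is to skip columns with a numeric dtype before validating. The sketch below is only an illustration, not a fix to dataprep itself: `count_dates_skip_numeric` and its `validator` parameter are hypothetical names, and `validator` is assumed to be any function that takes a Series and returns a boolean Series (for example, `validate_date` from `dataprep.clean`). It will not catch numbers stored as strings in object columns.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def count_dates_skip_numeric(df, validator, n):
    """Count date-like values per column, treating numeric columns as non-dates.

    `validator` is assumed to map a Series to a boolean Series
    (hypothetical parameter; e.g. dataprep.clean.validate_date).
    """
    rows = []
    for col in df.columns:
        values = df[col].dropna()
        if values.empty or is_numeric_dtype(values):
            # Numeric (or empty) columns are never passed to the validator.
            date_count = 0
        else:
            # Sample at most n values so short columns do not raise an error.
            sample = values.sample(min(n, len(values)))
            date_count = int(validator(sample).sum())
        rows.append((col, date_count))
    return pd.DataFrame(
        rows, columns=["Column Name", "Count of values identified as dates"]
    )
```

With dataprep installed, this could be called as `count_dates_skip_numeric(df, validate_date, n=1000)`.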
Let me check for that.