DataProfiler
Datetime Profiler cannot currently detect dates whose days have ordinal suffixes
General Information:
- Library version: v0.7.6
Describe the bug:
This date is detected: Nov 15, 2013
Adding an ordinal suffix to the day prevents it from being detected, e.g. Nov 15th, 2013
To Reproduce:
DateTimeColumn._get_datetime_profile(pd.Series(['Nov 15th, 2013']))
# output:
# {
# 'date_formats': [],
# 'min': None,
# 'max': None,
# 'min_obj': datetime.datetime(9999, 12, 31, 23, 59, 59, 999999),
# 'max_obj': datetime.datetime(1, 1, 1, 0, 0),
# 'match_count': 0
# }
Expected behavior:
DateTimeColumn._get_datetime_profile(pd.Series(['Nov 15th, 2013']))
# output:
# {
# 'date_formats': ['%b %d, %Y'], # something in this should indicate a suffix, currently doesn't.
# 'min': 'Nov 15th, 2013',
# 'max': 'Nov 15th, 2013',
# 'min_obj': Timestamp('2013-11-15 00:00:00'),
# 'max_obj': Timestamp('2013-11-15 00:00:00'),
# 'match_count': 1
# }
A short-term fix is probably to remove the suffixes th, nd, rd, and st so the dates can at least be recognized.
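
A minimal sketch of what that short-term fix could look like, assuming the suffix is stripped before the string reaches the format matching. The helper name strip_ordinal_suffix and the regex are hypothetical and not part of DataProfiler; the parse uses the standard library's datetime.strptime rather than the profiler's own code path.

```python
import re
from datetime import datetime

# Hypothetical helper, not part of DataProfiler.
# Matches st/nd/rd/th only when directly preceded by a digit, so the
# "st" in a month name such as "August" is left untouched.
_ORDINAL_SUFFIX = re.compile(r"(?<=\d)(st|nd|rd|th)\b", re.IGNORECASE)


def strip_ordinal_suffix(value: str) -> str:
    """Return value with ordinal day suffixes (1st, 2nd, 3rd, 15th) removed."""
    return _ORDINAL_SUFFIX.sub("", value)


cleaned = strip_ordinal_suffix("Nov 15th, 2013")
# The cleaned string now matches the existing '%b %d, %Y' format.
parsed = datetime.strptime(cleaned, "%b %d, %Y")
```

With this approach the reported format would still be '%b %d, %Y', so the profile would not indicate that a suffix was present; that is the limitation noted in the expected output above.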