
Date Cleaning (clean_date) fails to clean dates with 'August'

Open DaedalusInMaze opened this issue 11 months ago • 1 comments

Describe the bug When cleaning dates with the clean_date function, a source date that contains the full month name 'August' is not recognized as a date. All other text months, including the abbreviation 'Aug', are identified and cleaned correctly.

To Reproduce

import pandas as pd
from dataprep.clean import clean_date

# Only the first entry ('2021 August 21') fails; the other formats are cleaned as expected.
samp = pd.DataFrame({'date': ['2021 August 21', '2021 Aug 21', '2021 July 21', '2021 Jul 21', '2021 08 21', 'Aug 21 2021']})
clean_date(samp, 'date')

Expected behavior '2021 August 21' should be cleaned into '2021-08-21 00:00:00', like the other inputs.

Screenshots image

Desktop (please complete the following information):

  • OS: Windows 11
  • Browser: N/A
  • Platform: VSCode
  • Platform Version: 1.80.0
  • Python Version: 3.10.11
  • Dataprep Version: 0.4.5

Additional context I noticed that there is already an issue open on FutureWarning: Meta is not valid.

DaedalusInMaze, Jul 12 '23 16:07

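The issue might be in tokens = split(date, JUMP), where 'st' is in the JUMP list. 'August' itself contains 'st', so the month name may be cut apart before it can be matched against the known month names.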
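To illustrate the suspected mechanism, here is a minimal sketch. It is not dataprep's actual code: `JUMP_SKETCH`, `naive_split`, and `suffix_aware_split` are hypothetical names, and the only detail taken from the comment above is that 'st' is among the jump tokens. It assumes the splitter treats every jump token as a plain substring separator.

```python
import re
from typing import List

# Hypothetical jump list, NOT copied from dataprep; only 'st' comes from the comment above.
JUMP_SKETCH = [" ", ",", "st", "nd", "rd", "th"]


def naive_split(text: str, seps: List[str]) -> List[str]:
    """Replace every separator with the first one, then split on it."""
    default_sep = seps[0]
    for sep in seps[1:]:
        # Plain substring replacement: 'st' is also removed from inside 'August'.
        text = text.replace(sep, default_sep)
    return [tok for tok in text.split(default_sep) if tok]


def suffix_aware_split(text: str) -> List[str]:
    """Strip ordinal suffixes only when they directly follow a digit, then split."""
    text = re.sub(r"(?<=\d)(st|nd|rd|th)\b", "", text)
    return [tok for tok in re.split(r"[ ,]+", text) if tok]


print(naive_split("2021 August 21", JUMP_SKETCH))  # ['2021', 'Augu', '21']
print(naive_split("2021 Aug 21", JUMP_SKETCH))     # ['2021', 'Aug', '21']
print(suffix_aware_split("2021 August 21st"))      # ['2021', 'August', '21']
```

Under that assumption, 'August' becomes 'Augu', which no longer matches any month name, while 'Aug' passes through untouched. Removing ordinal suffixes only after a digit is one possible direction for a fix; whether it fits the library's tokenizer would need to be checked against the real source.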

DaedalusInMaze, Jul 12 '23 17:07