
`DatetimeFormatter`: When a `ValueError` occurs, the fallback `pd.to_datetime` call can fail due to a format mismatch

Open · pvk-developer opened this issue 11 months ago · 0 comments

Environment Details

Please indicate the following details about the environment in which you found the bug:

  • SDV version: 1.21
  • Python version: 3.8 / 3.12
  • Operating System:

Error Description

As seen in this workflow, this error originates in the `DatetimeFormatter` class in `data_processing`.

When a `ValueError` is raised, we fall back to `pd.to_datetime` without passing the datetime format that was already provided. Pandas then has to infer the format on its own, which is not always accurate, as the example below shows.

Therefore, we should make this more robust by:

  1. Trying to cast the data with the provided format.
  2. Converting the already-parsed datetimes to strings.
  3. If pandas' default conversion fails, re-applying the provided format in a safer way with `errors='coerce'`, so that values that do not match become `NaT` instead of raising a `ValueError`.
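The three steps above could be sketched roughly as follows. This is a minimal sketch, not SDV's implementation: `format_datetime_column` is a hypothetical helper, pandas' default conversion is tried between steps 1 and 3, and step 2 is read as formatting the parsed values back to strings with `strftime` (an assumption about the intended behavior).

```python
import pandas as pd


def format_datetime_column(column, datetime_format):
    """Hypothetical helper sketching the proposed fallback order.

    Not SDV's actual DatetimeFormatter code.
    """
    column = pd.Series(column)
    try:
        # 1. Cast the data with the provided format.
        parsed = pd.to_datetime(column, format=datetime_format)
    except ValueError:
        try:
            # Fall back to pandas' default conversion (format inference).
            parsed = pd.to_datetime(column)
        except ValueError:
            # 3. Re-apply the provided format with errors='coerce' so that
            #    values that do not match become NaT instead of raising.
            parsed = pd.to_datetime(column, format=datetime_format, errors="coerce")

    # 2. Convert the parsed datetimes back to strings in the provided format;
    #    NaT entries come out as NaN.
    return parsed.dt.strftime(datetime_format)
```

For example, `format_datetime_column(["31 May 2021", "02 Apr 2021"], "%d %b %Y")` returns both strings unchanged, while a value that matches neither the provided format nor pandas' inferred one comes back as `NaN` instead of raising.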

Steps to reproduce

```python
import pandas as pd

series = pd.Series(["31 May 2021", "02 Apr 2021"])
pd.to_datetime(series)
```

```
...
ValueError: time data "02 Apr 2021" doesn't match format "%d %B %Y", at position 1. You might want to try:
    - passing `format` if your strings have a consistent format;
    - passing `format='ISO8601'` if your strings are all ISO8601 but not necessarily in exactly the same format;
    - passing `format='mixed'`, and the format will be inferred for each element individually. You might want to use `dayfirst` alongside this.
```
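The failure comes from format inference: in pandas 2.x, `pd.to_datetime` infers one format from the first element and applies it to the rest. Because "May" is both the full and the abbreviated month name, the first value is read as `%d %B %Y` (full month name), which "Apr" does not match. A minimal sketch of the difference, assuming pandas 2.x and choosing `"%d %b %Y"` as the known format for illustration:

```python
import pandas as pd

series = pd.Series(["31 May 2021", "02 Apr 2021"])

# Passing the abbreviated-month format explicitly parses both values.
explicit = pd.to_datetime(series, format="%d %b %Y")

# Reusing the inferred full-month format with errors="coerce" does not raise,
# but turns the value that does not match ("02 Apr 2021") into NaT.
coerced = pd.to_datetime(series, format="%d %B %Y", errors="coerce")
```

The second call shows that `errors='coerce'` is a safety net rather than a fix: it avoids the `ValueError`, but yields `NaT` whenever the format does not match.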

pvk-developer, May 19 '25 14:05