dateparser

Incorrect result

Open · carlosplanchon opened this issue 1 year ago · 1 comment

In [37]: dateparser.parse(date_string="11 de a")
Out[37]: 
datetime.datetime(
    year=2011,
    month=7,
    day=14,
    hour=15,
    minute=57,
    second=17,
    microsecond=334749
)

This happens often with strings of the form "\d\d \w\w \w" (such as "11 de a"), but not with strings longer or shorter than that.
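The shape described above can be checked with a full-string regex match. This is a minimal sketch: the pattern "\d\d \w\w \w" is an assumption inferred from the "11 de a" example, and has_problem_shape is a hypothetical helper, not part of dateparser.

```python
import re

# Assumed shape of the problematic inputs, inferred from "11 de a":
# two digits, a space, two word characters, a space, one word character.
PROBLEM_SHAPE = re.compile(r"\d\d \w\w \w")


def has_problem_shape(text: str) -> bool:
    """Return True only if the entire string has the problematic shape."""
    # fullmatch (not search) so longer strings such as "11 de abril" are rejected.
    return PROBLEM_SHAPE.fullmatch(text) is not None


# Only "11 de a" prints True; the shorter and longer variants print False.
for candidate in ["11 de a", "11 de", "11 de ab", "11 de abril"]:
    print(repr(candidate), has_problem_shape(candidate))
```

Note that \w also matches digits, so a string like "11 12 3" has this shape too; the \D-based patterns in the comment below distinguish letters from digits.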

carlosplanchon, Jul 14 '22 19:07

Here is a short list of patterns matching invalid dates I've seen, plus a helper that checks a string against them:

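import re

# Patterns for strings that dateparser.parse() turned into a date
# even though they are not valid dates (e.g. "11 de a").
invalid_regexp_list: list[str] = [
    r"\D*\d{1,2} \D{1,2} \D",
    r"\D*\d{1,2} \d{1,2} \D",
    r"\D*\d{1,2} \D{1,2} \D\d",
    r"\D*\d{1,2} \d{1,2} \d\D",
    r"\D*\d{1,2} \D \D{1,2}",
]


def detect_invalid_date(text: str) -> bool:
    """Return True if text matches any of the known invalid-date patterns."""
    # re.search stops at the first match, so there is no need to
    # collect every match with re.findall just to test for non-emptiness.
    return any(re.search(regexp, text) for regexp in invalid_regexp_list)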

carlosplanchon, Jul 14 '22 19:07
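One way to use detect_invalid_date is as a guard in front of dateparser.parse, returning None instead of a spurious datetime. The sketch below is an assumption, not part of dateparser: it repeats the pattern list so it runs on its own, and parse_date_guarded is a hypothetical name. It takes the parser as an argument so it can be exercised without dateparser installed; in real use you would pass dateparser.parse.

```python
import re
from datetime import datetime
from typing import Callable, Optional

# Same patterns as in the comment above, precompiled and copied so this block is self-contained.
INVALID_PATTERNS = [re.compile(p) for p in (
    r"\D*\d{1,2} \D{1,2} \D",
    r"\D*\d{1,2} \d{1,2} \D",
    r"\D*\d{1,2} \D{1,2} \D\d",
    r"\D*\d{1,2} \d{1,2} \d\D",
    r"\D*\d{1,2} \D \D{1,2}",
)]


def detect_invalid_date(text: str) -> bool:
    """Return True if text matches any known invalid-date pattern."""
    return any(p.search(text) for p in INVALID_PATTERNS)


def parse_date_guarded(
    text: str, parse: Callable[[str], Optional[datetime]]
) -> Optional[datetime]:
    """Hypothetical wrapper: skip the parser for strings flagged as invalid."""
    if detect_invalid_date(text):
        return None
    return parse(text)
```

For example, parse_date_guarded("11 de a", dateparser.parse) returns None without calling dateparser at all. One caveat: because re.search matches anywhere in the string, these patterns also flag legitimate inputs; "14 de julio de 2022" matches the first pattern through its "14 de j" prefix. Anchoring the patterns or switching to re.fullmatch would narrow them.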