datefinder
Word ending with -on is consumed with date match
If a word ending in "on" (for example, "investigation") comes directly before a date, the trailing "on" is consumed as part of the date match and cut off from the word.
In [3]: s="initial investigation Fri 2020-11-13 19:22 GMT"
In [4]: actual_date_string, indexes, captures = list(datefinder.DateFinder().extract_date_strings(s))[0]
In [5]: actual_date_string
Out[5]: 'on Fri 2020-11-13 19:22 GMT'
In [6]: indexes
Out[6]: (19, 46)
In [7]: s[indexes[0]:indexes[1]]
Out[7]: 'on Fri 2020-11-13 19:22 GMT'
In [8]: s[:indexes[0]]
Out[8]: 'initial investigati'
I want to parse out the date from text and also use what remains.
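As a stopgap, the returned span can be adjusted after the fact. This is only a sketch of a possible workaround, not a fix in datefinder itself: `trim_partial_word` is a hypothetical helper, and it assumes the span has the `(start, end)` form shown in `Out[6]` above.

```python
def trim_partial_word(text, span):
    """Move the start of a match forward if it begins in the middle of a word.

    Hypothetical helper (not part of datefinder). `span` is a (start, end)
    pair like the `indexes` value returned by extract_date_strings.
    """
    start, end = span
    # The match starts mid-word if the character just before it is alphanumeric.
    if start > 0 and text[start - 1].isalnum():
        # Skip the remainder of that word (here, the "on" of "investigation").
        while start < end and text[start].isalnum():
            start += 1
    # Drop any whitespace now left at the front of the match.
    while start < end and text[start].isspace():
        start += 1
    return start, end


s = "initial investigation Fri 2020-11-13 19:22 GMT"
indexes = (19, 46)  # span reported by datefinder in the session above

start, end = trim_partial_word(s, indexes)
date_text = s[start:end]        # text of the date only
remainder = s[:start].rstrip()  # what is left in front of the date
```

With the span from the session above, `date_text` no longer starts with "on" and `remainder` keeps the whole word "investigation". Note that the helper would also drop a standalone prefix fragment that is genuinely part of a date if the match happened to start mid-word, so it is best applied only to the leading edge of a match.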
Similarly,
datefinder.find_dates('Georgia Larinda Crews, 71, of Folkston, GA passed away at her home Tuesday, September 22, 2020. She was born May, 1949 in Folkston, GA to the late J. Melton and Minnie Lucille Crews. She retired as a paraprofessional with the Charlton County School System. She was a member of the First Baptist Church of Folkston.')
yields: 1971-01-22 00:00:00 2020-09-22 00:00:00 1949-05-22 00:00:00