DateExtractor raises OSError

Open jcarlosroldan opened this issue 7 years ago • 2 comments

Extracting dates from text that contains "69/44" raises an OSError:

>>> from etk.extractors.date_extractor import DateExtractor
>>> de = DateExtractor()
>>> de.extract("69/44")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\JC\AppData\Local\Programs\Python\Python36-32\lib\site-packages\etk\extractors\date_extractor.py", line 280, in extract
    ans = self._remove_overlapped_date_str(results)
  File "C:\Users\JC\AppData\Local\Programs\Python\Python36-32\lib\site-packages\etk\extractors\date_extractor.py", line 353, in _remove_overlapped_date_str
    parsed_date = self._parse_date(cur_max)
  File "C:\Users\JC\AppData\Local\Programs\Python\Python36-32\lib\site-packages\etk\extractors\date_extractor.py", line 474, in _parse_date
    date = self._post_process_date(date)
  File "C:\Users\JC\AppData\Local\Programs\Python\Python36-32\lib\site-packages\etk\extractors\date_extractor.py", line 489, in _post_process_date
    date = date.astimezone(self._default_tz)
OSError: [Errno 22] Invalid argument

I'm using ETK 2.1.0 installed using pip on Windows 7 x64.

jcarlosroldan avatar Oct 10 '18 00:10 jcarlosroldan

It also raises an OSError for the following dates: ['26 June 1925', '3 September 1927', '2 November 1925', '24 July 1926', '25 August 1928', '21 May 1928', '7 November 1929', 'Stricken August 1957', 'June 1929', '12 April 1930', '8 September 1930', '29 December 1930', 'July 1936', '20 June 1931', 'July 1930', '26 October 1933', '28 June 1932', 'September 1936', '24 December 1931', 'September 1936', '28 March 1933', '24 July 1933', '19 June 1947', '21 December 1950', '1 May 1946']

jcarlosroldan avatar Nov 05 '18 21:11 jcarlosroldan

Hi, I think this is the Python datetime bug on Windows: https://bugs.python.org/issue29097. Dates from 1970 through 2038 should be okay. I have also fixed some false positives, so things like "69/44" will not be extracted as dates in a future version of ETK.
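For anyone hitting this before a fix lands, here is a minimal sketch of one possible workaround. It is not ETK's actual fix, and `to_default_tz` is a hypothetical helper name, not part of ETK. The idea is that `astimezone()` on a naive datetime has to ask the operating system for the local UTC offset, which is the step that fails on Windows for dates before 1970. Attaching a timezone explicitly avoids that call. Note the assumption: naive dates are treated as already being in the default timezone, rather than in the machine's local time as `astimezone()` would assume.

```python
from datetime import datetime, timedelta, timezone


def to_default_tz(date, default_tz=timezone.utc):
    """Hypothetical helper: convert `date` to `default_tz` without asking the OS.

    Assumption: a naive datetime is taken to already be in `default_tz`.
    """
    if date.tzinfo is None or date.tzinfo.utcoffset(date) is None:
        # Naive datetime: attach the timezone instead of calling astimezone(),
        # which would need the platform's local-time lookup.
        return date.replace(tzinfo=default_tz)
    # Aware datetime: the conversion is plain offset arithmetic.
    return date.astimezone(default_tz)


# One of the dates reported above, parsed as a naive datetime.
naive = datetime(1925, 6, 26)
print(to_default_tz(naive).isoformat())

# An aware datetime at UTC+2 is shifted to UTC as usual.
aware = datetime(1925, 6, 26, 12, 0, tzinfo=timezone(timedelta(hours=2)))
print(to_default_tz(aware).isoformat())
```

If the naive dates really are meant to be in the machine's local timezone, this changes the result by the local UTC offset, so it only fits when the default timezone is the intended one.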

Lituta avatar Nov 05 '18 23:11 Lituta