Invalid timestamps prevent Archive initialization

Open brendangreenley opened this issue 9 years ago • 2 comments

archive.py handles null dates by dropping them, but not malformed dates.

I got an uncaught exception

pandas.tslib.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 100-01-31 12:12:25

when trying to get an archive of a python.org mailing list.

The archive.py line that threw it: `self.data['Date'] = pd.to_datetime(self.data['Date'], utc=True)`

Workaround: I caught the exception and set Date to None, so entries with malformed date fields are treated the same as entries without a date field (dropped).

Is this issue worth a PR with my fix? Or is the exception preferred so people know the archive has wonky dates?

brendangreenley avatar Nov 03 '16 23:11 brendangreenley

Thanks so much for catching this!

A PR with your fix would be great! Though you raise a good question about what to do with wonky dates.

I think maybe an ideal solution would have a "justworks" argument that, when set to True, catches exceptions and does something reasonable.
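To make the idea concrete, here is a rough sketch of what such a flag could look like. It is only an illustration: `parse_dates` and `_parse_or_nat` are hypothetical names, not existing functions in archive.py, and the sketch assumes the data is a DataFrame with a `Date` column. With `justworks=False` the exception still propagates, so people find out about wonky dates; with `justworks=True` each value is parsed on its own and anything unparseable is dropped like a missing date.

```python
import pandas as pd


def _parse_or_nat(value):
    """Parse a single date value; return NaT instead of raising.

    Hypothetical helper, not part of archive.py.
    """
    try:
        return pd.to_datetime(value, utc=True)
    except (ValueError, TypeError):
        # OutOfBoundsDatetime and parser errors are ValueError subclasses.
        return pd.NaT


def parse_dates(data, justworks=False):
    """Return a copy of `data` with 'Date' parsed to UTC datetimes.

    Hypothetical function illustrating the proposed flag.
    justworks=False: malformed dates raise, as they do today.
    justworks=True: malformed dates are dropped along with missing ones.
    """
    out = data.copy()
    if justworks:
        out["Date"] = out["Date"].apply(_parse_or_nat)
    else:
        out["Date"] = pd.to_datetime(out["Date"], utc=True)
    # Rows whose date is missing or could not be parsed are removed.
    return out.dropna(subset=["Date"])
```

Calling `parse_dates(df, justworks=True)` would then keep only the rows whose dates could be parsed, while the default call keeps the current fail-loudly behaviour.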

sbenthall avatar Nov 04 '16 22:11 sbenthall

I'm also running into this problem. I thought we could fix it by passing errors='coerce' to pd.to_datetime (which would produce NaT for every value where the datetime can't be parsed), but I'm struggling a bit with my implementation.
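For reference, this is roughly the shape I have in mind, as a minimal sketch rather than a working patch. `coerce_dates` is a hypothetical name, and I'm assuming the data is a DataFrame with a `Date` column as in the line quoted above.

```python
import pandas as pd


def coerce_dates(data):
    """Return a copy of `data` with 'Date' parsed to UTC datetimes.

    Hypothetical helper: values that cannot be parsed become NaT via
    errors='coerce' and are then dropped, the same as missing dates.
    """
    out = data.copy()
    out["Date"] = pd.to_datetime(out["Date"], utc=True, errors="coerce")
    return out.dropna(subset=["Date"])
```

One trade-off: coercing hides how many messages were dropped, so it may be worth counting the NaT values before the `dropna` and logging that number.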

npdoty avatar Feb 10 '18 02:02 npdoty