bigbang
better error handling for mail collection
- [ ] log errors in a more easily accessible way (including timestamps, HTTP error messages, etc.)
- [ ] automated retry functionality (an isolated network issue shouldn't completely block collection)
- [ ] better logic when restarting the mail collection script (currently it just checks for the existence of an .mbox file)
w3crawl handles HTTP errors during mail collection particularly badly: a single transient network issue can stop the whole script, or the collection of an entire archive. Catching those errors and retrying would make a big difference right away.
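As a rough illustration of the error catching and retry idea, here is a minimal sketch using only the Python standard library. It is not the existing w3crawl code: `fetch_with_retry`, `collect_all`, the logger name, and the backoff policy are all hypothetical, and the assumption that 5xx and 429 responses are the transient ones is a guess that may need tuning for the actual archives.

```python
import logging
import time
import urllib.error
import urllib.request

# Timestamps in every log line make failures easier to trace afterwards.
logging.basicConfig(
    format="%(asctime)s %(levelname)s %(message)s",
    level=logging.INFO,
)
logger = logging.getLogger("mail_collection")  # hypothetical logger name


def fetch_with_retry(url, attempts=3, backoff=1.0,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Return the body at `url`, retrying errors that look transient.

    Hypothetical helper: `opener` and `sleep` are injectable so the
    retry logic can be exercised without real network access.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # HTTPError is a subclass of URLError, so it is handled first.
            logger.warning("HTTP %s %s for %s (attempt %d/%d)",
                           err.code, err.reason, url, attempt, attempts)
            # Assumption: 4xx responses other than 429 will not improve on retry.
            if err.code < 500 and err.code != 429:
                raise
            last_error = err
        except OSError as err:
            # URLError, timeouts, and connection resets are all OSError subclasses.
            logger.warning("Network error for %s: %s (attempt %d/%d)",
                           url, err, attempt, attempts)
            last_error = err
        if attempt < attempts:
            sleep(backoff * 2 ** (attempt - 1))  # exponential backoff
    raise last_error


def collect_all(urls, fetch=fetch_with_retry):
    """Fetch every URL; log and skip failures instead of aborting the run."""
    results, failed = {}, []
    for url in urls:
        try:
            results[url] = fetch(url)
        except OSError as err:
            logger.error("Giving up on %s: %s", url, err)
            failed.append(url)
    return results, failed
```

With this shape, one bad message or archive page ends up in the `failed` list (and in the log, with a timestamp and the HTTP status) while the rest of the collection continues, and the failed URLs can be retried in a later run.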
Logging has been improved, especially in w3crawl.