Emails are archived or deleted even if the Elasticsearch server is killed during import
Hi,
The situation:
- Massive import of DMARC reports (~50k) in one go, mail after mail
- Elasticsearch was killed (OOM) during the import
parsedmarc considers each mail as done despite the fact that it can't store the results in Elasticsearch.
This is due to the following flow (sketched below):
- Call to "get_dmarc_reports_from_mailbox" in _main in cli.py #952
  - get_dmarc_reports_from_mailbox in __init__.py #1024
    - Iterates over the emails in the mailbox (L1097)
    - Moves them (if delete isn't required) on L1151 - 1163
- Call to "process_reports" in _main in cli.py #975
  - process_reports in _main in cli.py #73
    - Since an Elasticsearch server is defined, saves the data to it (same logic for Kafka, S3, ...)
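To make the ordering concrete, here is a minimal, self-contained sketch of the flow as I understand it. All names here (FakeMailbox, parse_report_email, ...) are simplified stand-ins, not the actual parsedmarc code:

```python
from dataclasses import dataclass, field

# Toy stand-ins for the real IMAP connection and report parser,
# only to illustrate the order of operations.
@dataclass
class FakeMailbox:
    inbox: list = field(default_factory=lambda: ["report1.eml", "report2.eml"])
    archive: list = field(default_factory=list)

def parse_report_email(message: str) -> dict:
    return {"source": message}

def get_dmarc_reports_from_mailbox(mailbox: FakeMailbox) -> list:
    reports = []
    for message in list(mailbox.inbox):
        reports.append(parse_report_email(message))
        # The message is archived here, *before* any backend has stored the report
        mailbox.inbox.remove(message)
        mailbox.archive.append(message)
    return reports

def process_reports(reports: list) -> None:
    # Stand-in for saving to Elasticsearch / Kafka / S3; if this raises
    # (e.g. Elasticsearch was OOM-killed), the parsed reports are lost
    # while the source emails are already archived.
    raise ConnectionError("Elasticsearch is down")

mailbox = FakeMailbox()
reports = get_dmarc_reports_from_mailbox(mailbox)
try:
    process_reports(reports)
except ConnectionError:
    print("Save failed, but the inbox is already empty:", mailbox.inbox)
```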
I see two options to prevent this issue:
- The bigger (and cleaner) one: don't move (or delete) mails inside get_dmarc_reports_from_mailbox, but only after the parsed reports have been saved to Elasticsearch (this implies keeping a UUID pointer to the email)
- The easier (and dirtier) one: check Elasticsearch (or other backend) connectivity before moving (or deleting) emails; a minimal sketch follows this list
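For the second option, this is roughly what the check could look like, assuming the official elasticsearch Python client (which provides Elasticsearch.ping()); connection.move_message(uid, folder) is a hypothetical stand-in for however parsedmarc actually archives a message:

```python
from elasticsearch import Elasticsearch

def backend_is_reachable(es_hosts) -> bool:
    """Return True if the Elasticsearch cluster answers a ping."""
    try:
        return bool(Elasticsearch(hosts=es_hosts).ping())
    except Exception:
        return False

def archive_if_backend_up(connection, uid, archive_folder, es_hosts) -> None:
    # connection.move_message() is hypothetical; the point is only that the
    # move happens behind a reachability check.
    if backend_is_reachable(es_hosts):
        connection.move_message(uid, archive_folder)
    # Otherwise leave the message in the inbox so the next run retries it.
```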
@seanthegeek do you have a preference?
Best regards, Anael
Similar to #242
I had a similar problem: my Elasticsearch available disk space went below the threshold, so it began rejecting updates after the mail messages had already been moved. That may not be detected by a connectivity check.
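A ping only proves the cluster answers; it does not catch rejections such as the read-only block Elasticsearch applies once the flood-stage disk watermark is reached. A more robust variant is to attempt the index call first and only archive the message on success. Rough sketch, assuming an 8.x-style elasticsearch client (es.index(index=..., document=...)); the index name, connection, uid and move_message are placeholders, not parsedmarc's real code:

```python
from elasticsearch import Elasticsearch

def save_report_then_archive(es: Elasticsearch, report: dict,
                             connection, uid, archive_folder) -> bool:
    """Index one report and archive its source email only on success."""
    try:
        # Placeholder index name; parsedmarc's real index naming differs.
        es.index(index="dmarc_aggregate", document=report)
    except Exception as error:
        # Catches connection failures *and* indexing rejections (e.g. the
        # read-only block from the flood-stage disk watermark), which a
        # plain ping/connectivity check would not detect.
        print(f"Not archiving message {uid}: {error}")
        return False
    connection.move_message(uid, archive_folder)
    return True
```

This keeps the "don't move until the data is safe" ordering from the first option while staying close to the existing per-message loop.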