intelmq

Corrupt dump files when interrupted during writing

Open ghost opened this issue 8 years ago • 7 comments

When a bot is dumping a message and receives a KeyboardInterrupt during the write, the write operation is interrupted, leaving a corrupt dump file and causing data loss. The bigger the file, the higher the probability that this happens and the greater the data loss.

ghost avatar Jan 30 '17 16:01 ghost

@wagner-certat please validate the following: does this situation happen in Parsers, Experts and Outputs? I think it only happens with Collectors, which I think is "ok", although if there is a possible fix, let's fix it. :)

SYNchroACK avatar Jan 30 '17 16:01 SYNchroACK

There's no difference in dump files handling for the bot types. This is the case for every kind of bot.

ghost avatar Jan 30 '17 16:01 ghost

> There's no difference in dump files handling for the bot types. This is the case for every kind of bot.

Correct me if I'm wrong, but if a KeyboardInterrupt happens, the message will still be in the queue (check this line and the lines before). The only exception is Collectors, right?

SYNchroACK avatar Jan 30 '17 16:01 SYNchroACK

Yes, but the dump file will still be corrupted.

ghost avatar Jan 30 '17 16:01 ghost

Ok, so the dump file will always be corrupted in the scenario you present, and data loss will only happen on Collectors, which is something we are already aware of and accept.

Cool. ;) Thank you for raising this. :)

SYNchroACK avatar Jan 30 '17 16:01 SYNchroACK

Data loss happens together with the file corruption. When the write operation is interrupted, the data not yet written is lost.

ghost avatar Jan 30 '17 16:01 ghost

Ok, after talking on IRC with @wagner-certat, I understood that the problem here is that all dumped data is loaded whenever a bot needs to dump new bad events, which means that every write operation is a full overwrite of the dump file. Therefore, if a KeyboardInterrupt happens during a full write operation, the dump file will contain only part of the information. Thank you once more, @wagner-certat.

SYNchroACK avatar Jan 30 '17 17:01 SYNchroACK
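
For illustration, one common way to avoid the partial overwrite described above is to write the full new content to a temporary file and then rename it over the old dump. The sketch below is not IntelMQ's actual code: the function name `dump_atomically` is hypothetical, and it assumes the dump file is a single JSON object that is loaded, extended and written back in full.

```python
import json
import os
import tempfile


def dump_atomically(path, new_entries):
    """Hypothetical helper: merge new_entries into the JSON dump at `path`
    without ever leaving a partially written dump file behind."""
    # Load everything dumped so far (assumption: the dump is one JSON object).
    try:
        with open(path, encoding="utf-8") as handle:
            content = json.load(handle)
    except FileNotFoundError:
        content = {}
    content.update(new_entries)

    # Write the complete new content to a temporary file in the same
    # directory, so the final rename stays on one filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(content, tmp, indent=4, sort_keys=True)
            tmp.flush()
            os.fsync(tmp.fileno())
        # os.replace swaps the file in one step: the path holds either the
        # old dump or the new one, never a half-written file.
        os.replace(tmp_path, path)
    except BaseException:
        # KeyboardInterrupt is a BaseException: remove the temporary file
        # and re-raise, leaving the previous dump untouched.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

With this pattern, an interrupt during the write loses at most the entries being added in that call, and the previously dumped data stays readable. Whether the event being dumped is still in the queue at that point is a separate question, discussed in the comments above.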