intelmq
Corrupt dump files when interrupted during writing
If a bot receives a KeyboardInterrupt while it is dumping a message, the write operation is interrupted, leaving a corrupt dump file and causing data loss. The bigger the file, the higher the probability that this happens and the greater the data loss.
@wagner-certat please validate the following: does this situation happen in Parsers, Experts and Outputs? I think it only happens with Collectors, which I think is "ok", although if there is a possible fix, let's fix it. :)
There's no difference in dump files handling for the bot types. This is the case for every kind of bot.
Correct me if I'm wrong, but if a KeyboardInterrupt happens, the message will still be on the queue (check this line and the lines before). The only exception is Collectors, right?
Yes, but the dump file will still be corrupted.
Ok, so the dump file will always be corrupted in the scenario you present, and data loss will only happen on Collectors, which is something we are already aware of and accept.
Cool. ;) Thank you for raising this. :)
Data loss happens together with the file corruption. When the write operation is interrupted, the data not yet written is lost.
Ok, after talking on IRC with @wagner-certat I understood that the problem here is that all previously dumped data is loaded whenever the bot needs to dump new bad events, which means that every write operation is a full overwrite of the dump file. Therefore, if a KeyboardInterrupt happens during a full write operation, the dump file will only contain part of the information. Thank you @wagner-certat once more.
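One possible way around the full-overwrite problem, as a rough sketch only: write the complete content to a temporary file first and then rename it over the old dump, so an interrupt leaves the previous dump untouched. This is not taken from the intelmq code base. The function name `dump_atomically`, the assumption that the dump file is a single JSON object, and the `.tmp` suffix are all hypothetical and only there for illustration.

```python
import json
import os
import tempfile


def dump_atomically(path, new_entries):
    """Merge new_entries into the JSON dump at path without truncating it in place.

    Hypothetical helper: assumes the dump file holds one JSON object.
    """
    # Load what is already dumped (empty dict if there is no dump file yet).
    try:
        with open(path, encoding="utf-8") as handle:
            content = json.load(handle)
    except FileNotFoundError:
        content = {}
    content.update(new_entries)

    # Write the full content to a temporary file in the same directory,
    # so the final rename stays on one filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(content, handle, indent=4, sort_keys=True)
            handle.flush()
            os.fsync(handle.fileno())
        # Swap the finished file into place in one step (atomic on POSIX).
        os.replace(tmp_path, path)
    except BaseException:
        # Also covers KeyboardInterrupt: discard the partial temporary file
        # and leave the old dump as it was.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

With something like this, an interrupt during the write would only lose the events being dumped at that moment, not the ones already in the file. It does not change the fact that the whole dump is rewritten each time.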