
Introduce batching to MISP feed output bot

Open arvchristos opened this issue 11 months ago • 1 comment

Description

Implement a batch mode for the MISP feed output bot to resolve slow performance when the queue holds many events. The existing code is prone to performance issues:

The following code runs for every event in the queue, so the bot becomes extremely slow as events arrive and the feed grows:

  • https://github.com/certtools/intelmq/blob/develop/intelmq/bots/outputs/misp/output_feed.py#L116-L121
feed_output = self.current_event.to_feed(with_meta=False)

with self.current_file.open('w') as f:  # File opened for every event
    json.dump(feed_output, f)

feed_meta_generator(self.output_dir)  # Metadata updated on every event

Motivation

We are trying to create feeds based on AlienVault OTX pulses, which include thousands of IOCs per day. This is practically impossible with the current performance of the MISP feed output bot.

Fix

This MR introduces batched feed creation. The user can configure the batch size with the batch_size parameter. Batching relies on the internal queue that the bot already uses.
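To illustrate the idea, here is a minimal sketch of batched writing. It is not the code from the MR: `BatchedFeedWriter`, `update_metadata`, and `demo` are hypothetical names, and a plain JSON list stands in for the MISP event that the real bot serialises with `to_feed`. The point it shows is that the file write and the metadata update happen once per batch instead of once per event.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


class BatchedFeedWriter:
    """Hypothetical helper: buffers events and rewrites the feed once per batch."""

    def __init__(self, feed_file: Path, batch_size: int,
                 update_metadata: Callable[[], None]) -> None:
        self.feed_file = feed_file
        self.batch_size = batch_size
        self.update_metadata = update_metadata  # stand-in for feed_meta_generator
        self.pending: List[Dict] = []  # events received but not yet on disk
        self.written: List[Dict] = []  # events already contained in the feed file
        self.flush_count = 0

    def add(self, event: Dict) -> None:
        """Buffer one event; flush only when the batch is full."""
        self.pending.append(event)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Write all buffered events with one file open and one metadata update."""
        if not self.pending:
            return
        self.written.extend(self.pending)
        self.pending.clear()
        with self.feed_file.open("w") as f:
            json.dump(self.written, f)
        self.update_metadata()
        self.flush_count += 1


def demo() -> tuple:
    """Feed 250 events through a writer with batch_size=100."""
    with tempfile.TemporaryDirectory() as tmp:
        meta_calls = []
        writer = BatchedFeedWriter(
            Path(tmp) / "feed.json",
            batch_size=100,
            update_metadata=lambda: meta_calls.append(1),
        )
        for i in range(250):
            writer.add({"id": i})
        writer.flush()  # write the final partial batch
        return writer.flush_count, len(meta_calls), len(writer.written)
```

In this sketch, 250 events with a batch size of 100 cause three writes rather than 250. A real bot would also need to flush the partial batch on shutdown or after a timeout, since buffered events are otherwise only in memory.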

Benchmark

On an average server, a feed of 8k events took several hours to create before this improvement and now takes less than 5 minutes (depending on the available resources).

arvchristos avatar Mar 05 '24 18:03 arvchristos

Store the events in a separate Redis queue. This moves the responsibility for keeping the data out of the bot.

It also prevents data loss.
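One way to read this suggestion, as a rough sketch only: pending events are appended to a Redis list and removed only after the feed has been written. `PendingEventStore` and the key name `misp-feed-pending` are hypothetical. The `client` argument is assumed to be a redis-py style client exposing `rpush`, `lrange`, and `ltrim`; nothing here reflects how IntelMQ itself manages its queues.

```python
import json
from typing import Any, Dict, List


class PendingEventStore:
    """Hypothetical wrapper that keeps not-yet-written events in a Redis list."""

    def __init__(self, client: Any, key: str = "misp-feed-pending") -> None:
        self.client = client  # assumed: redis-py compatible client
        self.key = key

    def push(self, event: Dict) -> int:
        """Append one event; returns the new length of the list."""
        return self.client.rpush(self.key, json.dumps(event))

    def peek_all(self) -> List[Dict]:
        """Read every pending event without removing anything."""
        return [json.loads(item) for item in self.client.lrange(self.key, 0, -1)]

    def acknowledge(self, count: int) -> None:
        """Drop the first `count` events, once they are safely in the feed."""
        self.client.ltrim(self.key, count, -1)
```

The bot would call `peek_all`, write the batch to the feed, and only then call `acknowledge` with the number of events written. If the bot crashes before the write completes, the events remain in Redis and can be picked up again after a restart.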

sebix avatar Apr 18 '24 15:04 sebix