Rework extraction process

Open regulartim opened this issue 1 week ago • 0 comments

The process of extracting data from T-Pot and writing it into our database is one of the most important parts of GreedyBear. However, it has some problems:

it is hard to test: many functions and methods depend directly on the presence of the Elastic Stack and/or the GreedyBear database, so they cannot easily be exercised when these data sources are missing
classes like ExtractAttacks have many different responsibilities and should be split up into separate service classes
the special treatment of Log4j and Cowrie is deeply baked into the process (although Log4j is not that relevant anymore)

I am currently working on an improved process following some best practices:

repository pattern: repositories handle data access without containing any processing logic
single responsibility: every class in the process has one clear and recognizable responsibility
dependency injection: dependencies are injected through constructors which makes testing much easier
strategy pattern: makes it easier to add new "special treatment" for honeypots
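
A minimal sketch of how these four practices could fit together. It is not the code from the upcoming PR: every name in it (`Hit`, `HitRepository`, `IocRepository`, `ExtractionStrategy`, `CowrieStrategy`, `ExtractionPipeline`) is hypothetical, and the in-memory repositories stand in for Elasticsearch and the GreedyBear database.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Hit:
    """One honeypot log entry (hypothetical, heavily simplified shape)."""
    honeypot: str
    src_ip: str


class HitRepository(Protocol):
    """Repository pattern: data access only, no processing logic."""
    def fetch_hits(self) -> list[Hit]: ...


class IocRepository(Protocol):
    def save(self, honeypot: str, ips: set[str]) -> None: ...


class ExtractionStrategy(Protocol):
    """Strategy pattern: one interchangeable extraction rule per honeypot."""
    def extract(self, hits: list[Hit]) -> set[str]: ...


class DefaultStrategy:
    def extract(self, hits: list[Hit]) -> set[str]:
        return {hit.src_ip for hit in hits}


class CowrieStrategy:
    """Placeholder for honeypot-specific treatment (assumption: real logic differs)."""
    def extract(self, hits: list[Hit]) -> set[str]:
        # This sketch only drops entries without a source IP.
        return {hit.src_ip for hit in hits if hit.src_ip}


class ExtractionPipeline:
    """Single responsibility: orchestrate fetch -> extract -> save."""

    def __init__(
        self,
        hits: HitRepository,
        iocs: IocRepository,
        strategies: dict[str, ExtractionStrategy],
        default: ExtractionStrategy,
    ) -> None:
        # Dependency injection: collaborators arrive through the constructor.
        self._hits = hits
        self._iocs = iocs
        self._strategies = strategies
        self._default = default

    def run(self) -> int:
        grouped: dict[str, list[Hit]] = {}
        for hit in self._hits.fetch_hits():
            grouped.setdefault(hit.honeypot, []).append(hit)
        total = 0
        for honeypot, honeypot_hits in grouped.items():
            strategy = self._strategies.get(honeypot, self._default)
            ips = strategy.extract(honeypot_hits)
            self._iocs.save(honeypot, ips)
            total += len(ips)
        return total


class InMemoryHitRepository:
    """Test double that replaces the Elasticsearch-backed repository."""
    def __init__(self, hits: list[Hit]) -> None:
        self._stored = list(hits)

    def fetch_hits(self) -> list[Hit]:
        return list(self._stored)


class InMemoryIocRepository:
    """Test double that replaces the database-backed repository."""
    def __init__(self) -> None:
        self.saved: dict[str, set[str]] = {}

    def save(self, honeypot: str, ips: set[str]) -> None:
        self.saved.setdefault(honeypot, set()).update(ips)
```

Because the pipeline only sees the injected interfaces, a test can build it from `InMemoryHitRepository` and `InMemoryIocRepository` and assert on `saved` without a running Elastic Stack or database. Adding special treatment for another honeypot would then mean registering one more strategy in the `strategies` mapping.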

I will open a PR soon that contains the most profound changes to the logic. After that code is merged, I would also like to streamline the Cowrie extraction process and add end-to-end pipeline tests. But I'll open separate issues for that.

Dec 17 '25 12:12 regulartim