retina Make it possible to load your own module

Is your feature request related to a problem? Please describe.

We are using Retina's packetparser plugin to collect information about packets. We love the fact that under the hood, packets are treated as "events" that are sent to userspace and are annotated with Kubernetes information in enricher. However, the following step of using the forward (or some other) module does not work well for us – we don't want to expose and collect Prometheus metrics and instead want to continue treating packets as "events" and insert them into a ClickHouse instance with a SQL query.

Retina provides wonderful infrastructure to capture and annotate packet data, but the data ingestion pipeline, which is typically the part you need to customise the most, can't be modified without forking Retina, unless I am missing something obvious.

Would you consider making it possible to create your own modules with custom ProcessFlow implementations?

Describe the solution you'd like

Make it easy to write a custom module that implements the AdvMetricsInterface interface and is loaded into Retina.
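To make the request concrete, here is a rough sketch of the kind of module we have in mind. Everything in it is an assumption for illustration: `Flow`, `FlowProcessor`, `RowSink`, and `BatchingModule` are hypothetical names, and `FlowProcessor` is only a stand-in for AdvMetricsInterface, whose real methods and types may differ. The sink is an in-memory fake; in our case it would issue a SQL INSERT.

```go
package main

import (
	"fmt"
	"time"
)

// Flow is a hypothetical, simplified stand-in for an enriched packet event.
// Retina's real flow type carries far more fields.
type Flow struct {
	Time   time.Time
	SrcPod string
	DstPod string
	Bytes  int
}

// FlowProcessor is an assumed shape for a pluggable module. It is NOT
// Retina's AdvMetricsInterface, whose methods and types may differ.
type FlowProcessor interface {
	ProcessFlow(f Flow) error
}

// RowSink is a hypothetical destination for batches, e.g. a SQL INSERT.
type RowSink interface {
	InsertBatch(rows []Flow) error
}

// BatchingModule buffers flows and hands them to the sink in batches.
// It is not safe for concurrent use; a real module would need locking.
type BatchingModule struct {
	sink      RowSink
	batchSize int
	buf       []Flow
}

func NewBatchingModule(sink RowSink, batchSize int) *BatchingModule {
	return &BatchingModule{sink: sink, batchSize: batchSize}
}

// ProcessFlow appends the flow and flushes once the batch is full.
func (m *BatchingModule) ProcessFlow(f Flow) error {
	m.buf = append(m.buf, f)
	if len(m.buf) >= m.batchSize {
		return m.Flush()
	}
	return nil
}

// Flush sends any buffered flows to the sink and resets the buffer.
// A failed batch is dropped here; a real module might retry instead.
func (m *BatchingModule) Flush() error {
	if len(m.buf) == 0 {
		return nil
	}
	batch := m.buf
	m.buf = nil
	if err := m.sink.InsertBatch(batch); err != nil {
		return fmt.Errorf("insert batch of %d flows: %w", len(batch), err)
	}
	return nil
}

// memorySink records batches in memory so the example runs without a database.
type memorySink struct{ batches [][]Flow }

func (s *memorySink) InsertBatch(rows []Flow) error {
	s.batches = append(s.batches, rows)
	return nil
}

func main() {
	sink := &memorySink{}
	var mod FlowProcessor = NewBatchingModule(sink, 2)
	for i := 0; i < 3; i++ {
		f := Flow{Time: time.Now(), SrcPod: "a", DstPod: "b", Bytes: 100 * (i + 1)}
		if err := mod.ProcessFlow(f); err != nil {
			fmt.Println("error:", err)
		}
	}
	fmt.Println("batches inserted so far:", len(sink.batches))
}
```

The point is only that the module keeps treating each packet as an event and decides for itself how to persist it; no metrics are exposed.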

Describe alternatives you've considered

N/A

Additional context

Please let me know if there is a simpler way to write a custom data exporter. If there isn't, let me know what a good solution would be, and I'd be happy to contribute to the project.

Apr 03 '24 09:04 andreev-io

This is a very interesting ask, thanks for raising it, @andreev-io. I am not very familiar with ClickHouse; do you have any documentation that would help us understand how it expects data to be presented? Would the ideal approach be to create a new "module" under pkg/module that converts the events/flows into ClickHouse objects?

Apr 03 '24 16:04 vakalapa

Hi @vakalapa. ClickHouse here is an implementation detail. The general idea is that some users of Retina (like us!) might want to do something different with events than export data about them to Prometheus, for example they might want to insert some analytical data about observed events (packets) into a SQL DB.

I agree that creating a new module and implementing AdvMetricsInterface is the best path forward, but I was wondering if the only way to do so is to fork Retina? Is it possible to make Retina "load" custom implementations of modules?

Apr 03 '24 16:04 andreev-io

> Is it possible to make Retina "load" custom implementations of modules?

One way I can think of is to open up a socket so that any customer implementation can run alongside Retina and request the types of events it needs to process. Open to any other ideas, cc @rbtr

If there are modules or implementations that come up frequently, we can definitely take them as contributions back into Retina.

Apr 03 '24 16:04 vakalapa

I'll give you a little bit more context on the problem we are trying to solve, although I think it shouldn't affect the solution that's picked here.

Rather than exporting network data to Prometheus as counters and scraping it into a time series, we are interested in inserting individual "observations" into an analytical database where we can aggregate these observations later on and calculate, for example, p95 packet size, distributions of packets by source-destination pairs (which is hard or impossible to do arbitrarily with Prometheus) or some other groupings that use only a subset or different combinations of "labels" at a time, etc.
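As a rough illustration of the kind of aggregation I mean, here is a small sketch that groups individual observations by source-destination pair and computes a p95 packet size per pair with the nearest-rank method. In practice this would be a SQL query over the stored rows; the `Observation` and `Pair` types and the function names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Observation is a hypothetical per-packet record.
type Observation struct {
	Src, Dst string
	Bytes    int
}

// Pair identifies a source-destination grouping.
type Pair struct{ Src, Dst string }

func (p Pair) String() string { return fmt.Sprintf("%s->%s", p.Src, p.Dst) }

// nearestRank returns the p-th percentile (1..100) of an ascending slice
// using the nearest-rank method; it returns 0 for an empty slice.
func nearestRank(sorted []int, p int) int {
	if len(sorted) == 0 {
		return 0
	}
	rank := (p*len(sorted) + 99) / 100 // integer ceil(p*n/100)
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

// P95ByPair groups packet sizes by pair and returns the p95 size for each.
func P95ByPair(obs []Observation) map[Pair]int {
	sizes := make(map[Pair][]int)
	for _, o := range obs {
		k := Pair{o.Src, o.Dst}
		sizes[k] = append(sizes[k], o.Bytes)
	}
	out := make(map[Pair]int, len(sizes))
	for k, v := range sizes {
		sort.Ints(v)
		out[k] = nearestRank(v, 95)
	}
	return out
}

func main() {
	var obs []Observation
	for i := 1; i <= 20; i++ {
		obs = append(obs, Observation{"web", "db", i * 10})
	}
	obs = append(obs, Observation{"web", "cache", 1500})
	for pair, p95 := range P95ByPair(obs) {
		fmt.Printf("%s p95=%d bytes\n", pair, p95)
	}
}
```

Because the raw observations are kept, the grouping key can be changed after the fact (for example, by namespace instead of pod pair), which pre-aggregated counters do not allow.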

Apr 03 '24 16:04 andreev-io

My first thought is that we would do the same thing as Telegraf. They model everything as input/processor/output plugins that generate, operate on, and sink a data stream, respectively.

We have a similar high-level architecture already; I think what we're missing here specifically is customizable output plugins.

@andreev-io since you're asking about forking/custom loading, is this kind of output plugin something you would like to contribute upstream such that it could be compiled in, or keep private and load at runtime? The latter is more difficult but I think technically possible with Go plugins. If the former, we just need to make sure we have a good Output plugin interface/spec and plumb in all the config for making Outputs optional, then we can start taking contributions for Output implementations.
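For the compiled-in case, a minimal sketch of what an Output interface plus an opt-in registry could look like is below. This is an assumption about one possible design, not an existing Retina API: `Output`, `Factory`, `Register`, `Build`, and the `counter` output are all hypothetical names. Runtime loading via Go plugins is not shown.

```go
package main

import (
	"fmt"
	"sync"
)

// Flow is a hypothetical, simplified stand-in for an enriched event.
type Flow struct {
	Src, Dst string
	Bytes    int
}

// Output is an assumed interface that every output plugin would implement.
type Output interface {
	Write(flows []Flow) error
}

// Factory builds a fresh Output instance.
type Factory func() Output

var (
	mu       sync.RWMutex
	registry = map[string]Factory{}
)

// Register makes an output available under a name; plugins call it from init().
func Register(name string, f Factory) {
	mu.Lock()
	defer mu.Unlock()
	registry[name] = f
}

// Build instantiates only the outputs named in config, so each one is optional.
func Build(enabled []string) ([]Output, error) {
	mu.RLock()
	defer mu.RUnlock()
	outs := make([]Output, 0, len(enabled))
	for _, name := range enabled {
		f, ok := registry[name]
		if !ok {
			return nil, fmt.Errorf("unknown output %q", name)
		}
		outs = append(outs, f())
	}
	return outs, nil
}

// counterOutput is a toy output that only counts the flows it receives.
type counterOutput struct{ flows int }

func (c *counterOutput) Write(flows []Flow) error {
	c.flows += len(flows)
	return nil
}

func init() {
	Register("counter", func() Output { return &counterOutput{} })
}

func main() {
	outs, err := Build([]string{"counter"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, o := range outs {
		if err := o.Write([]Flow{{"web", "db", 120}}); err != nil {
			fmt.Println("write error:", err)
		}
	}
	fmt.Println("outputs enabled:", len(outs))
}
```

With this shape, a contributed output is one file that registers itself, and whether it runs is decided purely by config.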

Apr 03 '24 16:04 rbtr

@rbtr I like your idea.

I think there are two ways:

1. Load plugins in Retina like Telegraf does, via either:
   a. Compiling the plugin into the project.
   b. Loading the plugin at runtime.
2. Make Retina stream Flows to an external system in some format, and let developers build those data consumers & processors independently.

Upon thinking about it more, our module implementation will have to use some business logic, so it's not suitable for open source. From that perspective, I think (1b) or (2) is what we need.

I saw that you have weekly office hours – I'm going to attend them tomorrow, hopefully we can chat about this live! I would love to contribute the solution to (1b) or (2) to Retina if we agree on the design.

Apr 04 '24 11:04 andreev-io

Dumb question: Retina will also expose flow logs soon. Would it be helpful to parse and process those for your post-processing/observations?

Apr 04 '24 18:04 neaggarwMS