telegraf Allow decoupling of main function with runApp to allow custom distributions of Telegraf

Use Case

First of all thank you very much for this wonderful project. Telegraf is one of the most popular tools to metrics ingestion and we use it extensively. However, one limitation is that it doesn't easily extend to custom distributions. For ex, Opentelemetry collector allows organisations to write their own Main function and import the necessary components in their distribution. This allows some components to be developed which have a specific use-case and are private to the org, but they want to leverage the telegraf framework. The major task to incorporate this feature has been done in the following PRs #11700 for CLI #11524 for custom builder. Similar to what Opentelemetry has right now. #11654 for selectively importing plugins.

Expected behavior

Currently, to use telegraf as a service, either we use the custom-builder, or use a telegraf core build. It is not possible to define a main function which replicates what telegraf does. But this allows more flexibility in developing plugins which are private to the org.

Actual behavior

Not possible to write minimal code to create a telegraf binary. Either we have to fork or use custom builder.

Additional info

It would also be good to have a pre run hook which runs before the actual agent. This way we have control over the behaviour of the agent. This behaviour can be defined via flags, which distributions can define apart from the core flags provided.

Sep 21 '22 08:09 neelayu

Hi @neelayu, I'm curious to know more about the purpose of your custom distribution. Telegraf isn't intended to be reused as a golang module at a high level like you describe.

I don't think the project is completely opposed to accepting a refactor like #11852, but I would like to know more about how you intend to use it and why you can't use the existing extensibility features in telegraf. External teams can already contribute plugins or use exec/execd plugins to write private external plugins. You can also write starlark scripts to process metrics. These extensibility features have been sufficient for other users, why is your use case different?

Sep 21 '22 15:09 reimda

Hey @reimda Thanks for going through the proposal. I do understand that telegraf isnt designed to be used as a go module. However, I'll try my best to explain the need for this feature-

We want to invoke Telegraf with configuration generated during runtime, based on certain set of parameters. So we convert a JSON based configuration to TOML, a cfg format which Telegraf requires. And the idea is we do this conversion which will have telegraf as a go module dependency. The before run hook will take care of setting the configuration flag.
If we use telegraf as a dependency, any breaking change can be caught during the build/compilation itself. In case of execd plugins, this would not be caught. Some might prefer a single code base for this. I dont know if I explained it well. But do let me know.
One major reason is that we have been using Opentelemetry collector for quite some time and our build uses otelcol as a go module dependency. Major distributions also include the aws-otel-collector, splunk-otel-collector.

Sep 23 '22 05:09 neelayu

Hi @reimda did you get a chance to go through the points?

Sep 28 '22 14:09 neelayu

A friendly reminder on this. Just want to know what the consensus is on this feature. Thanks!

Oct 12 '22 06:10 neelayu

@neelayu first of all thank you for bringing this up and also thanks for submitting a PR on this topic!

We discussed supporting custom distributions in the team and came to the conclusion that we will not guarantee any public interfaces for the Telegraf agent nor for any other parts with the exception of the plugin interfaces defined. This being said, we nevertheless see the value in organizing the Telegraf code in a way to have a clear set of functions called in a sensible, understandable order like (beside CLI options etc) comprising the steps of:

load configurations files and setup the Telegraf configurations (should be in config)
setup accumulator and plugin interconnect infrastructure (should be in agent)
initializing all configured plugins (should be in agent I think)
start plugins (probably from outputs towards inputs). This would be the main loop. (should be in agent)
stop plugins and make sure all remaining data is flushed (probably from input towards outputs) (should be in agent)
tear-down all plugins (should be in agent)
cleanup

This would also allow you and others to use those functions (e.g. to write your own main). However, I want to emphasize that despite this interface might be general we will not guarantee any public interface there! The main reason is that we probably want to make changes on how things inter-operate and do not want to be tied to switching main versions to change those interna. Does that make sense to you @neelayu?

I will close your PR shortly as it will not provide these clear function mentioned above, but we are willing to accept PR(s) reorganizing the code in the way mentioned. What do you think @neelayu?

Dec 19 '22 11:12 srebhan

Hey @srebhan thanks for going through this proposal. Overall I feel your idea makes sense. And yes, glad we could clear the important part of no guarantee for public interfaces. I do feel there are some changes which could pave the way to use telegraf as a go dependency using the suggestions you mentioned. I feel that should be enough.

I'll work on some refactoring when I get the time.

Dec 20 '22 06:12 neelayu

I'm going to close this for now. We are still happy to see a refactor at a later date.

Nov 07 '23 15:11 powersj

telegraf telegraf copied to clipboard

Allow decoupling of main function with runApp to allow custom distributions of Telegraf

Use Case

Expected behavior

Actual behavior

Additional info

telegraf
telegraf copied to clipboard