opentelemetry-collector icon indicating copy to clipboard operation
opentelemetry-collector copied to clipboard

[RFC] OTel collector modules

Open jsoriano opened this issue 1 year ago • 3 comments

This change adds a new RFC that attempts to summarize the previous discussions around templates and the options that at the moment look more active.

The main output I would expect from the acceptance of this RFC is to reach an agreement on the overall implementation approach. There are still some open questions mentioned that we might let open by now.

So far there have been several attempts to introduce some kind of configuration templating in the OTel collector, and there seems to be some controversy about the steps forward. With this RFC we are proposing to limit the scope to the use case of sharing reusable configurations to collect signals from well-known services and applications. And with this scope in mind, consider use cases where this feature could be useful, as with autodiscovery. We also propose to call this feature "modules".

I skip in-depth technical details, but reference POCs where some of the discussed approaches are implemented.

I would like to thank @djaglowski for all his efforts driving previous discussions and experimentation. And @evan-bradley, @mx-psi, @rogercoll and others for their comments. It would be great if you could take a look to this RFC.

Related issues and PRs

  • https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/36116
  • https://github.com/open-telemetry/opentelemetry-collector/issues/8372
  • https://github.com/open-telemetry/opentelemetry-collector/pull/8507
  • https://github.com/open-telemetry/opentelemetry-collector/issues/8940
  • https://github.com/elastic/opentelemetry-collector-components/pull/96

cc @rogercoll @ChrsMark @andrzej-stencel @mlunadia

jsoriano avatar Nov 08 '24 18:11 jsoriano

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 92.65%. Comparing base (d39dd7a) to head (a199ba8). Report is 617 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #11631      +/-   ##
==========================================
+ Coverage   91.58%   92.65%   +1.07%     
==========================================
  Files         440      489      +49     
  Lines       23763    29236    +5473     
==========================================
+ Hits        21764    27090    +5326     
- Misses       1627     1737     +110     
- Partials      372      409      +37     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov[bot] avatar Nov 08 '24 18:11 codecov[bot]

@mx-psi thanks for your comment, and thanks too to @rogercoll for bringing the discussion to KubeCon.

  1. Why not using Helm/Kustomize? Other projects such as Kubernetes don't have an explicit builtin templating mechanism but rather have tooling on top of the underlying configuration. We need to justify why we need to add this.

Helm is limited to Kubernetes and we would like this feature to work on any environment where the collector can be deployed. Kustomize can be less limited to kubernetes but doesn't seem so straightforward to use for users, nor provides a so clear abstraction for sharing OTel configurations.

For the Kubernetes use case, I see Helm and Kustomize as tools that could be combined with modules/templates more than an alternative. For example perhaps Helm could be used to install modules/templates or Kustomize to manage configurations that include modules/templates. In other environments modules could be also used in the context of other config management tools.

I will try to explore these possible overlaps and combinations during next week and provide more context in the RFC.

Other projects such as Kubernetes don't have an explicit builtin templating mechanism but rather have tooling on top of the underlying configuration.

Agree, this is why we'd like to frame this more as an abstraction layer for reusing complex configurations than a solution for general templating. This can be seen as a way to implement receivers or processors without writing Go code.

Imagine for example an Couchbase expert with a finely tuned configuration who is able to share it by doing some small tweaks on it for parametrization, and a Couchbase user being able to easily reuse this configuration in a way that is consistent with basic collector configurations.

How could we extend this in the future for other kinds of components? Extensions, service::telemetry, and exporters may be places where we also want to have shareable configuration snippets. How are we going to fulfill those use cases if people ask for them in the future? An initial implementation needs not add support for those, but the design needs to take them into account.

We haven't found so clear use cases for other components, but this could be definitely added in the future. For example for extensions or exporters I imagine that we could also have something like extensiontemplate or exportertemplate components that follow similar logic to the ones we are proposing by now. I will add a mention to that in the "Future possibilities" section.

Other use cases where general templating is wanted will be probably better covered with config management tools. I would let this out of the scope of this feature.

3. Where does the current configuration resolution mechanism fall short and why? We have some functionality to reuse snippets of code via the concept of confmap.Providers. This seems not be enough, but we should justify why (I think the two main pieces missing are parameters and merging slices) and why it is not possible to extend this mechanism to fulfill new use cases (I am not convinced it is not possible). If we don't go with confmap.Providers, how will this proposal interact with them? What about confmap.Converters?

I will be more explicit about this in the RFC, but as a summary:

  • I agree it would be possible to provide similar features with config resolution mechanisms, but would require users to have a good understanding of it. They need to know what files to provide, with what format and in what order. We would be missing the goals of providing a clear abstraction for sharing complex configurations and of lowering the entry barrier for common use cases.
  • Config providers would couple templating with the source of the templates, it may be complex or not possible to implement different sources for templates. It would also require non-basic understanding of how configuration works. And it might not work well with the receiver creator for autodiscovery use cases.
  • Converters would also work, actually @djaglowski has a quite nice POC, but there are some architectural problems and some limitations as discussed in comments in the POC and mentioned in the RFC.

If we don't go with confmap.Providers, how will this proposal interact with them?

As the proposed components will work as usual components, they can be included in configurations provided from any provider. I would limit very much the inclusion of references to providers in templates because this can open the gates to security problems.

This could be a point in favor of using a config resolver as the templating mechanism, or part of it, we have more control on the receivers that are allowed in templates. (We use a config resolver for variable substitution in our POC).

4. Can we do a PoC that happens entirely within a receivertemplate component? This could live outside of core and could help us validate the ideas spoused here.

There is a POC in https://github.com/elastic/opentelemetry-collector-components/pull/96, we can polish it a bit more and merge it at least in our repo so it is available for further testing and validation.

A more real-world example of a template for this POC can be found in https://github.com/elastic/integrations/pull/11253/files#diff-26a3b4eab2c9b845b76591d786a09cd833ac1527288dd2e0c4e2c457fc853521 We will add more examples to the POC and to the RFC.

I am going to request changes until we answer those questions in text. I know it is a high bar, but our configuration resolution system is quite complex as it is, so we need to figure this out in a way that fits this existing system and justifies the additional maintenance burden. I am happy to help in the discussion of any of these questions :)

:+1: thanks for the help with the discussion!

jsoriano avatar Nov 21 '24 11:11 jsoriano

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar Dec 06 '24 03:12 github-actions[bot]

@jsoriano Can you request another review from me (with the :repeat: button) once this is ready for another look?

mx-psi avatar Dec 20 '24 11:12 mx-psi

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar Jan 04 '25 03:01 github-actions[bot]

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar Jan 22 '25 03:01 github-actions[bot]

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar Feb 06 '25 03:02 github-actions[bot]

Closed as inactive. Feel free to reopen if this PR is still being worked on.

github-actions[bot] avatar Feb 20 '25 03:02 github-actions[bot]

@jsoriano Happy to reopen this if you come back to it at some point :)

mx-psi avatar Feb 20 '25 11:02 mx-psi

Hey, yes, I expect to come back to this soon :slightly_smiling_face:

jsoriano avatar Feb 20 '25 12:02 jsoriano

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar Mar 07 '25 03:03 github-actions[bot]

Closed as inactive. Feel free to reopen if this PR is still being worked on.

github-actions[bot] avatar Mar 21 '25 03:03 github-actions[bot]