opentelemetry-collector
What is an OpenTelemetry Collector, what is a distribution?
We had a discussion recently around what is an OpenTelemetry Collector and what is a distribution of the Collector. I would like to gather your opinions.
@dyladan proposed that only what the SIG Collector produces can be called an "OpenTelemetry Collector" and that a distribution has to fulfill the following requirements:
- uses the collector framework (upstream not a fork)
- includes only plugins/components which are compatible with the collector framework. they don't need to be in the otel repos, but you should be able to point the upstream collector builder at them
I tend to agree with him, but I'm eager to hear your opinions. The GC might have the right to make the final decision if we can't get an agreement, but I think we can indeed reach a consensus, at least between the GC and the Collector maintainers (core and contrib).
Here was my take from 2020: https://docs.google.com/document/d/1jHOYTRRI91UdyMEfqV7WNPEAxSQKP13b_jPcQX4oe9I/edit?usp=sharing
TL;DR
Other projects (prometheus, kubernetes) have successfully created conformance programs by testing conformant behavior, rather than requiring the use of certain code packages. An example of "conformant behavior" could be:
- Must accept a collector configuration yaml file, which includes a set of components (e.g. otlp receiver/exporter, batch processor, healthcheck extension).
- Must pass basic testbed tests with this configuration.
The easiest way to construct a "conformant" collector distribution would be to simply use collector libraries, or the collector builder, but it wouldn't necessarily require it.
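As a rough illustration of what a behavior-based check could look like, the sketch below takes an already-parsed Collector configuration (a plain dictionary here) and verifies two things: that it declares the example component set listed above, and that every pipeline only references components that are declared. The required set, the function name `check_conformance_config`, and the example endpoint are assumptions made for illustration; no such conformance program exists in the project today.

```python
from typing import List

# Hypothetical required component types per section, taken from the example
# in the thread (otlp receiver/exporter, batch processor, healthcheck extension).
REQUIRED = {
    "receivers": {"otlp"},
    "processors": {"batch"},
    "exporters": {"otlp"},
    "extensions": {"health_check"},
}


def component_type(component_id: str) -> str:
    """Collector component IDs have the form 'type' or 'type/name'."""
    return component_id.split("/", 1)[0]


def check_conformance_config(config: dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means the check passed."""
    problems = []

    # 1. Every required component type must be declared in its section.
    for section, required in REQUIRED.items():
        declared = {component_type(cid) for cid in (config.get(section) or {})}
        for missing in sorted(required - declared):
            problems.append(f"{section}: missing required component '{missing}'")

    # 2. Everything referenced under 'service' must be declared at the top level.
    service = config.get("service") or {}
    for ext in service.get("extensions") or []:
        if ext not in (config.get("extensions") or {}):
            problems.append(f"service: extension '{ext}' is not declared")
    for name, pipeline in (service.get("pipelines") or {}).items():
        for section in ("receivers", "processors", "exporters"):
            for cid in pipeline.get(section) or []:
                if cid not in (config.get(section) or {}):
                    problems.append(
                        f"pipeline '{name}': {section[:-1]} '{cid}' is not declared"
                    )
    return problems


# Illustrative configuration; the endpoint is a placeholder.
EXAMPLE_CONFIG = {
    "receivers": {"otlp": {"protocols": {"grpc": {}}}},
    "processors": {"batch": {}},
    "exporters": {"otlp": {"endpoint": "backend.example:4317"}},
    "extensions": {"health_check": {}},
    "service": {
        "extensions": ["health_check"],
        "pipelines": {
            "traces": {
                "receivers": ["otlp"],
                "processors": ["batch"],
                "exporters": ["otlp"],
            },
        },
    },
}

print(check_conformance_config(EXAMPLE_CONFIG))
```

A check like this only covers the "accepts the configuration" half of the idea; the "passes basic testbed tests" half would still need a running binary.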
I like the idea of defining conformance to a standard but it's unclear to me what we are suggesting will be the effect of being conformant. In other words, let's say we define what it means to be an "OpenTelemetry Collector", and someone has a product which meets all the requirements. Isn't it still a trademark issue for them to say that their product is an OpenTelemetry Collector?
IANAL but as I understand it, The Linux Foundation has a trademark on the term OpenTelemetry and their trademark guidelines define how the trademark may and may not be used.
e.g. It would be a trademark violation for a company to name their product "Company OpenTelemetry Collector" because the trademark may not be used in a product name. However, it is ok to use the phrase "Company Distribution for OpenTelemetry Collector" because it is a reference to the trademark and does not imply that the trademark is part of the product name.
I don't mean to nitpick but I can't figure out how one would communicate the fact that they officially have an OpenTelemetry Collector without violating the trademark guidelines.
What does this clarification do and how does it help the project? I am unclear on why this is coming up, is this impacting the OpenTelemetry project's ability to graduate within the CNCF?
> I like the idea of defining conformance to a standard but it's unclear to me what we are suggesting will be the effect of being conformant.
I think in this case the effect would be that you cannot call yourself a "Collector distribution" without passing X,Y,Z conformance tests.
I think the trademark issue is separate though and has already been enforced in the past.
> includes only plugins/components which are compatible with the collector framework. they don't need to be in the otel repos, but you should be able to point the upstream collector builder at them
I'm not sure I fully understand this one. Would this by proxy mean that "a collector distribution" must be built, or be able to be built, with OCB? I think this may be too limiting. Consider this scenario. Contributor X builds a new Collector component type. It is ideal for their specific use case, and they don't plan on contributing upstream but they build it on top of the collector framework. OCB does not recognize this component type and thus fails to build it. Would this not qualify as a distribution?
Just linking this other issue here that suggests a distribution should be added to the spec: https://github.com/open-telemetry/opentelemetry-specification/issues/2873
As the issue points out, distribution is already in the official documentation: https://opentelemetry.io/docs/concepts/distributions/
Note the doc linked above also includes a link to the definition of the collector today: https://opentelemetry.io/docs/concepts/components/#collector
The OpenTelemetry Collector is a vendor-agnostic proxy that can receive, process, and export telemetry data. It supports receiving telemetry data in multiple formats (for example, OTLP, Jaeger, Prometheus, as well as many commercial/proprietary tools) and sending data to one or more backends. It also supports processing and filtering telemetry data before it gets exported.
I guess my question would be if Collector SIG disagrees with the definition of distribution that's currently on the website.
One thing that came up today during discussions at the Operator SIG, and also separately in discussions with @Aneurysm9, is command support.
Should collector distributions be required to support both the Collector validate and components command? Do we need to ensure that any future commands are able to be supported by distributions that do not use OCB?
cc: @jaronoff97
My expectation as someone building features on top of the collector is that any collector distribution uses the collector builder or at least can be marshalled into a struct that matches the collector Go framework. Being able to adhere to that would ensure that how we design Kubernetes features will always work for any distribution.
> What does this clarification do and how does it help the project?
I think this is a great question to help anchor this discussion.
Here's one scenario that comes to mind.
Consider if (hypothetically) Google offers an OpenTelemetry Collector Distro for GCP that has lots of great 1st party GCP support.
But their distro doesn't include (hypothetically) the Honeycomb Marker Exporter, because they don't want to be on the hook for supporting that exporter.
This situation seems somewhat unavoidable, as I'm not sure we want to force all distros to include all components, both for size and support reasons.
If the OpenTelemetry Collector could support dynamic linking, then users could just drop the Honeycomb Marker Exporter into their GCP distro, and the problem is solved, but it sounds like dynamic linking is a no go because of Go.
So we would need another way to ensure that OpenTelemetry Collector distros can be extended and don't lock users into the distro's ecosystem.
[just for one example, potentially we could say that anything called an OpenTelemetry Collector distro must be built using the OpenTelemetry Collector Builder and that all the distro components must be publicly available so that users can extend the distro themselves]
@trask I don't think your example answers the question, at least for me. And we had an hour-long discussion on the call where we still didn't explicitly enumerate what problems we're trying to address by the discussion. I heard at least two problems, one on the call, another in your answer:
1. OTEL Collector maintainers are concerned with getting a lot of user questions in the official Slack related to 3rd-party collector distros, all because they are calling themselves "OTEL collector ..."
2. (from your comment) A user who's running a 3rd party distro needs to add another component to the collector; what do they do?
Some thoughts on (2):
- if the 3rd party distro is fully open source, then user can just build their own flavor that includes additional components
- if the 3rd party distro has closed source parts, there is no way to extend it today, nor in the near future given Go's ecosystem
- WASM-based plugins are possible but likely won't be efficient enough due to data model complexity that cannot be easily transferred across the Go/WASM boundary without additional transformations.
- The user still has a workaround of adding another oss-only collector in the pipeline (even more inefficient than WASM)
- whatever the solution, the discussion of "what is collector" seems quite tangential to the problem
> if the 3rd party distro is fully open source, then user can just build their own flavor that includes additional components
it's not very user friendly and about 100x more painful than the plugin-based ecosystems I've worked with before where I can just upload a pre-built component into my existing system. I guess I was hoping we could get as close to the convenience that other plugin-based ecosystems offer, within the constraints of Golang.
> whatever the solution, the discussion of "what is collector" seems quite tangential to the problem
I think the connection is that we have an opportunity to make requirements on something that wants to call itself an OpenTelemetry Collector distro, and so it's our chance to enforce something like this (if we want)
fwiw, the example I gave
> [just for one example, potentially we could say that anything called an OpenTelemetry Collector distro must be built using the OpenTelemetry Collector Builder and that all the distro components must be publicly available so that users can extend the distro themselves]
aligns with the definition proposed by @dyladan and @jpkrohling above:
> that a distribution has to fulfill the following requirements:
> - uses the collector framework (upstream not a fork)
> - includes only plugins/components which are compatible with the collector framework. they don't need to be in the otel repos, but you should be able to point the upstream collector builder at them
> includes only plugins/components which are compatible with the collector framework.
This ^ already excludes existing distros that use proprietary code. More importantly, it doesn't answer the question of which problem a definition like this solves. I see no reason to debate the criteria without deciding why we're doing it. To quote a good book:
- “Would you tell me, please, which way I ought to go from here?”
- “That depends a good deal on where you want to get to.”
- “I don't much care where.”
- “Then it doesn't much matter which way you go.”
> I see no reason to debate the criteria without deciding why we're doing it.
I totally agree which is why I tried to provide one possible "why" above. I'm looking forward to seeing what other "whys" people have in mind.
The primary reason I care about a definition here is that users are advised to limit the collector to contain only the components necessary for an environment. In the absence of a dynamic plugin model (which to my knowledge no collector maintainer believes is feasible), we are recommending that users deploy a "collector" that we have not built ourselves. Since we are not recommending a concrete binary, I believe we need to define precisely what we are recommending. Additionally, we expect that as a user's needs evolve they will migrate to another "collector" that contains a different set of components. Therefore, a definition would serve to establish expectations for what stays the same between "collectors" vs what may be different.
I would like to highlight that the issue asks for two definitions, but there appear to be at least three categories of collectors which have been discussed. Very roughly:
1. "Official" collectors - those produced by the Collector SIG
2. "Custom" collectors - those produced by users following our recommendation to limit components for their environment
3. "Distributions" - those published by vendors or organizations
The conversation so far seems to have blurred (2) and (3), and we might explicitly conclude that this is not an important distinction. However, for now, I'm drawing this distinction because the "whys" I've described above specifically apply to (2).
I have two problems that I would like to see resolved.
Problem one: remove confusion about what a Collector is
The first problem is basic confusion about "what a Collector is." Not a Collector distro, but the term Collector itself.
If someone points to a binary and calls it a Collector, just about everyone in the community would assume that the binary is a build of the collector codebase plus some plugins. Even if a binary was described as some kind of "Vendor Specific Collector Distribution," that core assumption would still be there.
That seems a bit obvious, but we're now starting to see projects pop up which don't match this definition. One example is Grafana Alloy. My understanding is that Alloy is basically the pre-existing Grafana agent, plus some additional components that it shares with the Collector codebase. Which is a totally fine thing to be! But when I first came across it, it was described as a "vendor neutral OpenTelemetry Collector distribution." Like everyone else in the community, that description made me think it was something completely different – that it was the Collector codebase plus some Grafana-specific plugins. I was super confused when I discovered that wasn't the case!
Again, no disrespect to Grafana or the Alloy project; it seems like a totally fine project to me. But the naming threw me for a loop. Imagine if CouchDB started calling itself Redis because it shared some Redis code in order to add a feature. That would be really confusing!
I'm sure the Grafana folks are reasonable, and we can just talk to them about it. But I imagine that there may be more instances of this in the future, so it seems prudent that we provide some kind of official definition of a Collector that roughly matches community expectations, in order to avoid confusion. Namely, that a Collector is a build of the collector framework plus some plugins.
Problem two: who do I talk to for technical support?
At the heart of all the various collector distro discussions is the question "who is responsible for helping me with this thing?"
We have users who come into our slack channels asking for technical support. What technical support do we want to give? Who do we point them to if we don't want to give them support? Do we just support the core and contrib builds of the Collector? What if a user makes their own build, but it only contains a subset of plugins in the contrib build? What if they add just one plugin that they wrote themselves? What if a vendor provided the build? What if the vendor build only contains contrib plugins? What if it's the contrib build but their configuration file is absolutely insane? Technical support is really important, and telling someone "no we won't help you" is disappointing. So we need a really clear cut definition for what we are willing to support.
Maybe there are additional problems, but those are the two where I am currently seeing real world issues related to a lack of clear definitions around the Collector.
> Problem one: remove confusion about what a Collector is
I don't think this in itself is a problem. Whatever someone calls their binary doesn't concern me unless I have an actual problem to solve and their naming creates confusion preventing me from solving the problem (like coming to OTEL support group when the actual "collector" is something else entirely). So your #2 is an actual problem, but #1 is not, it's more like a possible root cause for #2. But #2 could be caused by other things too - a distro may actually be a "collector" as you want to define it, yet the question is about a custom or even proprietary plugin.
In other words, if #2 is the only problem you want to solve, it needs a policy of what is appropriate scope for support questions. There may be a definition of collector that helps this policy, but doesn't help other problems, such as one in https://github.com/open-telemetry/opentelemetry-collector/issues/8555#issuecomment-2166935956. And there may be other approaches to the policy rather than relying on "what is collector" question. Such as: go talk to your vendor who provided the binary, irrespective of whether it matches any definition of collector or not. I would actually be a strong proponent of that exact rule - vendors have paying customers, they can allocate resources for tech support, instead of putting this burden on oss volunteers in OTEL.
@yurishkuro number one is definitely a problem. We are actively addressing an example of it right now. It is related to number two, but it causes other fundamental confusions.
I agree that for most projects, #1 is not an issue – no one is going to name their project Redis. But perhaps because OpenTelemetry is something of a standard, there seems to be a natural inclination to imply that projects which process OTLP are part of OpenTelemetry even when they are maintained outside of the project, with the Collector being the main target. I don't think that defining what a Collector is needs to be difficult or complicated, but we should write it down anyways. We have other problems, their solutions don't need to be related to the sign on the wall we need to put up declaring that the term Collector only refers to this codebase.
Thank you all for the renewed interest in defining Collector and Collector distributions. I watched the recording from last Thursday and spoke to several of you on Slack (GC and Collector leads). Here’s a summary of the situation as I understand it.
We already have a few definitions in place, such as:
- Distribution: "A distribution is a customized version of an OpenTelemetry component. A distribution is a wrapper around an upstream OpenTelemetry repository with some customizations."
- Collector: "The OpenTelemetry Collector is a vendor-agnostic proxy that can receive, process, and export telemetry data. It supports receiving telemetry data in multiple formats (for example, OTLP, Jaeger, Prometheus, as well as many commercial/proprietary tools) and sending data to one or more backends. It also supports processing and filtering telemetry data before it gets exported."
Commercial vendors are being asked to support the "OTel Collector" by their customers, as evidenced by the number of commercial vendors listed as having a distribution of the Collector:
- AWS Distro for OpenTelemetry (ADOT)
- Grafana Alloy
- Liatrio Distribution of the OpenTelemetry Collector
- observIQ BindPlane Agent
- RedHat RHOSDT OpenTelemetry Collector Distribution
- Splunk Distribution of OpenTelemetry Collector
- Sumo Logic Distribution for OpenTelemetry Collector
Each vendor has a different approach to meeting this demand. Some assist customers using a curated list of upstream components, others offer support (with SLAs) for their official binaries with vetted upstream components, and others provide extra features at different levels. These approaches are categorized on the distribution definition page as "Pure," "Plus," and "Minus."
However, not all of these approaches resonate equally within the GC and with Collector maintainers: we accept some approaches as distributions but not others. We can't pinpoint why they are different, making it harder for vendors to comply with the (non-existent) requirements to be called a distribution. The GC has politely asked one of these vendors to stop calling itself a Collector, without providing a clear path forward for the project to regain the right to be called a distribution. Lack of knowledge about these projects adds to the confusion. For instance, I have seen inaccurate claims about ADOT and Alloy.
@atoulme, @bogdandrutu, and @yurishkuro have questioned the actual problem we are aiming to solve. While their question might seem odd, there wasn’t a clear articulation of the problem: we feel that something is off but can't pinpoint why we don't want certain projects to be called a distribution of the Collector. One argument by @djaglowski was well-received: we want users to have a consistent experience and be able to reuse their knowledge when switching between "flavors" of the Collector, whether custom-built, vendor-built, or community-built.
I have also heard a few other arguments, which I'll address here:
- @codeboten expressed concern that users might come to our GitHub repositories and Slack channels with questions about downstream (custom or vendor) distributions, causing distraction to the already overloaded maintainers. A counter-argument (sorry, I forgot by whom) was made that we want to encourage users (and vendors) to stay close to us. My personal opinion is that we are handling this well for now: when we see an issue related to a specific distribution, we typically tag specific people from the vendor on the GitHub issue or Slack thread. It's in their best interest to handle those questions.
- @trask mentioned a hypothetical situation where a vendor requires using a custom exporter to send data to them, while their cloud provider requires a custom processor to enrich telemetry with cloud metadata (cluster, region, etc.). I haven't seen this situation before, and I think most relevant vendors (minus one or two) can ingest OTLP natively. As long as distributions provide an OTLP exporter and vendors can ingest OTLP, users won't face this problem. I recognize a general concern about lock-in by offering components that can't be used elsewhere.
To me, it's clear that we need an objective set of rules in addition to our existing subjective definitions, so the ecosystem can thrive with options for our users while retaining their ability to reuse their knowledge and switch between flavors without getting locked-in. If we can agree on this need, here’s what I propose as an initial draft, with the promise to develop it further elsewhere:
- A build of the Collector is what can be obtained by the result of the OpenTelemetry Collector Builder (ocb) or that can be reproduced with the builder. This is typically the result of end-users picking which components they want to use in production. Builds of the Collector can include proprietary components, but those components should be reusable in other builds (like end-user custom builds).
- A distribution is a build of the Collector done by the Collector SIG with a set of components available from our repositories (this one and contrib).
- The OpenTelemetry Collector (following @codeboten's definition from the website) is one specific distribution, produced by the Collector SIG.
- An OpenTelemetry Collector compatible solution is a binary that "acts like a collector and walks like a collector." For that, we’d define a set of tests that such a binary needs to pass (certification program of sorts), which may include:
- Ability to use the same configuration format
- Ability to replace the OpenTelemetry Collector at runtime (e.g., by changing the `image` property of the Collector's CR on the Operator)
- Ability to be managed by OpAMP, once ready
- Ability to be observed by a specific set of metrics
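One way to read the draft taxonomy above is as a small decision procedure. The sketch below is only an interpretation of that draft, not an agreed definition: the `Candidate` fields, the `Label` names, and the order in which the rules are applied are all assumptions made for illustration, and the fact that "the OpenTelemetry Collector" is one specific distribution is not modeled.

```python
from dataclasses import dataclass
from enum import Enum


class Label(Enum):
    DISTRIBUTION = "distribution"
    BUILD = "build of the Collector"
    COMPATIBLE = "OpenTelemetry Collector compatible solution"
    NONE = "none of the above"


@dataclass(frozen=True)
class Candidate:
    """Hypothetical facts about a binary; field names are illustrative only."""
    reproducible_with_ocb: bool             # produced by, or reproducible with, the builder
    built_by_collector_sig: bool            # published by the Collector SIG
    only_core_and_contrib_components: bool  # every component comes from the core/contrib repos
    passes_compatibility_tests: bool        # would pass the (not yet defined) test suite


def classify(c: Candidate) -> Label:
    """Return the most specific label from the draft that applies to the candidate."""
    if c.reproducible_with_ocb:
        if c.built_by_collector_sig and c.only_core_and_contrib_components:
            return Label.DISTRIBUTION
        return Label.BUILD
    if c.passes_compatibility_tests:
        return Label.COMPATIBLE
    return Label.NONE


# A vendor binary assembled with the builder that adds a proprietary exporter.
vendor_build = Candidate(True, False, False, True)
# A from-scratch implementation that behaves like a Collector.
rewrite = Candidate(False, False, False, True)

print(classify(vendor_build).value)
print(classify(rewrite).value)
```

Under this reading, a vendor binary is a "build" rather than a "distribution", which is the point of contention in the reply that follows.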
Thanks @jpkrohling that's a great layout. My only suggestion is that I think Collector Build and Collector Distro can be combined. Anything that can be reproduced by the builder can be called a Distro, regardless of who issued it.
In my previous message, I should have stressed more that we didn't have a consensus on whether we had a problem to solve. Before addressing why I think we need a build and a distribution, I'd like to take a step back and have a consensus.
Community, Collector leads, TC, GC: please vote on this issue. The options are:
❤️ No problem to solve at the moment. Let the ecosystem use our subjective definitions (status quo)
👍🏽 We have a problem with the subjective definitions and need a concrete set of rules
Note that you are NOT voting on my draft proposal.
Let me try one last time. You cannot solve a "problem" of "what is collector" without deciding why, i.e. what success criteria you want to meet by "solving" it. The poll above provides exactly zero answers to that question.
Not sure how helpful this is, but this is my take from working with several hundred customers adopting OTel:
- There is a lot of variation on what collectors they use. Contrib versions galore, base collector, ObservIQ distro, Honeycomb's collector config for metrics compression that creates a distro, ADOT distro, AWS distro (these are different?), K8s distro, customer-specific distros, something called a "Local OpenTelemetry Collector binary". Contrib is the biggest category here.
- The HNY-specific thing can be a pain for people since it requires rebuilding a collector binary and deploying when you need to make updates, which customers forget about
- A ton of customers configure and "forget" their distro. It works, they never touch it, so it's often months or years out of date
- Occasionally someone gets confused because they need to use the `transformprocessor` or something useful like that, but their distro (usually the base distro) doesn't support it
- ADOT distro sometimes brings some pains but it's very much seen as a problem with lambda
- Some customers have asked for support, often expecting a distro (we offer support without use of a distro)
- Alloy in particular has bubbled up as "interesting" to some folks and they saw it as a grafana-specific thing
So I guess my experience is that there isn't a terrible problem here to resolve, but there is quite a bit of variation in what people use, and that sometimes leads to confusion or a bad experience depending on what they're using.
I see here echoes of what it means to adopt OTel. If you propose an alternative API, but still emit semantic conventions and OTLP data under the hood, is that OTel? I'd say yes. Is your binary, Acme Corp. Collector, capable of accepting and emitting OTLP, and also uses the batchprocessor with some different defaults under the hood? I'd call that a collector as well.
@yurishkuro, please bear with us. Your input has been valuable and I think we are now in a better position because of your questions. I'll try again, starting with what I see as the problems we are trying to solve:
- Bad user experience (or confusion): Users struggle to understand the differences between various distributions and how they relate to the upstream OpenTelemetry project, as evidenced by the comments from @tedsuo and @cartermp, among others.
- Vendor uncertainty: vendors are unsure about the requirements and guidelines for their distributions to be recognized officially, leading to potential misalignments and disputes within the community, as evidenced by the current request from the GC for a distribution to not be called as such anymore, without telling them exactly what's wrong.
If we agree that we want to work on those problems, here are the goals for me:
- Provide clarity: establish clear, objective criteria for what constitutes an OpenTelemetry Collector distribution so that both vendors and the community know what is and what isn't a distribution.
- Consistent user experience: by establishing objective criteria, we ensure that users can have a consistent experience across different distributions if they stick to the aspects we establish, enabling users to switch between distributions without relearning or facing incompatibilities, while at the same time being able to use distribution-specific components or features.
I think the simplest way to conceptualize the 'problem' is that the only thing that the project defines as hard requirements for 'what is an OpenTelemetry
Ultimately, we need to be able to provide some guarantees to both of these groups -- to users, we need to be able to have clear guidance for questions like:
- If I write a custom receiver, will that work with other collectors?
- Are configurations portable between different collectors?
- Do all collectors support a single management protocol?
To builders, we need guidance around:
- How to name things to avoid user confusion
- Implementation guidance on creating consistent experiences across variations e.g., if I was to rewrite the collector in rust, what parts should I preserve? How different can I be from upstream before a collector isn't a collector?
@jpkrohling
> Provide clarity: establish clear, objective criteria for what constitutes an OpenTelemetry Collector distribution so that both vendors and the community know what is and what isn't a distribution.
Don't you see that this is a pure tautology? "We want to know because we want to know". Any definition will match that. E.g. the following definition is clear and objective, and completely beside the point as it does not address the unspoken problems:
- collector is a desk
- collector distribution is a desk that is shipped to your home disassembled
> Consistent user experience: by establishing objective criteria, we ensure that users can have a consistent experience across different distributions if they stick to the aspects we establish, enabling users to switch between distributions without relearning or facing incompatibilities, while at the same time being able to use distribution-specific components or features.
This is getting closer to the issue, but it's very hand-wavy. @austinlparker 's comment https://github.com/open-telemetry/opentelemetry-collector/issues/8555#issuecomment-2181033886 is more concrete. Basically, we can approach this as a product requirement spec. Try to phrase everything as a use case:
"as a {user role} I want to {perform an action} so that I can {achieve an outcome}".
For example, with one of Austin's bullet points:
- Are configurations portable between different collectors?
- Rewrite: as an end user I want to take my collector config that I use with distro X and use it with distro Y so that I have the same behavior.
Phrased like this, an immediate question from me - is that what we actually want? How is that even possible? It means that the two distros are 100% functionally equivalent (at least on the features I already used with distro X), which defeats the purpose of distros in the first place. Ability to swap implementations is a nice theoretical goal, but there are other goals users may have, like I don't want to run binaries 100s of MBs in size bundling every possible feature.
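To make the portability use case concrete, here is a minimal sketch of the weakest form of the question: does distro Y even ship every component type that a config written for distro X uses? The component inventories and the function name `unsupported_components` are hypothetical, and the check says nothing about whether shared components behave identically across distros.

```python
from typing import Dict, Set

SECTIONS = ("receivers", "processors", "exporters", "extensions")


def component_type(component_id: str) -> str:
    """Collector component IDs have the form 'type' or 'type/name'."""
    return component_id.split("/", 1)[0]


def unsupported_components(
    config: dict, available: Dict[str, Set[str]]
) -> Dict[str, Set[str]]:
    """Per section, the component types used by the config but absent from the target distro."""
    gaps = {}
    for section in SECTIONS:
        used = {component_type(cid) for cid in (config.get(section) or {})}
        missing = used - available.get(section, set())
        if missing:
            gaps[section] = missing
    return gaps


# Config a user runs today on (hypothetical) distro X.
CONFIG_FROM_DISTRO_X = {
    "receivers": {"otlp": {}},
    "processors": {"batch": {}, "transform/redact": {}},
    "exporters": {"otlp": {}},
}

# Hand-written, hypothetical inventory of what distro Y ships.
DISTRO_Y = {
    "receivers": {"otlp"},
    "processors": {"batch"},
    "exporters": {"otlp", "debug"},
}

print(unsupported_components(CONFIG_FROM_DISTRO_X, DISTRO_Y))
# prints {'processors': {'transform'}}
```

An empty result is a necessary condition for swapping distros with the same config, not a sufficient one, which is exactly the tension between portability and slimmed-down builds raised above.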
So rather than keep debating completely arbitrary definitions of collector, let's first
- list what use cases we want to satisfy (aka "problems to solve"),
- whether we indeed agree that we want to satisfy them,
- and whether it's even possible to satisfy many of them at once (as a compromise).
Doing so will implicitly inform the definition of the collector, based on actual problems / goals / user needs, not based on a tautological definition of a problem.
@yurishkuro There is an immediate need for the collector, as a SIG, to define what the requirements of another piece of software calling itself an 'OpenTelemetry Collector' must align with. This is, as you said, a product requirement. I stated my rationale above, but I would like to expand on it with the bigger issue here.
As OpenTelemetry continues to mature and graduates, we (the GC and project leadership more generally) will need to create requirements around certification and compatibility. This is both easy, and hard. For instance, it is relatively easy to set a requirement around something like OTLP. If you write OTLP, then you must write valid OTLP to any compliant OTLP receiver. It is also somewhat easy to say 'Supports OpenTelemetry API' by ensuring that you can get the active span from context and modify it, etc.
The collector, however, is much more difficult to quantify by these standards. I agree, in principle, that it might not be desirable for non-specced config files to be portable. I would generally agree that a receiver written for upstream may not necessarily work with other implementations. With that said, what is the distinction that we are going to use? You can hopefully understand my reluctance to say "Ok, well, you can just call anything that receives OTLP a Collector" because that could be very confusing for users, especially as management tools proliferate. Similarly, it does not benefit users to remove one source of lock-in (the API/SDK) then replace it with another (the pipeline/collector layer).
I would honestly be fine saying 'there is only one thing called an OpenTelemetry Collector, and it is anything that is built with upstream ocb'. Everyone else in the ecosystem can be 'OTLP compatible' or whatever other words we come up with.
edit: By 'non-specced' config files above, I mean configuration files that do not align with a published specification (e.g., the upcoming file-based config options).
Just to be crystal clear -- I think an entirely acceptable outcome of this is stating the following:
- An OpenTelemetry Collector is a specific piece of software that is built, maintained, and published by the OpenTelemetry project.
- An OpenTelemetry Collector is also any distribution of the Collector that is built using the ocb tool.
- No other software may call itself an 'OpenTelemetry Collector'.
> There is an immediate need for the collector, as a SIG, to define what the requirements of another piece of software calling itself an 'OpenTelemetry Collector' must align with. This is, as you said, a product requirement. I stated my rationale above, but I would like to expand on it with the bigger issue here.
@austinlparker I am sure you're familiar with the monitoring principle "don't alert on root causes, alert on symptoms". The motivation is that there may be many different root causes of an issue, but if it does not affect user-facing behavior it's not worthy of an alert, and vice versa, if the user experience is affected you should be alerted regardless of the root cause. In your paragraph, "what to call a collector" is a (possible) "root cause". I am interested in the "symptoms".
If my previous quote (https://github.com/open-telemetry/opentelemetry-collector/issues/8555#issuecomment-2167135929) didn't hit the spot, here's another one:
“What's in a name? That which we call a rose by any other name would smell just as sweet.”
@yurishkuro I have three concrete issues that can be resolved with definitions.
The first is support. I would like us to only handle technical support requests for the code that we own. We issue two binaries, so we support those binaries. Anything beyond that we don't really want to support.
The second is end user confusion. I continuously get questions from end users about "what is a Collector Distro?" Many users think that these are forks of the project. For example, I have heard people complain more than once that OTel is a fractured project because every vendor has forked the Collector. Defining what a "Collector Distro" is would help a lot with these misconceptions. I am really tired of answering these questions and putting the same misconceptions to rest over and over again. And yes, I often get asked the question "where is this all defined?" when I explain this.
There is now even further confusion, as projects are now appearing that contain some Collector functionality, but also contain significant additional functionality that has nothing to do with the Collector. If these projects primarily refer to themselves as "Collectors" or "Collector Distros," then there really is some kind of fracturing happening, as those projects contain functionality that could not be added to other Collectors – you must use that specific non-collector codebase in order to access those features. The same goes for Collector binaries that contain private plugins – it's a fork because you are now completely dependent on this third party organization for this functionality. If projects like these can be considered Collector Distros, then the term would be meaningless and could not solve the first problem. So I want another term to describe these projects.
@jpkrohling's proposal is very close to what I want, as it differentiates between:
- Collectors - binaries we are willing to provide support for, AKA the Collector builds that we publish.
- Collector Distros - custom builds of the Collector, which are not forks of the project as they just contain publicly available plugins and could be recreated using the Collector build tool. We do not provide support for Collector Distros; go to the organization that created the distro for support.
- Collector Compatible projects - projects which contain Collector functionality, but also include private plugins or code that has nothing to do with the Collector architecture and cannot be recreated with the Collector build tool. These are essentially forks of the Collector, or at any rate are completely different projects. For this reason, we don't want them confused with Collector Distros, but we do want to acknowledge that they work with collector configs and thus have at least some expected behavior.
Those definitions would go a long way to resolve the end-user confusion I have encountered to date around the Collector.