opentelemetry-collector-releases
Create rules for supported distributions
Description
This Issue is a continuation of https://github.com/open-telemetry/oteps/pull/229.
The Collector SIG has had continuous discussion around the distributions we support: whether we should support them, whether we should support more, and what components should be included in the existing distributions. This costs the SIG time and effort, and the lack of clear rules could also confuse users who are looking to use or contribute to a distribution. The goal of this issue is to:
- Determine the criteria for when the community will support a distribution
- If Contrib fits those criteria, determine the rules for what components are included in the Contrib Distribution
- If Core fits those criteria, determine the rules for what components are included in the Core Distribution
We may need to also determine how the Release pipeline will change to be able to scale to more distros. At the moment it already struggles with 2.
Current criteria based on the discussion in this issue:
- To honor commitments made by OpenTelemetry to other Open Source projects the Collector SIG should support at least 1 distribution that meets those commitments. (IDK what those commitments are, need input from others to more explicitly list out the impact of those commitments.)
- Distributions supported by the Collector SIG should serve a specific purpose and those purposes should have minimal overlap.
- Distributions supported by the Collector SIG should meet general needs and not be too niche.
- Distributions supported by the Collector SIG should only target the needs of the OpenTelemetry project.
- Distributions supported by the Collector SIG are not required to be production ready and may be focused on development and proof of concept use cases. The distribution should clearly indicate whether or not the Collector SIG considers it to be production ready.
- Distributions supported by the Collector SIG must only include components from the opentelemetry-collector and opentelemetry-collector-contrib repositories.
- Distributions supported by the Collector SIG should have a clearly defined list of criteria for which components are included.
- Distributions supported by the Collector must include the following assets except where the specific purpose of the distribution is naturally associated with a subset of these assets. In such cases, it should be clearly stated which assets are skipped and why. Additional assets may be included if the distro desires:
- Binaries for linux_amd64, linux_arm64, windows_amd64 and darwin_arm64
- linux_amd64 and linux_arm64 container images
- Packages to be used with Linux distributions (apk, RPM, deb), Mac OS (brew) for each distributed binary.
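To make the asset rule above concrete, here is a minimal Go sketch of how a release check might verify it. Everything in it is hypothetical: the `binary:`/`image:` asset keys, the `Distribution` type and the `missingAssets` helper are not part of any existing release tooling. It covers only the binaries and container images; the per-binary package requirement (apk, RPM, deb, brew) is left out for brevity.

```go
package main

import (
	"fmt"
	"sort"
)

// requiredAssets mirrors the minimum binary and image list proposed above.
// The string keys are hypothetical identifiers chosen for this sketch.
var requiredAssets = []string{
	"binary:linux_amd64",
	"binary:linux_arm64",
	"binary:windows_amd64",
	"binary:darwin_arm64",
	"image:linux_amd64",
	"image:linux_arm64",
}

// Distribution is a hypothetical description of a distro's release assets.
type Distribution struct {
	Name    string
	Assets  map[string]bool   // assets the distro actually publishes
	Skipped map[string]string // required asset -> stated reason for skipping it
}

// missingAssets returns the required assets that are neither published nor
// explicitly skipped with a non-empty reason, sorted for stable output.
func missingAssets(d Distribution) []string {
	var missing []string
	for _, a := range requiredAssets {
		if d.Assets[a] {
			continue
		}
		if reason, ok := d.Skipped[a]; ok && reason != "" {
			continue
		}
		missing = append(missing, a)
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// A hypothetical Windows-focused distro that publishes only a
	// windows_amd64 binary and documents why the other assets are skipped.
	windowsOnly := Distribution{
		Name:   "windows-example",
		Assets: map[string]bool{"binary:windows_amd64": true},
		Skipped: map[string]string{
			"binary:linux_amd64":  "components are Windows-only",
			"binary:linux_arm64":  "components are Windows-only",
			"binary:darwin_arm64": "components are Windows-only",
			"image:linux_amd64":   "components are Windows-only",
			// image:linux_arm64 has no stated reason, so the check flags it.
		},
	}
	fmt.Println(missingAssets(windowsOnly)) // prints [image:linux_arm64]
}
```

The design choice here is that skipping an asset is allowed only when a reason is recorded, which matches the "clearly stated which assets are skipped and why" wording of the rule.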
We may need to also determine how the Release pipeline will change to be able to scale to more distros. At the moment it already struggles with 2.
One idea that comes to mind, though I haven't explored it, is to split the goreleaser workflow per distribution. I haven't tested this, but it should be doable.
Being able to release each distribution independently sounds like a good idea to be able to scale to more distributions.
Can you clarify what's to be understood by "support"? What's expected from the community if we say we support a distribution?
I think support implies all the duties covered by the Approver and Maintainer role requirements. In addition (bc I don't see it explicitly listed in the membership responsibilities), support means that the Collector SIG is the owner and maintainer of the binaries/images of the different Collector distributions and is responsible for the pipeline that produces those artifacts.
Determine the criteria for when the community will support a distribution
Here is my initial attempt at an answer:
- To honor commitments made by OpenTelemetry to other Open Source projects the Collector SIG should support at least 1 distribution that meets those commitments. (IDK what those commitments are, need input from others to more explicitly list out the impact of those commitments.)
- Distributions supported by the Collector SIG should make using the Collector easier.
- Distributions supported by the Collector SIG should serve a specific purpose and those purposes should have minimal overlap.
- Distributions supported by the Collector SIG should meet general needs and not be too niche.
- Distributions supported by the Collector SIG must only include components from the opentelemetry-collector and opentelemetry-collector-contrib repositories.
- Distributions supported by the Collector SIG should have a clearly defined list of criteria for which components are included.
- Distributions supported by the Collector must include:
- Binaries for linux_386, linux_amd64, linux_arm64, linux_ppc64le, windows_386, windows_amd64, windows_ppc64le, darwin_amd64, darwin_arm64, and darwin_ppc64le. (Looking at the release assets, I don't actually think we provide all of these today, despite claiming we do in our README.)
- linux_386, linux_amd64, linux_arm64, and linux_ppc64le container images
- Packages to be used with Linux distributions (apk, RPM, deb), Mac OS (brew) for each distributed binary.
IDK what those commitments are
IIRC, those were listed in the original blog post that announced OTel, and included only Prometheus, Jaeger, and Zipkin, apart from OpenTracing and OpenCensus.
Other than that, your proposal looks good to me.
One thing I wouldn't mind some more clarification on is:
Distributions supported by the Collector SIG should serve a specific purpose and those purposes should have minimal overlap.
Does that mean the collector SIG sets the direction for another SIG's (say, K8s or Lambda) distribution? Or conversely, what should be the expected level of involvement from those SIGs if we manage the distribution for them?
I know it isn't explicitly said, but is the intent that the supported distributions fulfill the needs of the OpenTelemetry project only? That is, we are not managing distributions for vendors or other ecosystems?
Does that mean the collector SIG sets the direction for another SIG's
I don't think it has to. The Collector SIG should be free to decide which specific purposes it sees as valuable enough to warrant supporting a distro. If that corresponds with the needs of another OpenTelemetry SIG that is a bonus, but it shouldn't be a requirement.
what should be the expected level of involvement from those SIGs if we manage the distribution for them?
I don't think we should view it as managing distributions for another SIG. We should only support a distribution if we see value in it for our users. If another SIG also benefits from it and wants to help support that is a bonus, but we should not rely on others to support these distributions.
I know it isn't explicitly said, but is the intent that the supported distributions fulfill the needs of the OpenTelemetry project only? That is, we are not managing distributions for vendors or other ecosystems?
Good point, we should explicitly call this out. How about: "Distributions supported by the Collector SIG should only target the needs of the OpenTelemetry project."
I don't think we should view it as managing distributions for another SIG
They are OpenTelemetry users, no matter which SIG they consume "first". Given that our distributions will have only components from either core or contrib, the support will end up on us anyway, no matter which SIG proposed the distribution. Perhaps we could provide guidance to other SIGs on which components they can rely on?
@jpkrohling I agree. My comment's intent is to say that we should not treat other OTel SIGs differently from other users. I don't think we need to create any SIG-specific guidance either; any guidance we give should apply to SIGs and end-users alike.
To touch on points 2 and 3 in the issue for a bit:
Based on the discussed criteria so far, the Core distribution would no longer be supported. I believe it fails to pass these checks:
- Distributions supported by the Collector SIG should make using the Collector easier: I feel like we often see users post about trouble using a component that doesn't exist in the Core repository. We frequently suggest they use Contrib or build their own distribution.
- Distributions supported by the Collector SIG should serve a specific purpose and those purposes should have minimal overlap: I'd be interested to hear if others disagree, but I feel like Core doesn't have a specific purpose. If the original purpose was to meet our commitments, Contrib is already doing that. If the intent was to provide a smaller image, my response is that "to provide a smaller image" isn't a strong enough reason, plus it leads to the issue discussed in the previous bullet.
In contrast, I believe Contrib should continue to be supported. Although we haven't defined its list of criteria yet, I believe Contrib meets (or can be made to meet) all the criteria discussed so far. Most importantly, it allows us to keep our promises, makes the Collector easy to use, serves a specific purpose*, and meets general user needs.
* We need to decide on a final purpose statement, but I see Contrib's purpose as being a readily available, easy solution for trying out any component that the Collector SIG supports. Contrib is the solution to "I want to try out the Collector", and I believe it gets to claim this purpose instead of Core since it contains all the components and Core does not.
There are two parts of contrib that I don't like and would make me uncomfortable supporting it more officially:
- it's too big: having everything in it makes it unsuitable for most use-cases. For instance, a sidecar with contrib is not up for consideration. Sure, it's tolerable for most use-cases, but it's not serving a specific purpose.
- it has everything, including things we haven't touched or seen working (for any definition of "working") in a long time. I'm unsure if we have active maintainers and code owners for half of the components there. As such, I would be hesitant to call it "supported".
Sure, it's tolerable for most use-cases, but it's not serving a specific purpose.
I see its specific purpose as being a dev tool: something that helps users get started playing around with the Collector. Contrib provides an easy, out-of-the-box solution for any user to try out any component they read about. Anything less than Contrib means we'll have to guess which components users want to try out, defeating the purpose of a dev tool. Thinking about it another way, if we only supported Core or only supported the proposed k8s distro, how would a user try out the nginxreceiver (chosen randomly as an example)? Would it be up to them to build their own distro? I feel like that requirement goes against the concept that our supported distributions should make using the Collector easy.
it has everything, including things we haven't touched or seen working (for any definition of "working") in a long time.
That is a fair concern, but feels like a Contrib repo issue and not a distribution concern. We should write the Contrib criteria to be strict with the Stability requirements and hold both the Contrib repo and distro accountable. If we don't feel confident in a Contrib repo component we should deal with that as part of maintaining Contrib because it has larger implications than just our distros, it affects anyone using that component.
Distributions supported by the Collector must include:
We might consider some distributions that are naturally associated with a subset of assets. For example, a Windows distribution that contains several Windows-only components (eventlog, performancecounters, etc.). Should we consider adding a caveat for this? Roughly, "except where the specific purpose of the distribution is naturally associated with a subset of these assets. In such cases, it should be clearly stated which assets are skipped and why".
@djaglowski that is a super good point. I like the idea of making that criteria less strict so that we don't limit our options. How about:
- Distributions supported by the Collector must include the following assets except where the specific purpose of the distribution is naturally associated with a subset of these assets. In such cases, it should be clearly stated which assets are skipped and why:
- Binaries for linux_386, linux_amd64, linux_arm64, linux_ppc64le, windows_386, windows_amd64, windows_ppc64le, darwin_amd64, darwin_arm64, and darwin_ppc64le. (Looking at the release assets, I don't actually think we provide this today despite claiming we do in our README.)
- linux_386, linux_amd64, linux_arm64, and linux_ppc64le container images
- Packages to be used with Linux distributions (apk, RPM, deb), Mac OS (brew) for each distributed binary.
I feel like we should be more conservative about the assets we require for new distributions: while we ensure we can build binaries for these architectures/OSes, we don't run the test suite for most of these 'flavors'. It's clear to me that linux_amd64 and windows_amd64 should be required, and it seems like linux_arm64 should be too, even if we don't test it, just because of expected usage, but the rest should not be a requirement.
@mx-psi can you describe more about why being less strict with the release assets is a good thing. Is it to make it easier to accept new distributions that might struggle to provide certain assets?
Is it to make it easier to accept new distributions that might struggle to provide certain assets?
That's one part of it, I also just don't think it's a good idea to provide untested assets and I don't want to force new distributions to suffer that maintenance burden. For most of the assets listed we don't (and can't without significant pain) test them on CI and our ability to resolve issues specific to those artifacts is limited (IMO even on Windows we struggle to provide good support).
@mx-psi would you be OK distilling the current proposal down to:
- Distributions supported by the Collector must include the following assets except where the specific purpose of the distribution is naturally associated with a subset of these assets. In such cases, it should be clearly stated which assets are skipped and why. Additional assets may be included if the distro desires:
- Binaries for linux_amd64, linux_arm64, windows_amd64 and darwin_arm64
- linux_amd64 and linux_arm64 container images
- Packages to be used with Linux distributions (apk, RPM, deb), Mac OS (brew) for each distributed binary.
@TylerHelmuth That makes sense to me :+1:
I feel like we often see users post about trouble using a component that doesn't exist in the Core repository. We frequently suggest they use Contrib or build their own distribution.
I think this is a problem with having more than one distribution, not with core specifically. We also see that with ADOT distros (e.g. see open-telemetry/opentelemetry-collector-contrib/issues/13163 or open-telemetry/opentelemetry-collector-contrib/issues/8109). I don't see this as a reason for removing core, since the problem will still happen.
Core doesn't have a specific purpose. If the original purpose was to meet our commitments, Contrib is already doing that. If the intent was to provide a smaller image, my response is that "to provide a smaller image" isn't a strong enough reason, plus it leads to the issue discussed in the previous bullet.
The purpose of Core is IMO to fulfill use cases that fully rely on non-commercial open source observability solutions. There are some components that we could add (e.g. more processors) to make this more aligned with this purpose, but I think it does it very well.
@TylerHelmuth, during the SIG call today, you mentioned that this issue is reaching a consensus that contrib is going to be supported and core dropped. Based on the rules mentioned here, would you mind listing the two distributions and why/why not they will be supported?
In my view, we should rather slim down contrib to include the new components we think the community is missing the most from core, instead of promoting contrib as is and dropping core.
I think this is a problem with having more than one distribution, not with core specifically. We also see that with ADOT distros (e.g. see https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/13163 or https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/8109). I don't see this as a reason for removing core, since the problem will still happen.
That is fair. Maybe the criterion "Distributions supported by the Collector SIG should make using the Collector easier" is a little weak? I could see dropping that requirement, since "Distributions supported by the Collector SIG should serve a specific purpose and those purposes should have minimal overlap" already implies making the Collector easy to use.
The purpose of Core is IMO to fulfill use cases that fully rely on non-commercial open source observability solutions. There are some components that we could add (e.g. more processors) to make this more aligned with this purpose, but I think it does it very well.
I think that is a reasonable purpose. I'd propose changing the name to better fit that purpose though.
Continuing this Issue's discussion, here are my proposed criteria for what should be included in Contrib:
- All components from opentelemetry-collector and opentelemetry-collector-contrib that have at least 1 signal at Alpha stability or higher.
- Components that are marked Unmaintained will be kept in the distribution for six months. After six months of being unmaintained, the component will be removed from the distribution. (This rule directly follows the procedure currently documented for the Unmaintained stability level.)
@jpkrohling this comment here is my take on why Contrib meets the criteria we've been discussing and why Core doesn't.
For Core, I think it comes down to the criterion "Distributions supported by the Collector SIG should serve a specific purpose and those purposes should have minimal overlap". In my reflection on Core, I was not able to come up with a purpose that met this criterion. @mx-psi has suggested a purpose ("to fulfill use cases that fully rely on non-commercial open source observability solutions") that feels like it could work. If we all agree on that purpose, then Core meets the criteria and we'd keep it.
For Contrib, I am very much in favor of the criteria mentioned here in order to fulfill what I see as its purpose ("being a readily available, easy solution for trying out any component that the Collector SIG supports."). We cannot predict which components a user would want to try, and trying to predict them renders the purpose unachievable.
I guess we need a new dimension then on the support table: are we talking about support for production use-cases, or for dev/PoC? Contrib is great for dev/PoC, but I wouldn't for a moment recommend running it in production.
I feel like @mx-psi's description is good for Core, especially for running it in production.
I guess we need a new dimension then on the support table: are we talking about support for production use-cases, or for dev/PoC? Contrib is great for dev/PoC, but I wouldn't for a moment recommend running it in production.
That is a good point to call out for criteria. Personally, I don't think we need to restrict ourselves to only production-focused distributions.
Continuing to poke at the weak criterion "Distributions supported by the Collector SIG should make using the Collector easier", maybe a more focused criterion that comes from the idea of "making the Collector easy to use" would be:
Distributions supported by the Collector SIG are not required to be production ready and may be focused on development and proof of concept use cases. The distribution should clearly indicate whether or not the Collector SIG considers it to be production ready.
With a criterion like that, the Contrib distribution would definitely be listed as a distribution NOT production ready.
I like that. Are we going to have a list of those distributions somewhere, as a table perhaps? If so, we should explicitly call that out, stating that contrib is useful for X and Y, but not for Z.
We haven't landed on the presentation of the outcome of this discussion yet but a table seems reasonable.
I think one important use case we need to support is gateway mode, which needs only the components defined in the core repo, specifically the OTLP receiver and OTLP exporter. Jaeger/Zipkin/OpenCensus may even be unnecessary. I think it would be easier to define if we align on a rule: core components = components from the core repository.
But I also agree with @mx-psi about a distribution to fulfill use cases that fully rely on non-commercial open source observability solutions. That could probably be another distribution, in between core and contrib.