opentelemetry-collector-releases

Create rules for supported distributions

Open TylerHelmuth opened this issue 2 years ago • 50 comments

Description

This Issue is a continuation of https://github.com/open-telemetry/oteps/pull/229.

The Collector SIG has had continuous discussion about the distributions we support: whether we should support them, whether we should support more, and which components should be included in the existing distributions. This costs the SIG time and effort, and the lack of clear rules could also confuse users who are looking to use or contribute to a distribution. The goals of this issue are to:

  1. Determine the criteria for when the community will support a distribution
  2. If Contrib fits those criteria, determine the rules for what components are included in the Contrib Distribution
  3. If Core fits those criteria, determine the rules for what components are included in the Core Distribution

We may need to also determine how the Release pipeline will change to be able to scale to more distros. At the moment it already struggles with 2.

Current criteria based on the discussion in this issue:

  • To honor commitments made by OpenTelemetry to other Open Source projects the Collector SIG should support at least 1 distribution that meets those commitments. (IDK what those commitments are, need input from others to more explicitly list out the impact of those commitments.)
  • Distributions supported by the Collector SIG should serve a specific purpose and those purposes should have minimal overlap.
  • Distributions supported by the Collector SIG should meet general needs and not be too niche.
  • Distributions supported by the Collector SIG should only target the needs of the OpenTelemetry project.
  • Distributions supported by the Collector SIG are not required to be production ready and may be focused on development and proof of concept use cases. The distribution should clearly indicate whether or not the Collector SIG considers it to be production ready.
  • Distributions supported by the Collector SIG must only include components from the opentelemetry-collector and opentelemetry-collector-contrib repositories.
  • Distributions supported by the Collector SIG should have a clearly defined list of criteria for which components are included.
  • Distributions supported by the Collector must include the following assets except where the specific purpose of the distribution is naturally associated with a subset of these assets. In such cases, it should be clearly stated which assets are skipped and why. Additional assets may be included if the distro desires:
    • Binaries for linux_amd64, linux_arm64, windows_amd64 and darwin_arm64
    • linux_amd64 and linux_arm64 container images
    • Packages for Linux distributions (apk, RPM, deb) and macOS (brew) for each distributed binary.

TylerHelmuth, Jun 14 '23

We may need to also determine how the Release pipeline will change to be able to scale to more distros. At the moment it already struggles with 2.

One idea that comes to mind, though I haven't explored or tested it, is to split the goreleaser workflow per distribution. It should be doable.

codeboten, Jun 16 '23

Being able to release each distribution independently sounds like a good idea to be able to scale to more distributions.

TylerHelmuth, Jun 16 '23

Can you clarify what's to be understood by "support"? What's expected from the community if we say we support a distribution?

jpkrohling, Jun 21 '23

I think support implies all the duties covered by the Approver and Maintainer role requirements. In addition (because I don't see it explicitly listed in the membership responsibilities), support means that the Collector SIG is the owner and maintainer of the binaries/images of the different Collector distributions and is responsible for the pipeline that produces those artifacts.

TylerHelmuth, Jun 21 '23

Determine the criteria for when the community will support a distribution

Here is my initial attempt at an answer:

  • To honor commitments made by OpenTelemetry to other Open Source projects the Collector SIG should support at least 1 distribution that meets those commitments. (IDK what those commitments are, need input from others to more explicitly list out the impact of those commitments.)
  • Distributions supported by the Collector SIG should make using the Collector easier.
  • Distributions supported by the Collector SIG should serve a specific purpose and those purposes should have minimal overlap.
  • Distributions supported by the Collector SIG should meet general needs and not be too niche.
  • Distributions supported by the Collector SIG must only include components from the opentelemetry-collector and opentelemetry-collector-contrib repositories.
  • Distributions supported by the Collector SIG should have a clearly defined list of criteria for which components are included.
  • Distributions supported by the Collector must include:
    • Binaries for linux_386, linux_amd64, linux_arm64, linux_ppc64le, windows_386, windows_amd64, windows_ppc64le, darwin_amd64, darwin_arm64, and darwin_ppc64le. (Looking at the release assets, I don't actually think we provide this today, despite claiming we do in our README.)
    • linux_386, linux_amd64, linux_arm64, and linux_ppc64le container images
    • Packages for Linux distributions (apk, RPM, deb) and macOS (brew) for each distributed binary.

TylerHelmuth, Jun 21 '23

IDK what those commitments are

IIRC, those were listed in the original blog post that announced OTel, and included only Prometheus, Jaeger, and Zipkin, apart from OpenTracing and OpenCensus.

Other than that, your proposal looks good to me.

jpkrohling, Jun 24 '23

One thing I wouldn't mind some more clarification on is:

Distributions supported by the Collector SIG should serve a specific purpose and those purposes should have minimal overlap.

Does that mean the Collector SIG sets the direction for another SIG's distribution (say, K8s and Lambda)? Or conversely, what should be the expected level of involvement from those SIGs if we manage the distribution for them?

I know it isn't explicitly said, but is the intent that the supported distributions are intended to fulfill the needs of the OpenTelemetry project only? Meaning we are not managing distributions for vendors or other ecosystems?

MovieStoreGuy, Jun 25 '23

Does that mean the Collector SIG sets the direction for another SIG's distribution

I don't think it has to. The Collector SIG should be free to decide which specific purposes it sees as valuable enough to warrant supporting a distro. If that corresponds with the needs of another OpenTelemetry SIG, that is a bonus, but it shouldn't be a requirement.

what should be the expected level of involvement from those SIGs if we manage the distribution for them?

I don't think we should view it as managing distributions for another SIG. We should only support a distribution if we see value in it for our users. If another SIG also benefits from it and wants to help support it, that is a bonus, but we should not rely on others to support these distributions.

I know it isn't explicitly said, but is the intent that the supported distributions are intended the fulfill needs of the open telemetry project only? Meaning we are not managing distributions for vendors or other ecosystems?

Good point, we should explicitly call this out. How about: "Distributions supported by the Collector SIG should only target the needs of the OpenTelemetry project."

TylerHelmuth, Jun 26 '23

I don't think we should view it as managing distributions for another SIG

They are OpenTelemetry users, no matter which SIG they consume "first". Given that our distributions will have only components from either core or contrib, the support will end up on us anyway, no matter which SIG proposed the distribution. Perhaps we could provide guidance to other SIGs about which components they can rely on?

jpkrohling, Jun 26 '23

@jpkrohling I agree. My comment's intent is to say that we should not treat other OTel SIGs differently from other users. I don't think we need to create any SIG-specific guidance either; any guidance we give should apply to SIGs and end-users alike.

TylerHelmuth, Jun 26 '23

To touch on points 2 and 3 in the issue for a bit:

Based on the discussed criteria so far, the Core distribution would no longer be supported. I believe it fails to pass these checks:

  • Distributions supported by the Collector SIG should make using the Collector easier.
    • I feel like we often see users post about trouble using a component that doesn't exist in the Core repository. We frequently mention to use Contrib or build their own distribution.
  • Distributions supported by the Collector SIG should serve a specific purpose and those purposes should have minimal overlap.
    • I'm interested to hear whether others disagree, but I feel like Core doesn't have a specific purpose. If the original purpose was to meet our commitments, Contrib is already doing that. If the intent was to provide a smaller image, my response is that "to provide a smaller image" isn't a strong enough reason, plus it leads to the issue discussed in the previous bullet.

In contrast, I believe Contrib should continue to be supported. Although we haven't defined its list of inclusion criteria yet, I believe Contrib meets (or can be made to meet) all the criteria discussed so far. Most importantly, it allows us to keep our promises, makes the Collector easy to use, serves a specific purpose*, and meets general user needs.

  • *We still need to decide on a final purpose statement, but I see Contrib's purpose as being a readily available, easy solution for trying out any component that the Collector SIG supports. Contrib is the solution to "I want to try out the Collector", and I believe it gets to claim this purpose instead of Core since it contains all the components and Core does not.

TylerHelmuth, Jun 26 '23

There are two parts of contrib that I don't like and would make me uncomfortable supporting it more officially:

  1. it's too big: having everything in it makes it unsuitable for most use-cases. For instance, a sidecar with contrib is not up for consideration. Sure, it's tolerable for most use-cases, but it's not serving a specific purpose.
  2. it has everything, including things we haven't touched or seen working (for any definition of "working") in a long time. I'm unsure if we have active maintainers and code owners for half of the components there. As such, I would be hesitant to call it "supported".

jpkrohling, Jun 27 '23

Sure, it's tolerable for most use-cases, but it's not serving a specific purpose.

I see its specific purpose as being a dev tool: something that helps users get started playing around with the Collector. Contrib provides an easy, out-of-the-box solution for any user to try out any component they read about. Anything less than Contrib means we'll have to guess which components users want to try out, defeating the purpose of a dev tool. Thinking about it another way, if we only supported Core or only supported the proposed k8s distro, how would a user try out the nginxreceiver (chosen randomly as an example)? Would it be up to them to build their own distro? I feel like that requirement goes against the concept that our supported distributions should make using the Collector easy.

it has everything, including things we haven't touched or seen working (for any definition of "working") in a long time.

That is a fair concern, but it feels like a Contrib repo issue and not a distribution concern. We should write the Contrib criteria to be strict with the Stability requirements and hold both the Contrib repo and distro accountable. If we don't feel confident in a Contrib repo component, we should deal with that as part of maintaining Contrib, because it has larger implications than just our distros: it affects anyone using that component.

TylerHelmuth, Jun 27 '23

Distributions supported by the Collector must include:

We might consider some distributions that are naturally associated with a subset of assets. For example, a Windows distribution that contains several Windows-only components (eventlog, performancecounters, etc.). Should we consider adding a caveat for this? Roughly: "except where the specific purpose of the distribution is naturally associated with a subset of these assets. In such cases, it should be clearly stated which assets are skipped and why".

djaglowski, Jun 28 '23

@djaglowski that is a super good point. I like the idea of making that criteria less strict so that we don't limit our options. How about:

  • Distributions supported by the Collector must include the following assets except where the specific purpose of the distribution is naturally associated with a subset of these assets. In such cases, it should be clearly stated which assets are skipped and why:
    • Binaries for linux_386, linux_amd64, linux_arm64, linux_ppc64le, windows_386, windows_amd64, windows_ppc64le, darwin_amd64, darwin_arm64, and darwin_ppc64le. (Looking at the release assets, I don't actually think we provide this today, despite claiming we do in our README.)
    • linux_386, linux_amd64, linux_arm64, and linux_ppc64le container images
    • Packages for Linux distributions (apk, RPM, deb) and macOS (brew) for each distributed binary.

TylerHelmuth, Jun 28 '23

I feel like we should be more conservative about the assets we require for new distributions: while we ensure we can build binaries for these architectures/OSes, we don't run the test suite for most of these 'flavors'. It's clear to me that linux_amd64 and windows_amd64 should be required, and it seems like linux_arm64 should be too, even if we don't test it, just because of expected usage; the rest should not be a requirement.

mx-psi, Jun 28 '23

@mx-psi can you say more about why being less strict with the release assets is a good thing? Is it to make it easier to accept new distributions that might struggle to provide certain assets?

TylerHelmuth, Jun 28 '23

Is it to make it easier to accept new distributions that might struggle to provide certain assets?

That's one part of it, I also just don't think it's a good idea to provide untested assets and I don't want to force new distributions to suffer that maintenance burden. For most of the assets listed we don't (and can't without significant pain) test them on CI and our ability to resolve issues specific to those artifacts is limited (IMO even on Windows we struggle to provide good support).

mx-psi, Jun 28 '23

@mx-psi would you distill the current proposal down to:

  • Distributions supported by the Collector must include the following assets except where the specific purpose of the distribution is naturally associated with a subset of these assets. In such cases, it should be clearly stated which assets are skipped and why. Additional assets may be included if the distro desires:
    • Binaries for linux_amd64, linux_arm64, windows_amd64 and darwin_arm64
    • linux_amd64 and linux_arm64 container images
    • Packages for Linux distributions (apk, RPM, deb) and macOS (brew) for each distributed binary.

TylerHelmuth, Jun 28 '23

@TylerHelmuth That makes sense to me :+1:

mx-psi, Jun 28 '23

I feel like we often see users post about trouble using a component that doesn't exist in the Core repository. We frequently mention to use Contrib or build their own distribution.

I think this is a problem with having more than one distribution, not with core specifically. We also see that with ADOT distros (e.g. see open-telemetry/opentelemetry-collector-contrib/issues/13163 or open-telemetry/opentelemetry-collector-contrib/issues/8109). I don't see this as a reason for removing core, since the problem will still happen.

Core doesn't have a specific purpose. If the original purpose was to meet our commitments, Contrib is already doing that. If the intent was to provide a smaller image, my response is that "to provide a smaller image" isn't a strong enough reason, plus it leads to the issue discussed in the previous bullet.

The purpose of Core is IMO to fulfill use cases that fully rely on non-commercial open source observability solutions. There are some components that we could add (e.g. more processors) to make this more aligned with this purpose, but I think it does it very well.

mx-psi, Jul 12 '23

@TylerHelmuth, during the SIG call today, you mentioned that this issue is reaching a consensus that contrib is going to be supported and core dropped. Based on the rules mentioned here, would you mind listing the two distributions and why/why not they will be supported?

In my view, we should rather slim down contrib to include the new components we think the community is missing the most from core, instead of promoting contrib as is and dropping core.

jpkrohling, Jul 12 '23

I think this is a problem with having more than one distribution, not with core specifically. We also see that with ADOT distros (e.g. see https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/13163 or https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/8109). I don't see this as a reason for removing core, since the problem will still happen.

That is fair. Maybe the criterion "Distributions supported by the Collector SIG should make using the Collector easier" is a little weak? I could see dropping that requirement, since "Distributions supported by the Collector SIG should serve a specific purpose and those purposes should have minimal overlap" implies making the Collector easy to use.

The purpose of Core is IMO to fulfill use cases that fully rely on non-commercial open source observability solutions. There are some components that we could add (e.g. more processors) to make this more aligned with this purpose, but I think it does it very well.

I think that is a reasonable purpose. I'd propose changing the name to better fit that purpose though.

TylerHelmuth, Jul 12 '23

Continuing this Issue's discussion, here are my proposed criteria for what should be included in Contrib:

  1. All components from opentelemetry-collector and opentelemetry-collector-contrib that have at least 1 signal at Alpha stability or higher.
  2. Components that are marked Unmaintained will be kept in the distribution for six months. After six months of being unmaintained, the component will be removed from the distribution. (This rule directly follows the procedure currently documented for the Unmaintained stability level.)

TylerHelmuth, Jul 12 '23

@jpkrohling this comment here is my take on why Contrib meets the criteria we've been discussing and why Core doesn't.

For Core, I think it comes down to the criterion "Distributions supported by the Collector SIG should serve a specific purpose and those purposes should have minimal overlap". Reflecting on Core, I was not able to come up with a purpose that met this criterion. @mx-psi has suggested a purpose ("to fulfill use cases that fully rely on non-commercial open source observability solutions") that feels like it could work. If we all agree on that purpose, then Core meets the criteria and we'd keep it.

For Contrib, I am very much in favor of the criteria mentioned here, in order to fulfill what I see as its purpose ("being a readily available, easy solution for trying out any component that the Collector SIG supports."). We cannot predict which components a user would want to try, and trying to predict them would make the purpose unachievable.

TylerHelmuth, Jul 12 '23

I guess we need a new dimension then on the support table: are we talking about support for production use-cases, or for dev/PoC? Contrib is great for dev/PoC, but I wouldn't for a moment recommend running it in production.

I feel like @mx-psi's description is good for Core, especially for running it in production.

jpkrohling, Jul 12 '23

I guess we need a new dimension then on the support table: are we talking about support for production use-cases, or for dev/PoC? Contrib is great for dev/PoC, but I wouldn't for a moment recommend running it in production.

That is a good point to call out in the criteria. Personally, I don't think we need to restrict ourselves to only production-focused distributions.

Continuing to poke at the weak criterion "Distributions supported by the Collector SIG should make using the Collector easier", maybe a more focused criterion that comes from the idea of "making the Collector easy to use" would be:

  • Distributions supported by the Collector SIG are not required to be production ready and may be focused on development and proof of concept use cases. The distribution should clearly indicate whether or not the Collector SIG considers it to be production ready.

With a criterion like that, the Contrib distribution would definitely be listed as NOT production ready.

TylerHelmuth, Jul 12 '23

I like that. Are we going to have a list of those distributions somewhere, as a table perhaps? If so, we should explicitly call that out, stating that contrib is useful for X and Y, but not for Z.

jpkrohling, Jul 12 '23

We haven't landed on the presentation of the outcome of this discussion yet but a table seems reasonable.

TylerHelmuth, Jul 12 '23

I think one important use case we need to support is gateway mode, which needs only the components defined in the core repo, specifically the OTLP receiver and OTLP exporter. Jaeger, Zipkin, and OpenCensus may even be unnecessary. I think it would be easier to define if we align on a rule: core components = components from the core repository.

But I also agree with @mx-psi about a distribution to fulfill use cases that fully rely on non-commercial open source observability solutions. That could probably be another distribution, in between core and contrib.

dmitryax, Jul 26 '23