opentelemetry-operator icon indicating copy to clipboard operation
opentelemetry-operator copied to clipboard

Documentation: Outdated comment about per-node allocation strategy and multiple replicas support

Open Ca-moes opened this issue 3 months ago • 1 comments

Component(s)

collector

Describe the issue you're reporting

The documentation comment in apis/v1alpha1/opentelemetrycollector_types.go is outdated and contradicts the actual capabilities of the Target Allocator, particularly regarding the per-node allocation strategy's support for multiple replicas.

Current Misleading Documentation

In apis/v1alpha1/opentelemetrycollector_types.go, the Replicas field comment states:

// Replicas is the number of pod instances for the underlying TargetAllocator.
// This should only be set to a value other than 1 if a strategy that allows
// for high availability is chosen. Currently, the only allocation strategy
// that can be run in a high availability mode is consistent-hashing.

This comment directly contradicts:

  1. Maintainer confirmation in Issue #2900 where @swiatekm stated: "It should work correctly... we don't check Replicas at admission, so you can really set it to whatever you want"
  2. PR #2932 merged June 13, 2024 which specifically added PodDisruptionBudget support for per-node strategy, explicitly enabling HA configurations
  3. The technical reality that per-node allocation is stateless and deterministic, making it inherently safe for multiple replicas

Impact

This outdated comment may cause users to:

  • Avoid implementing HA configurations for per-node Target Allocators
  • Miss opportunities to eliminate single points of failure in critical observability infrastructure
  • Question whether officially supported configurations are actually valid
  • Potentially experience incidents due to lack of HA (we experienced this in June 2024)

Expected Behavior

The comment should be updated to reflect that:

  1. Per-node allocation strategy does support multiple replicas
  2. Multiple replicas require operator v0.103.0+ (which includes PR #2932 for PDB support)
  3. Both consistent-hashing and per-node strategies can run in HA mode

Proposed Documentation Fix

// Replicas is the number of pod instances for the underlying TargetAllocator.
// This should only be set to a value other than 1 if a strategy that allows
// for high availability is chosen.
//
// Allocation strategies supporting high availability:
// - consistent-hashing: Supported (distributes targets via hashing)
// - per-node: Supported (requires operator v0.103.0+, deterministic allocation)
// - least-weighted: Not recommended for HA (may cause allocation conflicts)
//
// When using multiple replicas, configure a PodDisruptionBudget to ensure
// availability during cluster maintenance operations.

Additional Context

  • This comment appears to be 3+ years old and predates the PDB support PR
  • Issue #2900 has extensive maintainer discussion confirming per-node works with multiple replicas
  • Users relying on this documentation are being misled about the operator's actual capabilities
  • The Target Allocator's stateless, deterministic design makes it inherently safe for HA with per-node strategy

References

  • Issue #2900: https://github.com/open-telemetry/opentelemetry-operator/issues/2900
  • PR #2932: https://github.com/open-telemetry/opentelemetry-operator/pull/2932
  • Source file: https://github.com/open-telemetry/opentelemetry-operator/blob/main/apis/v1alpha1/opentelemetrycollector_types.go

Would appreciate if this documentation could be updated to reflect the current reality and prevent others from experiencing the same confusion and potential production incidents.

Tip

React with 👍 to help prioritize this issue. Please use comments to provide useful context, avoiding +1 or me too, to help us triage it. Learn more here.

Ca-moes avatar Oct 06 '25 08:10 Ca-moes

Sounds like it should be updated, indeed. Your proposed fix is misleading, though:

// Allocation strategies supporting high availability:
// - consistent-hashing: Supported (distributes targets via hashing)
// - per-node: Supported (requires operator v0.103.0+, deterministic allocation)
// - least-weighted: Not recommended for HA (may cause allocation conflicts)

least-weighted shouldn't be on this list.

Would you like to contribute a fix, @Ca-moes ?

swiatekm avatar Oct 07 '25 10:10 swiatekm