opentelemetry-collector icon indicating copy to clipboard operation
opentelemetry-collector copied to clipboard

Configuration resolver merge bug upon null component map

Open SeanPMiller opened this issue 10 months ago • 6 comments

Describe the bug Merge of two or more configurations fails when any component map is null. Based on my testing, I think there is a bug in the merge logic behind confmap.Resolver.Resolve(). The null seems to be "infectious" and wipes the map for that component type.

Steps to reproduce

  1. Install OCB to some working directory.
  2. Add the configuration file builder-config.yaml from Additional context to your working directory.
  3. Build an OpenTelemetry Collector binary using OCB with the following command:
    ./ocb --config builder-config.yaml 
    
  4. Add the two configuration files alpha.yaml and beta.yaml from Additional context to your working directory.
  5. Validate with the following command:
    ./otelcol/otelcol validate --config=file:$(pwd)/alpha.yaml --config=file:$(pwd)/beta.yaml
    

What did you expect to see? Success.

What did you see instead? Failure. Specifically,

Error: service::pipelines::metrics/beta: references processor "batch/alpha" which is not configured

What version did you use? See builder-config.yaml.

What config did you use? See alpha.yaml and beta.yaml.

Environment

$ go version
go version go1.22.0 linux/amd64
$

Additional context builder-config.yaml:

---
dist:
    name: otelcol
    description: OTel Collector to demonstrate Resolver bug
    output_path: ./otelcol
    otelcol_version: 0.97.0
receivers:
  - gomod: go.opentelemetry.io/collector/receiver/otlpreceiver v0.97.0
processors:
  - gomod: go.opentelemetry.io/collector/processor/batchprocessor v0.97.0
exporters:
  - gomod: go.opentelemetry.io/collector/exporter/otlphttpexporter v0.97.0
...

alpha.yaml:

---
receivers:
    otlp/alpha:
        protocols:
            http:
                endpoint: localhost:4318

processors:
    batch/alpha:

exporters:
    otlphttp/alpha:
        endpoint: foo.bar.qux.com:4321

service:
    pipelines:
        metrics/alpha:
            receivers:
              - otlp/alpha
            exporters:
              - otlphttp/alpha
...

beta.yaml:

---
receivers:
    otlp/beta:
        protocols:
            grpc:
                endpoint: localhost:4317

processors:

service:
    pipelines:
        metrics/beta:
            receivers:
              - otlp/beta
            processors:
              - batch/alpha
            exporters:
              - otlphttp/alpha
...

SeanPMiller avatar Apr 09 '24 19:04 SeanPMiller

Although the above instructions demonstrate incorrect behavior, if you explicitly provide empty map value {} to key processors in beta.yaml, then validation will pass as expected.

SeanPMiller avatar Apr 09 '24 19:04 SeanPMiller

To be honest I'm usually confused by this behavior. I don't think the current YAML spec 1.2.2 covers merging of two YAML documents.

I believe the current behavior is a result of the internal implementation, which (if I'm not mistaken) uses koanf to merge parsed YAML files:

  • https://github.com/open-telemetry/opentelemetry-collector/blob/v0.97.0/confmap/resolver.go#L127
  • https://github.com/knadh/koanf/blob/v2.1.0/maps/maps.go#L108

We could either accept it (and possibly document if it is not currently documented), or decide on another behavior and change the implementation.

I wonder if I'm overthinking it or missing something. :thinking: Will take it to today's SIG.

andrzej-stencel avatar Apr 10 '24 12:04 andrzej-stencel

AFAICT, merging is unspecified anywhere; however, OTel documents three things on the main config-doc page that contributed to my expectations for this behavior.

  1. Documented configuration examples showing null component maps.
  2. Option --config documented as repeatable.
  3. Multiple configuration sources documented as merged into a single representation.

In no scenario would I expect a null map, when merged with a non-null map, to yield a null map. Rather, I would expect null to be treated like {}.

SeanPMiller avatar Apr 10 '24 15:04 SeanPMiller

That said, thanks for weighing in. Let me know how what the SIG thinks.

SeanPMiller avatar Apr 10 '24 15:04 SeanPMiller

~I think this issue is related: https://github.com/open-telemetry/opentelemetry-collector/issues/8754. By default, koanf will overwrite lists instead of merging them.~ Edit: I misread the config, list merging shouldn't impact this. I do think this stems from similar behavior where null is considered a value instead of a map and therefore overwrites prior values. I would agree that merging a map and null shouldn't erase the map.

evan-bradley avatar Apr 10 '24 16:04 evan-bradley

The outcome of the discussion in the SIG meeting is that this is expected behavior. We might need to document this.

andrzej-stencel avatar Apr 16 '24 12:04 andrzej-stencel