opentelemetry-operator

Write failed manifests to log at debug level

Open ringerc opened this issue 3 years ago • 3 comments

When the reconciler fails to apply a manifest, there's very little the user can do to figure out exactly what the failed manifest payload was or where it came from.

It would be very helpful to have a debug-level log record that captures the manifest payload when an apply fails.

For example, this recent error I hit:

{"level":"error","ts":1652677368.8442643,"logger":"controllers.OpenTelemetryCollector","msg":"failed to reconcile config maps","error":"failed to reconcile the expected configmaps: failed to apply changes: ConfigMap \"otel-collector\" is invalid: metadata.labels: Invalid value: \"8f65b4d94bb5290c8fc1540703c06f7a7a12cfd917d2f141bdc8a18803828615\": must be no more than 63 characters","stacktrace":"github.com/open-telemetry/opentelemetry-operator/controllers.(*OpenTelemetryCollectorReconciler).Reconcile\n\t/workspace/controllers/opentelemetrycollector_controller.go:153\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@<version>/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@<version>/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@<version>/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@<version>/pkg/internal/controller/controller.go:227"}

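is inscrutable at best without access to the generated ConfigMap. The rejected value is 64 hexadecimal characters (the length of a hex-encoded SHA-256 digest), one more than the 63 allowed, but it is not present in the input custom resource, and decoding it as hex does not yield any readable text.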
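As a sketch of the rule the API server applied here, using only the Go standard library: a Kubernetes label value must be at most 63 characters and, if non-empty, must start and end with an alphanumeric character, with '-', '_' or '.' allowed in between. The validateLabelValue helper below is hypothetical and not part of the operator; in real code the IsValidLabelValue function in k8s.io/apimachinery/pkg/util/validation performs this check. Run against the value from the log, it fails on length alone.

```go
package main

import (
	"fmt"
	"regexp"
)

// labelValueMaxLength is the 63-character limit quoted in the error above.
const labelValueMaxLength = 63

// labelValueRegexp follows the documented label value syntax: empty, or
// alphanumeric at both ends with '-', '_' or '.' allowed in between.
var labelValueRegexp = regexp.MustCompile(`^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?$`)

// validateLabelValue is a hypothetical pre-apply check (not operator code).
func validateLabelValue(v string) error {
	if len(v) > labelValueMaxLength {
		return fmt.Errorf("label value %q is %d characters, must be no more than %d",
			v, len(v), labelValueMaxLength)
	}
	if !labelValueRegexp.MatchString(v) {
		return fmt.Errorf("label value %q contains invalid characters", v)
	}
	return nil
}

func main() {
	// The value rejected in the log above.
	value := "8f65b4d94bb5290c8fc1540703c06f7a7a12cfd917d2f141bdc8a18803828615"
	fmt.Println(len(value))
	if err := validateLabelValue(value); err != nil {
		fmt.Println("invalid:", err)
	}
}
```

With the value from the log, main prints 64 and then the length error.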

When --zap-log-level=debug is passed, it would be helpful to have the generated ConfigMap dumped to the log.

ringerc · May 16 '22 05:05

The specific error above seems to arise from https://github.com/open-telemetry/opentelemetry-operator/blob/0dce2dfbdaefa86b0052775641a38263a7599266/pkg/collector/reconcile/configmap.go#L188

with a ConfigMap created by https://github.com/open-telemetry/opentelemetry-operator/blob/0dce2dfbdaefa86b0052775641a38263a7599266/pkg/collector/reconcile/configmap.go#L41

ringerc · May 16 '22 05:05

See also https://github.com/open-telemetry/opentelemetry-operator/issues/873

ringerc · May 16 '22 06:05

"When the reconciler fails to apply a manifest, there's very little the user can do to figure out exactly what the failed manifest payload was or where it came from."

What do you mean by manifest: the CR created by the user, or the OTEL ConfigMap created by the operator? Note that the OTEL ConfigMap should match the collector config from the CR, so the user already has access to it.

pavolloffay · May 16 '22 12:05

Closed by #2193 and superseded by #2399.

jaronoff97 · Nov 28 '23 21:11