operator-lifecycle-manager

lower/remove "object has been modified" type log messages from the OLM operators

Open exdx opened this issue 3 years ago • 5 comments

Bug Report

What did you do?

Currently OLM emits a lot of unnecessary update-conflict logs (an update was attempted against a stale copy of the object) at the error level, in the style of:

error="Operation cannot be fulfilled on catalogsources.operators.coreos.com \"operatorhubio-catalog\": the object has been modified; please apply your changes to the latest version and try again" id=OTXzH source=operatorhubio-catalog

This confuses admins and others reading the OLM logs. It also greatly increases the volume of OLM logs, which affects downstream consumers such as log aggregation and exporter tools. It's worth investigating whether these logs can be removed or toned down without changing the sync behavior of OLM.

Environment

  • operator-lifecycle-manager version:
  • Kubernetes version information:
  • Kubernetes cluster kind:

Possible Solution

Additional context

exdx, Feb 22 '22 21:02

One possible way to reduce the logs coming from the informer is to pass it a different logger instance than the one OLM creates on startup, and to tune that logger's level separately.
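A minimal sketch of that idea, assuming Go 1.21+ and using the standard library's `log/slog` as a stand-in for whatever logger OLM actually constructs. `newComponentLogger`, `countLines`, and `demo` are hypothetical names for illustration, not OLM APIs:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"log/slog"
)

// newComponentLogger builds a logger with its own minimum level, independent
// of any other logger in the process. Hypothetical helper, not an OLM API.
func newComponentLogger(w io.Writer, component string, minLevel slog.Level) *slog.Logger {
	h := slog.NewTextHandler(w, &slog.HandlerOptions{Level: minLevel})
	return slog.New(h).With("component", component)
}

// countLines reports how many newline-terminated records were written to buf.
func countLines(buf *bytes.Buffer) int {
	return bytes.Count(buf.Bytes(), []byte("\n"))
}

// demo sends the same two records through two differently tuned loggers and
// returns how many records each logger actually emitted.
func demo() (operatorLines, informerLines int) {
	var operatorOut, informerOut bytes.Buffer
	operatorLog := newComponentLogger(&operatorOut, "operator", slog.LevelInfo)
	informerLog := newComponentLogger(&informerOut, "queueinformer", slog.LevelError)

	for _, l := range []*slog.Logger{operatorLog, informerLog} {
		l.Info("syncing catalog source")
		l.Error("sync failed")
	}
	return countLines(&operatorOut), countLines(&informerOut)
}

func main() {
	o, i := demo()
	fmt.Println("operator records:", o, "informer records:", i)
}
```

A level threshold alone cannot separate the conflict errors from genuine errors, since both arrive at the error level. It only controls how much the informer logs overall.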

exdx, Feb 22 '22 21:02

An initial solution is to lower the log level of this particular error in the queueinformer that backs this operator's queue.
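As a rough sketch of what demoting only this error could look like: the code below classifies a sync error and picks a log level for it. It is stdlib-only, so `isConflict` matches on the message text quoted in this issue. That is an assumption made to keep the example self-contained; a real controller would more likely check the error with `apierrors.IsConflict` from `k8s.io/apimachinery/pkg/api/errors`. `levelFor` and the sample errors are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// conflictMarker is the message fragment from the log line quoted in this issue.
const conflictMarker = "the object has been modified"

// Sample errors used for illustration only.
var (
	errSampleConflict = errors.New(`Operation cannot be fulfilled on catalogsources.operators.coreos.com "operatorhubio-catalog": the object has been modified; please apply your changes to the latest version and try again`)
	errOther          = errors.New("connection refused")
)

// isConflict is a stdlib-only stand-in that matches on the message text.
// Assumption: real code would inspect the API status error instead.
func isConflict(err error) bool {
	return err != nil && strings.Contains(err.Error(), conflictMarker)
}

// levelFor returns the level name a sync error should be logged at:
// conflicts are demoted to debug, everything else stays at error.
func levelFor(err error) string {
	switch {
	case err == nil:
		return "none"
	case isConflict(err):
		return "debug"
	default:
		return "error"
	}
}

func main() {
	fmt.Println(levelFor(errSampleConflict))
	fmt.Println(levelFor(errOther))
}
```

Only the log level changes here; the error is still returned to the caller, so requeue and sync behavior would be unaffected.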

These errors may be a symptom of underlying controller problems.

A possible solution is to generate a metric that records each occurrence of this "modified" error, and then diagnose based on the number of occurrences.
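A minimal sketch of such a counter, keyed by resource name. It uses only the standard library; `conflictCounter` is a hypothetical stand-in for whatever metrics library the operators actually use, not an existing OLM type.

```go
package main

import (
	"fmt"
	"sync"
)

// conflictCounter counts conflict errors per resource name.
// Hypothetical stand-in for a labeled counter in a metrics library.
type conflictCounter struct {
	mu     sync.Mutex
	counts map[string]int
}

func newConflictCounter() *conflictCounter {
	return &conflictCounter{counts: make(map[string]int)}
}

// Inc records one conflict for the named resource. Safe for concurrent use.
func (c *conflictCounter) Inc(name string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[name]++
}

// Get returns the number of conflicts recorded for the named resource.
func (c *conflictCounter) Get(name string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[name]
}

func main() {
	c := newConflictCounter()

	// Simulate several workers hitting a conflict on the same catalog source.
	var wg sync.WaitGroup
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.Inc("operatorhubio-catalog")
		}()
	}
	wg.Wait()

	fmt.Println("conflicts:", c.Get("operatorhubio-catalog"))
}
```

A steadily climbing count for one resource would point at a controller repeatedly writing from a stale copy, while an occasional increment is ordinary optimistic-concurrency churn.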

exdx, Mar 03 '22 20:03

@exdx Should this be closed now that https://github.com/operator-framework/operator-lifecycle-manager/pull/2631 has been merged for a while now?

timflannagan, Apr 27 '22 02:04

That PR doesn't address the specific logging issue described here, which is more complex. It only moved logs that belong at the debug level from info to debug.

This issue needs to be triaged further because it's not entirely evident how to avoid logging these messages, and they potentially do indicate a problem with the OLM controllers (which we may simply accept for the time being).

exdx, Apr 27 '22 15:04

This code may allow us to identify whenever an update fails due to a conflict.

awgreene, Apr 28 '22 19:04