operator-lifecycle-manager
lower/remove "object has been modified" type log messages from the OLM operators
Bug Report
What did you do?
Currently, OLM emits many unnecessary cache-miss logs at the error level, in the style of:
error="Operation cannot be fulfilled on catalogsources.operators.coreos.com \"operatorhubio-catalog\": the object has been modified; please apply your changes to the latest version and try again" id=OTXzH source=operatorhubio-catalog
which confuses admins and others reading the OLM logs. It also greatly increases the volume of OLM logs, which in turn affects consumers of those logs such as log aggregation and exporter tools. It's worth investigating whether these logs can be removed or toned down without changing the sync behavior of OLM.
Environment
- operator-lifecycle-manager version:
- Kubernetes version information:
- Kubernetes cluster kind:
Possible Solution
Additional context
One possible way to lower the logs coming from the informer would be to pass it a different logger instance than the one OLM creates on startup, and to fine-tune that instance.
An initial solution is to lower the log level of this particular error in the queueinformer underlying this particular operator's queue.
This may be a symptom of underlying controller problems.
A possible solution is to generate a metric that records each time this "object has been modified" error occurs, and then diagnose based on the number of occurrences.
@exdx Should this be closed now that https://github.com/operator-framework/operator-lifecycle-manager/pull/2631 has been merged for a while now?
That PR doesn't address the specific logging issue described here, as this issue is more complex. It only moved logs that belong at the debug level from info to debug.
This issue needs to be triaged further because it's not entirely evident how to avoid logging these messages, and they potentially do indicate a problem with the OLM controllers (which we may simply accept for the time being).
This code may allow us to identify whenever an update fails due to a conflict.