multicluster-observability-operator icon indicating copy to clipboard operation
multicluster-observability-operator copied to clipboard

[WIP][ACM-17966] Namespace right sizing recommendation changes

Open rajusem opened this issue 7 months ago • 8 comments

This PR helps to enable namespaceRightSizingRecommendation feature with MCO. User can enable it by updating CRD as below.

kind: MultiClusterObservability
metadata:
  name: observability
spec:
  capabilities:
    platform:
      analytics:
        namespaceRightSizingRecommendation:
          enabled: true

Epic - https://issues.redhat.com/browse/ACM-17966

Signed-off-by: Raj Zalavadia [email protected]

rajusem avatar Apr 10 '25 05:04 rajusem

/test test-e2e

rajusem avatar May 05 '25 05:05 rajusem

/test test-e2e

rajusem avatar May 05 '25 08:05 rajusem

/test test-e2e

rajusem avatar May 05 '25 14:05 rajusem

/test test-e2e

rajusem avatar May 06 '25 01:05 rajusem

/retest

rajusem avatar May 06 '25 03:05 rajusem

/test e2e-kind

rajusem avatar May 13 '25 08:05 rajusem

/test test-e2e

rajusem avatar May 16 '25 11:05 rajusem

/test test-e2e

rajusem avatar May 18 '25 16:05 rajusem

/test test-e2e

rajusem avatar May 19 '25 04:05 rajusem

@rajusem: The following commands are available to trigger required jobs:

/test e2e-kind
/test images
/test ocm-ci-rbac
/test pr-image-mirror-endpoint-monitoring-operator
/test pr-image-mirror-grafana-dashboard-loader
/test pr-image-mirror-metrics-collector
/test pr-image-mirror-multicluster-observability-operator
/test pr-image-mirror-rbac-query-proxy
/test test-e2e
/test test-unit

Use /test all to run all jobs.

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

openshift-ci[bot] avatar May 21 '25 11:05 openshift-ci[bot]

/test all

rajusem avatar May 21 '25 11:05 rajusem

/ok-to-test

rajusem avatar May 21 '25 12:05 rajusem

I haven't tested it, but I have already reviewed it couple of times. /lgtm

tremes avatar May 22 '25 10:05 tremes

Can you please add a scrapeConfig in the grafana analytics directory containing the new dashboard so that the needed metrics are also federated when MCOA for metrics is enabled? I'll add CI checks for this new analytics directory in another PR (won't be blocking this one)

thibaultmg avatar May 22 '25 14:05 thibaultmg

Can you please add a scrapeConfig in the grafana analytics directory containing the new dashboard so that the needed metrics are also federated when MCOA for metrics is enabled? I'll add CI checks for this new analytics directory in another PR (won't be blocking this one)

@thibaultmg Ack. Added scrapeConfig in analytics directory.

rajusem avatar May 25 '25 13:05 rajusem

Looks pretty sensible to me on a first pass. Since this is a big PR I also asked another engineer to take a look.

As a general comment, I think the logging here is very verbose, and I also see some logs which are very generic like the below, whereas other has RS prefixed - would be good to have a common pattern.

log.Info("ConfigMap data has changed, processing update")

Could we consider doing a pass over the logs and see if there are some which could be removed?

@jacobbaungard Revisited and updated logs with the "RS -" prefix, and also removed a few unnecessary logs.

rajusem avatar May 25 '25 13:05 rajusem

/retest

rajusem avatar May 25 '25 14:05 rajusem

/test test-e2e

rajusem avatar May 26 '25 02:05 rajusem

/test all

rajusem avatar May 26 '25 03:05 rajusem

/test test-e2e

rajusem avatar May 26 '25 04:05 rajusem

/test test-e2e

rajusem avatar May 26 '25 04:05 rajusem

/test test-e2e

rajusem avatar May 26 '25 08:05 rajusem

/test test-e2e

rajusem avatar May 26 '25 10:05 rajusem

/test test-e2e

rajusem avatar May 26 '25 10:05 rajusem

Looks good overall. But I agree with @jacobbaungard , this PR introduces very noisy logs. As MCO is doing a lot, this quickly become very hard to read. We've already done some work to reduce them. This idea is to only log at info level write operations (like resources create/update/delete). Errors that are returned by the reconcile loop don't need to be logged as they are already logged by the runtime. And the rest can go in debug. This is a rule of thumb of course, there can be exceptions. We haven't refactored the whole codebase to follow these rules, but let's avoid introducing too much noise with new features. Thanks.

@thibaultmg Thanks for the feedback. Updated the PR based on your comment to make the logging more consistent across MCO. Removed duplicate error logs, adjusted some logs from info to debug level, and retained info level for write operations Let me know if you have any further suggestions.

rajusem avatar May 26 '25 14:05 rajusem

/retest

rajusem avatar May 27 '25 04:05 rajusem

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rajusem, thibaultmg, tremes

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci[bot] avatar May 27 '25 11:05 openshift-ci[bot]