Dynamic namespace handling for ServiceMonitor serverName in operator metrics scraping

Open iblancasa opened this issue 1 year ago • 3 comments

Feature Request

Describe the problem you need a feature to resolve.

The operator exposes a /metrics endpoint, and we create a ServiceMonitor to scrape it. However, the ServiceMonitor's tlsConfig requires a CA and a serverName, and the serverName value depends on the namespace where the operator is installed.

If the user installs the operator in a namespace other than the default one, the serverName no longer matches the metrics service's certificate. Certificate verification then fails, and the metrics cannot be scraped through the ServiceMonitor.

For example:

 tlsConfig:
   ca: {}
   caFile: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
   cert: {}
   serverName: my-operator-controller-manager-metrics-service.my-namespace.svc

The current workaround is to create the ServiceMonitor from the operator at runtime, but this is not ideal because it introduces OpenShift-specific logic in upstream operators.
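For illustration, the namespace-dependent part of that runtime workaround might look roughly like the sketch below. It is not taken from any operator's code: the helper names metricsServerName and currentNamespace are hypothetical, the service name is the one from the example above, and the sketch assumes the operator runs in a pod with the standard service account namespace file mounted.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// namespaceFile is the path where Kubernetes mounts the pod's namespace
// alongside the service account token.
const namespaceFile = "/var/run/secrets/kubernetes.io/serviceaccount/namespace"

// metricsServerName (hypothetical helper) builds the serverName value the
// ServiceMonitor needs, in the form <service>.<namespace>.svc.
func metricsServerName(service, namespace string) string {
	return fmt.Sprintf("%s.%s.svc", service, namespace)
}

// currentNamespace (hypothetical helper) reads the namespace from path and
// returns fallback if the file is missing or empty, for example when the
// operator runs outside a cluster.
func currentNamespace(path, fallback string) string {
	data, err := os.ReadFile(path)
	if err != nil {
		return fallback
	}
	if ns := strings.TrimSpace(string(data)); ns != "" {
		return ns
	}
	return fallback
}

func main() {
	// "my-namespace" is only the fallback used when no namespace file exists.
	ns := currentNamespace(namespaceFile, "my-namespace")
	fmt.Println(metricsServerName("my-operator-controller-manager-metrics-service", ns))
}
```

The operator would then set this value as tlsConfig.serverName on the ServiceMonitor it creates, which is exactly the runtime logic this request would like to avoid having in each operator.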

Describe the solution you'd like.

We want operator-sdk to set the ServiceMonitor's serverName dynamically, based on the namespace where the operator is installed. The serverName would then be correct regardless of the installation namespace, so the certificate validates and the metrics can be scraped through the ServiceMonitor.

Any alternative solution that addresses this use case would also be welcome.

iblancasa, Oct 18 '24 10:10

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot, Jan 17 '25 01:01

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-bot, Feb 16 '25 08:02

/lifecycle frozen

iblancasa, Feb 17 '25 10:02