OSDOCS-9480: NetObserv Custom Metrics

Open skrthomas opened this issue 10 months ago • 8 comments

Version(s):

Issue:

https://issues.redhat.com/browse/OSDOCS-9480

Link to docs preview:

Custom Metrics conceptual info: https://74412--ocpdocs-pr.netlify.app/openshift-enterprise/latest/observability/network_observability/metrics-alerts-dashboards.html#network-observability-custom-metrics_metrics-dashboards-alerts

QE review:

  • [ ] QE has approved this change.

Additional information:

skrthomas avatar Apr 09 '24 14:04 skrthomas

@skrthomas: This pull request references OSDOCS-9480 which is a valid jira issue.

In response to this:

Version(s):

Issue:

https://issues.redhat.com/browse/OSDOCS-9480

Link to docs preview:

QE review:

  • [ ] QE has approved this change.

Additional information:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci-robot avatar Apr 09 '24 14:04 openshift-ci-robot

🤖 Wed May 29 02:11:31 - Prow CI generated the docs preview:

https://74412--ocpdocs-pr.netlify.app/openshift-enterprise/latest/observability/network_observability/metrics-alerts-dashboards.html

ocpdocs-previewbot avatar Apr 09 '24 22:04 ocpdocs-previewbot

@skrthomas: This pull request references OSDOCS-9480 which is a valid jira issue.

In response to this:

Version(s):

Issue:

https://issues.redhat.com/browse/OSDOCS-9480

Link to docs preview:

Custom Metrics conceptual info: https://74412--ocpdocs-pr.netlify.app/openshift-enterprise/latest/observability/network_observability/metrics-alerts-dashboards.html#network-observability-custom-metrics_metrics-dashboards-alerts

Configuring custom metrics: https://74412--ocpdocs-pr.netlify.app/openshift-enterprise/latest/observability/network_observability/configuring-operator#network-observability-custom-metrics-id-unknown-ingress_network_observability

QE review:

  • [ ] QE has approved this change.

Additional information:

openshift-ci-robot avatar Apr 12 '24 18:04 openshift-ci-robot

@jotak can you PTAL at this first draft?

skrthomas avatar Apr 12 '24 18:04 skrthomas

@skrthomas FYI, I'm adding doc in our upstream Metrics.md here: https://github.com/netobserv/network-observability-operator/pull/609/files#diff-66471adf627be69953aa4a35b4c845a3ff885b746f0bd9ee9c3179fe38a15886

Might be useful for this PR too

jotak avatar Apr 17 '24 13:04 jotak

@jotak Thanks for linking me to that upstream documentation. It helped me better understand how the predefined and custom metrics fit together, so I can organize the documentation more clearly. Also, thanks for the great examples in there. I added them to this documentation as well.

skrthomas avatar Apr 18 '24 00:04 skrthomas

Just a few comments, other than that LGTM!

jotak avatar May 14 '24 07:05 jotak

There's some duplicated text between sections "Predefined Metrics" and "Network Observability metrics":

[Screenshot from 2024-05-21 10-05-26]

The first sentence is repeated (I saw this using the preview link, not sure if it's up to date)

jotak avatar May 21 '24 08:05 jotak

Also, I know adding links to github is generally avoided, but shouldn't we link to the samples directory that we have for custom metrics? ( https://github.com/netobserv/network-observability-operator/tree/main/config/samples/flowmetrics ) I think it will be useful for customers

jotak avatar May 21 '24 08:05 jotak

@skrthomas I found something else to update: the Alert example in this section still uses PrometheusRule instead of AlertingRule. When we made the change there, I guess we forgot to do it in this section as well.

The new YAML should be:

apiVersion: monitoring.openshift.io/v1
kind: AlertingRule
metadata:
  name: netobserv-alerts
  namespace: openshift-monitoring
spec:
  groups:
  - name: NetObservAlerts
    rules:
    - alert: NetObservIncomingBandwidth
      annotations:
        message: |-
          {{ $labels.job }}: incoming traffic exceeding 10 MBps for 30s on {{ $labels.DstK8S_OwnerType }} {{ $labels.DstK8S_OwnerName }} ({{ $labels.DstK8S_Namespace }}).
        summary: "High incoming traffic."
      expr: sum(rate(netobserv_workload_ingress_bytes_total{SrcK8S_Namespace="openshift-ingress"}[1m])) by (job, DstK8S_Namespace, DstK8S_OwnerName, DstK8S_OwnerType) > 10000000      
      for: 30s
      labels:
        severity: warning

(just changing apiVersion, kind and namespace)
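For readers unfamiliar with the rule's `expr`, here is a minimal Python sketch of the arithmetic it describes: a per-second rate over a 1-minute window, summed per workload, compared against 10,000,000 bytes per second (10 MBps). It is a simplified model, not how Prometheus evaluates `rate` (it ignores counter resets and extrapolation, and it does not model the `for: 30s` hold time). The function names and sample values are hypothetical.

```python
from collections import defaultdict

THRESHOLD_BYTES_PER_SEC = 10_000_000  # 10 MBps, the threshold in the alert expr
WINDOW_SECONDS = 60                   # the [1m] range in the alert expr


def simple_rate(first_value, last_value, window_seconds=WINDOW_SECONDS):
    """Per-second increase of a counter over the window (no reset handling)."""
    return (last_value - first_value) / window_seconds


def firing_workloads(series):
    """series: list of (labels, first_value, last_value) tuples.

    Sums the per-series rates by label tuple, similar in spirit to
    `sum(...) by (...)`, and returns the label tuples whose summed
    rate exceeds the threshold.
    """
    totals = defaultdict(float)
    for labels, first, last in series:
        totals[labels] += simple_rate(first, last)
    return {labels: rate for labels, rate in totals.items()
            if rate > THRESHOLD_BYTES_PER_SEC}


# Hypothetical counter samples (bytes) at the start and end of the window:
# two series for one workload, one series for another.
samples = [
    (("my-namespace", "frontend", "Deployment"), 0, 400_000_000),
    (("my-namespace", "frontend", "Deployment"), 0, 320_000_000),
    (("my-namespace", "backend", "Deployment"), 0, 60_000_000),
]

result = firing_workloads(samples)
```

In this made-up data, the two `frontend` series add up to about 12 MBps and cross the threshold, while `backend` stays at about 1 MBps and does not.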

jotak avatar May 21 '24 08:05 jotak

Hmmm, this is suspicious because I definitely made the change to the Creating alerts topic, which you can see here: https://github.com/openshift/openshift-docs/pull/74099/files#diff-8d0022591f4f4567d0cc2ff90110d33839696453336a9fd67a8d766ff79f3f48R22. That change was merged to 4.14+, and the entire example was removed from 4.12 and 4.13 (cf. https://github.com/openshift/openshift-docs/pull/74100).

When I look at the live doc, it says AlertingRule. I think what happened is that I created the no-1.6 branch on April 9: this Custom Metrics PR was the first PR I opened in that branch, and it says I opened it April 9. I also see that I merged the AlertingRule PRs on April 9. When I created the no-1.6 branch, it copied the openshift-docs main branch as it was at that time, and it doesn't sync back up until I rebase everything in the no-1.6 branch on the main branch, which I was doing at the end of the cycle. So let me try to sync my branches, and maybe that will fix this weirdness.

skrthomas avatar May 21 '24 20:05 skrthomas

/retest

skrthomas avatar May 21 '24 20:05 skrthomas

Also, I know adding links to github is generally avoided, but shouldn't we link to the samples directory that we have for custom metrics? ( https://github.com/netobserv/network-observability-operator/tree/main/config/samples/flowmetrics ) I think it will be useful for customers

I agree that all these samples are really helpful for customers, which is why I wanted to include them all in my initial draft (although I can appreciate the maintenance burden you're concerned about). When I've asked about things like this in the past, the preference has been to include them in the documentation itself rather than linking to GitHub. Our contrib guide has a pretty clear directive not to include links to GitHub: https://github.com/openshift/openshift-docs/blob/main/contributing_to_docs/doc_guidelines.adoc#links-to-external-websites , and the one exception is something I got approval for specifically for our API documentation.

skrthomas avatar May 21 '24 20:05 skrthomas

@skrthomas OK, my bad on the AlertingRule. I should have checked the actual doc, which is fine. I guess it's just a matter of an outdated branch, so it shouldn't be an issue.

For the github link I was expecting / fearing this answer :-) ok then let's just forget about it.

jotak avatar May 22 '24 10:05 jotak

For the github link I was expecting / fearing this answer :-) ok then let's just forget about it.

@jotak I'm sorry :/ I will happily add all the samples back to the doc (it won't be that bad to just grab them from my previous commit), and set up a Jira ticket to help me remember to ask about any maintenance issues that may come up. The reason is that we don't want to send customers off the docs pages to GitHub and then rely on them to come back. The idea is that it's better to keep them in one place if possible, so when the question is whether or not to link, it's expected that we put the upstream doc content in the downstream doc.

skrthomas avatar May 22 '24 17:05 skrthomas

There's some duplicated text between sections "Predefined Metrics" and "Network Observability metrics":

I removed the duplicated text from the Network Observability metrics section. It's supposed to be only in Predefined Metrics.

skrthomas avatar May 22 '24 20:05 skrthomas

@memodi when you have a chance, can you PTAnotherL at this PR? I addressed your comments.

skrthomas avatar May 22 '24 20:05 skrthomas

@skrthomas: This pull request references OSDOCS-9480 which is a valid jira issue.

In response to this:

Version(s):

Issue:

https://issues.redhat.com/browse/OSDOCS-9480

Link to docs preview:

Custom Metrics conceptual info: https://74412--ocpdocs-pr.netlify.app/openshift-enterprise/latest/observability/network_observability/metrics-alerts-dashboards.html#network-observability-custom-metrics_metrics-dashboards-alerts

Configuring custom metrics: https://74412--ocpdocs-pr.netlify.app/openshift-enterprise/latest/observability/network_observability/metrics-alerts-dashboards.html#network-observability-configuring-custom-metrics_metrics-dashboards-alerts

Configuring custom charts: https://74412--ocpdocs-pr.netlify.app/openshift-enterprise/latest/observability/network_observability/metrics-alerts-dashboards.html#network-observability-custom-charts-flowmetrics_metrics-dashboards-alerts

QE review:

  • [ ] QE has approved this change.

Additional information:

openshift-ci-robot avatar May 22 '24 20:05 openshift-ci-robot

@skrthomas: This pull request references OSDOCS-9480 which is a valid jira issue.

In response to this:

Merge to the no-1.6 branch only; no cherry-picks are required. This PR is part of an experiment to simplify merges for asynchronous content. I will open one PR against main to incorporate all of the Network Observability 1.6 content just before its GA.

Version(s):

Issue:

https://issues.redhat.com/browse/OSDOCS-9480

Link to docs preview:

Custom Metrics conceptual info: https://74412--ocpdocs-pr.netlify.app/openshift-enterprise/latest/observability/network_observability/metrics-alerts-dashboards.html#network-observability-custom-metrics_metrics-dashboards-alerts

Configuring custom metrics: https://74412--ocpdocs-pr.netlify.app/openshift-enterprise/latest/observability/network_observability/metrics-alerts-dashboards.html#network-observability-configuring-custom-metrics_metrics-dashboards-alerts

Configuring custom charts: https://74412--ocpdocs-pr.netlify.app/openshift-enterprise/latest/observability/network_observability/metrics-alerts-dashboards.html#network-observability-custom-charts-flowmetrics_metrics-dashboards-alerts

QE review:

  • [ ] QE has approved this change.

Additional information:

openshift-ci-robot avatar May 29 '24 01:05 openshift-ci-robot

@skrthomas: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-ci[bot] avatar May 29 '24 02:05 openshift-ci[bot]