MON-1157: Revive prometheus metrics best practices

Open jan--f opened this issue 1 year ago • 6 comments

jan--f avatar Feb 28 '24 14:02 jan--f

@jan--f: This pull request references MON-1157 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.16.0" version, but no target version was set.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci-robot avatar Feb 28 '24 14:02 openshift-ci-robot

/cc @dgrisonnet

jan--f avatar Feb 28 '24 15:02 jan--f

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jan--f

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

openshift-ci[bot] avatar Feb 28 '24 15:02 openshift-ci[bot]

FWIW, this is for now just an attempt to revive code that Damien already wrote. I have not checked whether the revived code makes sense in the current test setup. This might need significantly more work.

jan--f avatar Feb 28 '24 15:02 jan--f

The initial concern with that test was that it might cause disruption in CI if not done properly. I don't really know what the safest way to introduce it would be, but at the very least you would need to update the exception list, and then maybe mark the test as flaky for a while, or notify TRT so they can watch the test and revert if something bad happens.

Another concern was that e2e tests might not catch all the metrics and labels, since coverage depends on whether the scenario that triggers the generation of a particular time series is tested or not. But having at least some of the metrics checked is already better than nothing, so I don't think that concern makes sense anymore.
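
To make the exception-list idea concrete, here is a minimal sketch of how such a check could skip known offenders while still flagging new ones. It is not the code from this PR: the rules are a simplified subset of the Prometheus naming guidance (snake_case names, counters ending in `_total`), and `EXCEPTIONS`, `find_violations`, and the metric names are all hypothetical.

```python
import re

# Hypothetical exception list: metrics known to break the rules today,
# tolerated so that introducing the check does not fail CI right away.
EXCEPTIONS = {"legacyRequestCount"}

# Simplified rule: lowercase snake_case metric names only.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9_]*$")


def find_violations(metrics, exceptions=EXCEPTIONS):
    """Return (name, reason) pairs for metrics that break the simplified rules.

    `metrics` maps a metric name to its type ("counter", "gauge", ...).
    Only metrics that were actually observed can be checked, which mirrors
    the coverage caveat of running this inside e2e tests.
    """
    violations = []
    for name, metric_type in sorted(metrics.items()):
        if name in exceptions:
            continue  # known offender, tracked separately
        if not SNAKE_CASE.match(name):
            violations.append((name, "name is not snake_case"))
        elif metric_type == "counter" and not name.endswith("_total"):
            violations.append((name, "counter name lacks _total suffix"))
    return violations


if __name__ == "__main__":
    observed = {
        "http_requests_total": "counter",
        "legacyRequestCount": "counter",
        "jobs_done": "counter",
        "queueDepth": "gauge",
    }
    for name, reason in find_violations(observed):
        print(f"{name}: {reason}")
```

Shrinking the exception set over time tightens the check without ever blocking CI on violations that existed before the test was introduced.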

dgrisonnet avatar Feb 28 '24 15:02 dgrisonnet

@jan--f: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-gcp-ovn-rt-upgrade | c006ed9d393a090e61feba70159bb5fbd410da1f | link | false | /test e2e-gcp-ovn-rt-upgrade |
| ci/prow/verify | c006ed9d393a090e61feba70159bb5fbd410da1f | link | true | /test verify |
| ci/prow/e2e-gcp-csi | c006ed9d393a090e61feba70159bb5fbd410da1f | link | false | /test e2e-gcp-csi |
| ci/prow/e2e-metal-ipi-sdn | c006ed9d393a090e61feba70159bb5fbd410da1f | link | false | /test e2e-metal-ipi-sdn |
| ci/prow/e2e-agnostic-ovn-cmd | c006ed9d393a090e61feba70159bb5fbd410da1f | link | false | /test e2e-agnostic-ovn-cmd |
| ci/prow/e2e-gcp-ovn | c006ed9d393a090e61feba70159bb5fbd410da1f | link | true | /test e2e-gcp-ovn |
| ci/prow/e2e-aws-ovn-fips | c006ed9d393a090e61feba70159bb5fbd410da1f | link | true | /test e2e-aws-ovn-fips |
| ci/prow/e2e-aws-ovn-single-node | c006ed9d393a090e61feba70159bb5fbd410da1f | link | false | /test e2e-aws-ovn-single-node |
| ci/prow/e2e-metal-ipi-ovn-ipv6 | c006ed9d393a090e61feba70159bb5fbd410da1f | link | true | /test e2e-metal-ipi-ovn-ipv6 |
| ci/prow/e2e-aws-ovn-cgroupsv2 | c006ed9d393a090e61feba70159bb5fbd410da1f | link | false | /test e2e-aws-ovn-cgroupsv2 |
| ci/prow/e2e-aws-ovn-single-node-serial | c006ed9d393a090e61feba70159bb5fbd410da1f | link | false | /test e2e-aws-ovn-single-node-serial |
| ci/prow/e2e-openstack-ovn | c006ed9d393a090e61feba70159bb5fbd410da1f | link | false | /test e2e-openstack-ovn |
| ci/prow/e2e-aws-ovn-single-node-upgrade | c006ed9d393a090e61feba70159bb5fbd410da1f | link | false | /test e2e-aws-ovn-single-node-upgrade |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

openshift-ci[bot] avatar Feb 28 '24 17:02 openshift-ci[bot]

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot avatar May 29 '24 01:05 openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-bot avatar Jun 28 '24 08:06 openshift-bot

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen. Mark the issue as fresh by commenting /remove-lifecycle rotten. Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-bot avatar Jul 29 '24 00:07 openshift-bot

@openshift-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen. Mark the issue as fresh by commenting /remove-lifecycle rotten. Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

openshift-ci[bot] avatar Jul 29 '24 00:07 openshift-ci[bot]