origin MON-1157: Revive prometheus metrics best practices


Feb 28 '24 14:02 jan--f

@jan--f: This pull request references MON-1157 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.16.0" version, but no target version was set.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Feb 28 '24 14:02 openshift-ci-robot

/cc @dgrisonnet

Feb 28 '24 15:02 jan--f

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jan--f

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~test/extended/prometheus/OWNERS~~ [jan--f]
~~test/extended/util/prometheus/OWNERS~~ [jan--f]

Approvers can indicate their approval by writing /approve in a comment. Approvers can cancel approval by writing /approve cancel in a comment.

Feb 28 '24 15:02 openshift-ci[bot]

FWIW, for now this is just an attempt to revive code that Damien already wrote. I have not checked whether the revived code makes sense in the current test setup. This might need significantly more work.

Feb 28 '24 15:02 jan--f

The initial concern with that test was that it might create disruption in CI if not done properly. I don't really know what the safest way to introduce it would be, but at a minimum you would need to update the exception list and then maybe mark the test as flaky for a while, or notify TRT so they can watch the test and revert it in case something bad happens.

Another concern was that e2e tests might not catch all the metrics and labels, since coverage depends on whether the scenario that triggers the generation of a particular time series is exercised. But having at least some of the metrics checked is already better than nothing, so I don't think that concern makes sense anymore.
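
To make the exception-list idea concrete, here is a minimal sketch in Go. It is not the code from this PR: the function name `checkMetricNames`, the lowercase snake_case rule, and the sample metric names are all assumptions made for illustration. It only checks the names it is given, which mirrors the coverage caveat above: metrics that never show up during a run are never checked.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// snakeCase is an ASSUMED naming rule for this sketch: lowercase
// snake_case, with colons allowed (as used by recording rules).
var snakeCase = regexp.MustCompile(`^[a-z_:][a-z0-9_:]*$`)

// checkMetricNames (hypothetical helper) returns the observed metric
// names that break the rule and are not listed in exceptions.
func checkMetricNames(names []string, exceptions map[string]bool) []string {
	var violations []string
	for _, name := range names {
		if exceptions[name] {
			continue // known offender, tolerated until it is fixed
		}
		if !snakeCase.MatchString(name) {
			violations = append(violations, name)
		}
	}
	sort.Strings(violations)
	return violations
}

func main() {
	// Hypothetical metric names, standing in for whatever a test run observed.
	observed := []string{"http_requests_total", "legacyCamelCaseMetric", "BadMetric"}
	exceptions := map[string]bool{"legacyCamelCaseMetric": true}

	if v := checkMetricNames(observed, exceptions); len(v) > 0 {
		fmt.Printf("metrics violating naming rule: %s\n", strings.Join(v, ", "))
	}
}
```

Seeding the exception list with the currently known offenders would let such a check be introduced without failing CI immediately, and entries could then be removed as the metrics are fixed.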

Feb 28 '24 15:02 dgrisonnet

@jan--f: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/e2e-gcp-ovn-rt-upgrade	c006ed9d393a090e61feba70159bb5fbd410da1f	link	false	`/test e2e-gcp-ovn-rt-upgrade`
ci/prow/verify	c006ed9d393a090e61feba70159bb5fbd410da1f	link	true	`/test verify`
ci/prow/e2e-gcp-csi	c006ed9d393a090e61feba70159bb5fbd410da1f	link	false	`/test e2e-gcp-csi`
ci/prow/e2e-metal-ipi-sdn	c006ed9d393a090e61feba70159bb5fbd410da1f	link	false	`/test e2e-metal-ipi-sdn`
ci/prow/e2e-agnostic-ovn-cmd	c006ed9d393a090e61feba70159bb5fbd410da1f	link	false	`/test e2e-agnostic-ovn-cmd`
ci/prow/e2e-gcp-ovn	c006ed9d393a090e61feba70159bb5fbd410da1f	link	true	`/test e2e-gcp-ovn`
ci/prow/e2e-aws-ovn-fips	c006ed9d393a090e61feba70159bb5fbd410da1f	link	true	`/test e2e-aws-ovn-fips`
ci/prow/e2e-aws-ovn-single-node	c006ed9d393a090e61feba70159bb5fbd410da1f	link	false	`/test e2e-aws-ovn-single-node`
ci/prow/e2e-metal-ipi-ovn-ipv6	c006ed9d393a090e61feba70159bb5fbd410da1f	link	true	`/test e2e-metal-ipi-ovn-ipv6`
ci/prow/e2e-aws-ovn-cgroupsv2	c006ed9d393a090e61feba70159bb5fbd410da1f	link	false	`/test e2e-aws-ovn-cgroupsv2`
ci/prow/e2e-aws-ovn-single-node-serial	c006ed9d393a090e61feba70159bb5fbd410da1f	link	false	`/test e2e-aws-ovn-single-node-serial`
ci/prow/e2e-openstack-ovn	c006ed9d393a090e61feba70159bb5fbd410da1f	link	false	`/test e2e-openstack-ovn`
ci/prow/e2e-aws-ovn-single-node-upgrade	c006ed9d393a090e61feba70159bb5fbd410da1f	link	false	`/test e2e-aws-ovn-single-node-upgrade`

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Feb 28 '24 17:02 openshift-ci[bot]

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

May 29 '24 01:05 openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

Jun 28 '24 08:06 openshift-bot

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen. Mark the issue as fresh by commenting /remove-lifecycle rotten. Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Jul 29 '24 00:07 openshift-bot

@openshift-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen. Mark the issue as fresh by commenting /remove-lifecycle rotten. Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Jul 29 '24 00:07 openshift-ci[bot]
