[WIP] readme for optional prometheus and grafana monitoring in KFP
Description of your changes:
Checklist:
- [ ] The title for your pull request (PR) should follow our title convention. Learn more about the pull request title convention used in this repository.
  PR titles examples:
  - `fix(frontend): fixes empty page. Fixes #1234`
    Use `fix` to indicate that this PR fixes a bug.
  - `feat(backend): configurable service account. Fixes #1234, fixes #1235`
    Use `feat` to indicate that this PR adds a new feature.
  - `chore: set up changelog generation tools`
    Use `chore` to indicate that this PR makes some changes that users don't need to know.
  - `test: fix CI failure. Part of #1234`
    Use `part of` to indicate that a PR is working on an issue, but shouldn't close the issue when merged.
- [ ] Do you want this pull request (PR) cherry-picked into the current release branch?
  If yes, use one of the following options:
  - (Recommended.) Ask the PR approver to add the `cherrypick-approved` label to this PR. The release manager adds this PR to the release branch in a batch update.
  - After this PR is merged, create a cherry-pick PR to add these changes to the release branch. (For more information about creating a cherry-pick PR, see the Kubeflow Pipelines release guide.)
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
To complete the pull request process, please assign bobgy
You can assign the PR to them by writing /assign @bobgy in a comment when ready.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
@jingzhang36: The following test failed, say /retest to rerun all failed tests:
| Test name | Commit | Details | Rerun command |
|---|---|---|---|
| kubeflow-pipeline-e2e-test | 5fc09bd0d5aa9f8c8685b08abdaed766ec5aa8bf | link | /test kubeflow-pipeline-e2e-test |
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
What would still need to be added to complete this PR?
@DavidSpek we need instructions for how to configure Prometheus to collect KFP metrics. The metrics themselves have already been added.
@Bobgy @zijianjoy Am I correct in assuming the workflow-controller-metrics service is supposed to be monitored here? I'm getting RBAC access denied (also from a notebook server using curl) so I'll need to sort that out. Are there more endpoints that prometheus should monitor? Are these endpoints meant to expose data of a namespace to a user or are they metrics for the entire cluster?
The ml-pipeline service is monitored here; for now it only exposes metrics about the entire cluster. New requests to add metrics are welcome.
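As a starting point for the missing instructions, here is a minimal sketch of a Prometheus scrape job that targets the ml-pipeline service. The `kubeflow` namespace, port `8888`, the `/metrics` path, and the helper name `build_scrape_config` are assumptions for illustration, not values confirmed in this thread; only `scrape_configs`, `job_name`, `metrics_path`, and `static_configs` are standard Prometheus configuration keys.

```python
import json


def build_scrape_config(service="ml-pipeline", namespace="kubeflow",
                        port=8888, metrics_path="/metrics"):
    """Build a Prometheus scrape job for one in-cluster service.

    The defaults (namespace, port, path) are assumptions; check them
    against the actual ml-pipeline Service before using the result.
    """
    # Cluster-internal DNS name of the Service, plus the metrics port.
    target = f"{service}.{namespace}.svc.cluster.local:{port}"
    return {
        "scrape_configs": [
            {
                "job_name": f"{namespace}-{service}",
                "metrics_path": metrics_path,
                "static_configs": [{"targets": [target]}],
            }
        ]
    }


if __name__ == "__main__":
    # Print the config as JSON so it can be reviewed or converted to YAML.
    print(json.dumps(build_scrape_config(), indent=2))
```

A static target keeps the example short. A real deployment would more likely use Kubernetes service discovery, and Prometheus would also need network and RBAC access to the endpoint, which is the issue raised above.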
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Is this PR still relevant? 30+ commits, and it has been open since Mar 2022?