[WIP] readme for optional prometheus and grafana monitoring in KFP

Open jingzhang36 opened this issue 5 years ago • 10 comments

Description of your changes:

Checklist:

  • [ ] The title for your pull request (PR) should follow our title convention. Learn more about the pull request title convention used in this repository.

    PR titles examples:

    • fix(frontend): fixes empty page. Fixes #1234
      Use fix to indicate that this PR fixes a bug.
    • feat(backend): configurable service account. Fixes #1234, fixes #1235
      Use feat to indicate that this PR adds a new feature.
    • chore: set up changelog generation tools
      Use chore to indicate that this PR makes some changes that users don't need to know.
    • test: fix CI failure. Part of #1234
      Use part of to indicate that a PR is working on an issue, but shouldn't close the issue when merged.
  • [ ] Do you want this pull request (PR) cherry-picked into the current release branch?

    If yes, use one of the following options:

    • (Recommended.) Ask the PR approver to add the cherrypick-approved label to this PR. The release manager adds this PR to the release branch in a batch update.
    • After this PR is merged, create a cherry-pick PR to add these changes to the release branch. (For more information about creating a cherry-pick PR, see the Kubeflow Pipelines release guide.)

jingzhang36 • Aug 27 '20 09:08

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign bobgy.
You can assign the PR to them by writing /assign @bobgy in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

k8s-ci-robot • Aug 27 '20 09:08

This change is Reviewable

kubeflow-bot • Aug 27 '20 09:08

@jingzhang36: The following test failed, say /retest to rerun all failed tests:

Test name | Commit | Details | Rerun command
kubeflow-pipeline-e2e-test | 5fc09bd0d5aa9f8c8685b08abdaed766ec5aa8bf | link | /test kubeflow-pipeline-e2e-test

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

k8s-ci-robot • Aug 27 '20 09:08

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] • Nov 26 '20 21:11

What would still need to be added to complete this PR?

davidspek • Nov 27 '20 12:11

@DavidSpek We need instructions on how to configure Prometheus to collect KFP metrics. The metrics themselves have already been added.

Bobgy • Nov 28 '20 02:11

@Bobgy @zijianjoy Am I correct in assuming that the workflow-controller-metrics service is the one to be monitored here? I'm getting an RBAC access denied error (also when using curl from a notebook server), so I'll need to sort that out. Are there more endpoints that Prometheus should monitor? Are these endpoints meant to expose a namespace's data to a user, or are they metrics for the entire cluster?

davidspek • Apr 21 '21 11:04

The ml-pipeline service is monitored here; for now it only exposes metrics about the entire cluster. New requests to add metrics are welcome.

Bobgy • Apr 25 '21 05:04
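
Editorial sketch (not part of the PR): before pointing Prometheus at the ml-pipeline service mentioned above, it can help to confirm that the endpoint returns Prometheus text-format samples at all. The snippet below is a minimal sketch under stated assumptions: the in-cluster address, the port 8888, and the /metrics path are guesses about a typical deployment and are not confirmed anywhere in this thread. It also does no authentication, so it will fail in the same way as the curl attempt described above if RBAC denies the request.

```python
"""Sketch: list the metric names served by a Prometheus-style endpoint."""
from urllib.request import urlopen

# ASSUMPTION: service name, namespace, port and path are illustrative only.
# Adjust them to match how ml-pipeline is exposed in your cluster.
METRICS_URL = "http://ml-pipeline.kubeflow.svc.cluster.local:8888/metrics"


def metric_names(exposition_text):
    """Return the set of metric names in Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line is "name{labels} value" or "name value";
        # the name ends at the first "{" or at the first whitespace.
        names.add(line.split("{", 1)[0].split()[0])
    return names


def fetch_metric_names(url=METRICS_URL, timeout=5.0):
    """Fetch the endpoint over plain HTTP and return its metric names."""
    with urlopen(url, timeout=timeout) as resp:
        return metric_names(resp.read().decode("utf-8"))
```

Calling `sorted(fetch_metric_names())` from a pod inside the cluster should print the available metric names if the endpoint is reachable. An HTTP 403 would point to the access policy rather than to the Prometheus configuration.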

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] • Mar 02 '22 10:03

Is this PR still relevant? 30+ commits, and open since Mar 2022?

rimolive • Mar 07 '24 20:03