pipeline icon indicating copy to clipboard operation
pipeline copied to clipboard

add namespace label/tag to non-deprecated throttle metrics

Open gabemontero opened this issue 1 year ago • 23 comments

Changes

Back when implementing https://github.com/tektoncd/pipeline/pull/6744 for https://github.com/tektoncd/pipeline/issues/6631 we failed to realize that with k8s quota policies being namespace scoped, knowing which namespace the throttled items were in could have some diagnostic value.

Now that we have been using the metric added for a bit, this realization is now very apparent.

This changes introduces the namespace tag. Also, since last touching this space, the original metric was deprecated and a new one with a shorter name was added. This change only updates the non-deprecated metric with the new label.

Lastly, the default behavior is preserved, and use of the new label only occurs when explicitly enabled in observability config map.

Fixes https://github.com/tektoncd/pipeline/issues/7878

/kind feature

Submitter Checklist

As the author of this PR, please check off the items in this checklist:

  • [/ ] Has Docs if any changes are user facing, including updates to minimum requirements e.g. Kubernetes version bumps
  • [/ ] Has Tests included if any functionality added or changed
  • [ /] pre-commit Passed
  • [ /] Follows the commit message standard
  • [ /] Meets the Tekton contributor standards (including functionality, content, code)
  • [ /] Has a kind label. You can add one by adding a comment on this PR that contains /kind <type>. Valid types are bug, cleanup, design, documentation, feature, flake, misc, question, tep
  • [ /] Release notes block below has been updated with any user facing changes (API changes, bug fixes, changes requiring upgrade notices or deprecation warnings). See some examples of good release notes.
  • [ /] Release notes contains the string "action required" if the change requires additional action from users switching to the new release

Release Notes

Add 'namespace' label/tag to the 'tekton_pipelines_controller_running_taskruns_throttled_by_quota' and 'tekton_pipelines_controller_running_taskruns_throttled_by_node' metrics, as kubernetes quota definitions are namespace scoped, hence certain namespaces may be more susceptible to quota throttling than others, and in a multi-node environment, not all namespaces are necessarily on the same node.

To enable this new label/tag, set 'metrics.taskrun.throttle.enable-namespace' to 'true' in the 'config-observability' ConfigMap

gabemontero avatar Apr 12 '24 12:04 gabemontero

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/taskrunmetrics/metrics.go 84.8% 86.3% 1.5

tekton-robot avatar Apr 12 '24 12:04 tekton-robot

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/taskrunmetrics/metrics.go 84.8% 86.3% 1.5

tekton-robot avatar Apr 12 '24 12:04 tekton-robot

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/taskrunmetrics/metrics.go 84.8% 85.5% 0.7

tekton-robot avatar Apr 12 '24 13:04 tekton-robot

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/taskrunmetrics/metrics.go 84.8% 85.5% 0.7

tekton-robot avatar Apr 12 '24 13:04 tekton-robot

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/taskrunmetrics/metrics.go 84.8% 85.5% 0.7

tekton-robot avatar Apr 12 '24 13:04 tekton-robot

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/taskrunmetrics/metrics.go 84.8% 85.5% 0.7

tekton-robot avatar Apr 12 '24 14:04 tekton-robot

ping @khrm - can you take a look when you get the chance?

also, see my question in the description wrt if adding labels to existing experimental metrics is "OK"

gabemontero avatar Apr 16 '24 13:04 gabemontero

We can add this behind a flag also. Like we do for count.reason

Ah you mean https://github.com/tektoncd/pipeline/blob/main/docs/metrics.md#configuring-metrics-using-config-observability-configmap

Before I make the change, how about an early review on the name .... say

metrics.taskrun.throttle.enable-namespace

for the new flag's name @khrm ?

Thanks for the pointer.

gabemontero avatar Apr 16 '24 20:04 gabemontero

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/apis/config/metrics.go 73.7% 76.2% 2.5
pkg/taskrunmetrics/metrics.go 84.8% 85.7% 1.0

tekton-robot avatar Apr 23 '24 15:04 tekton-robot

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/apis/config/metrics.go 73.7% 76.2% 2.5
pkg/taskrunmetrics/metrics.go 84.8% 85.7% 1.0

tekton-robot avatar Apr 23 '24 15:04 tekton-robot

ok @khrm new flag to enable the namespace label on the throttle metric is added ... PTAL

gabemontero avatar Apr 23 '24 16:04 gabemontero

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/apis/config/metrics.go 73.7% 76.2% 2.5
pkg/taskrunmetrics/metrics.go 84.8% 85.7% 1.0

tekton-robot avatar Apr 24 '24 19:04 tekton-robot

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/apis/config/metrics.go 73.7% 76.2% 2.5
pkg/taskrunmetrics/metrics.go 84.8% 85.7% 1.0

tekton-robot avatar Apr 24 '24 19:04 tekton-robot

Thanks @gabemontero /lgtm

likewise thanks @afrittoli

question to you and @khrm - any suggestion on who I should reach out to in order to get the approve label (minimally assuming @khrm you are fine with my updates from your comments) ?

gabemontero avatar Apr 29 '24 11:04 gabemontero


Yes, this is fine.

khrm avatar Apr 29 '24 16:04 khrm

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/apis/config/metrics.go 73.7% 76.2% 2.5
pkg/taskrunmetrics/metrics.go 84.8% 86.1% 1.3

tekton-robot avatar Apr 30 '24 18:04 tekton-robot

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/apis/config/metrics.go 73.7% 76.2% 2.5
pkg/taskrunmetrics/metrics.go 84.8% 86.1% 1.3

tekton-robot avatar Apr 30 '24 18:04 tekton-robot

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/apis/config/metrics.go 73.7% 76.2% 2.5
pkg/taskrunmetrics/metrics.go 84.8% 86.1% 1.3

tekton-robot avatar May 01 '24 01:05 tekton-robot

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/apis/config/metrics.go 73.7% 76.2% 2.5
pkg/taskrunmetrics/metrics.go 84.8% 86.1% 1.3

tekton-robot avatar May 01 '24 01:05 tekton-robot

unrelated flake on alpha integration

/test pull-tekton-pipeline-alpha-integration-tests

gabemontero avatar May 01 '24 12:05 gabemontero

@afrittoli @khrm PTAL at the updates from @khrm 's last review - thanks

gabemontero avatar May 01 '24 14:05 gabemontero

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: afrittoli, khrm

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

tekton-robot avatar May 08 '24 13:05 tekton-robot

/lgtm

chitrangpatel avatar May 10 '24 16:05 chitrangpatel