
Optimization decodeBatch

Open yangjunmyfm192085 opened this issue 3 years ago • 4 comments

What this PR does / why we need it: A pod starting or stopping in the cluster should not cause a metrics-server data scrape to fail. Currently, when the data obtained from the kubelet's /metrics/resource endpoint is parsed, a failure to parse a single entry fails the whole scrape. We expect abnormal entries to be skipped so that they do not affect the data scrape of other nodes/pods.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged): Fixes #1017
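To make the proposed behavior concrete, here is a minimal, self-contained Go sketch of a decode loop that logs and skips a malformed entry instead of failing the whole scrape. This is not metrics-server's actual decodeBatch; the PodMetric type and the line layout are hypothetical stand-ins for the parsed /metrics/resource data.

```go
// Sketch only: skip-and-continue decoding, as the PR proposes.
// PodMetric and the "name value" line format are hypothetical.
package main

import (
	"fmt"
	"log"
	"strconv"
	"strings"
)

// PodMetric is a hypothetical parsed entry from /metrics/resource.
type PodMetric struct {
	Pod   string
	Value float64
}

// decodeBatch parses one entry per line; malformed lines are logged and
// skipped, so one bad entry no longer fails the entire scrape.
func decodeBatch(raw string) []PodMetric {
	var out []PodMetric
	for _, line := range strings.Split(strings.TrimSpace(raw), "\n") {
		fields := strings.Fields(line)
		if len(fields) != 2 {
			log.Printf("skipping malformed entry %q", line)
			continue
		}
		v, err := strconv.ParseFloat(fields[1], 64)
		if err != nil {
			log.Printf("skipping entry %q: %v", line, err)
			continue
		}
		out = append(out, PodMetric{Pod: fields[0], Value: v})
	}
	return out
}

func main() {
	raw := "pod-a 12.5\npod-b not-a-number\npod-c 7.0"
	// pod-b is skipped; pod-a and pod-c still produce metrics.
	fmt.Println(decodeBatch(raw))
}
```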

yangjunmyfm192085 avatar Jul 11 '22 02:07 yangjunmyfm192085

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: yangjunmyfm192085. To complete the pull request process, please assign s-urbaniak after the PR has been reviewed. You can assign the PR to them by writing /assign @s-urbaniak in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

  • Approvers can indicate their approval by writing /approve in a comment
  • Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot avatar Jul 11 '22 02:07 k8s-ci-robot

We expect abnormal entries to be skipped, do not affect the data scrape of other nodes/pods

I don't agree: if we find an abnormal entry, it means something went wrong on the network/kubelet side. If so, trying to continue parsing can only corrupt the current state.

It's better to fail and inform the user than silently fail and try to utilize corrupted data.
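For contrast, here is a sketch of the fail-fast behavior argued for here: the first malformed entry aborts decoding and returns an error the caller can surface to the user. It reuses the hypothetical PodMetric type and imports from the sketch after the PR description above and is likewise not metrics-server's real code.

```go
// Sketch only: strict decoding that fails the whole scrape on the first
// malformed entry, so the problem is reported instead of silently skipped.
func decodeBatchStrict(raw string) ([]PodMetric, error) {
	var out []PodMetric
	for _, line := range strings.Split(strings.TrimSpace(raw), "\n") {
		fields := strings.Fields(line)
		if len(fields) != 2 {
			return nil, fmt.Errorf("malformed entry %q: expected 2 fields", line)
		}
		v, err := strconv.ParseFloat(fields[1], 64)
		if err != nil {
			return nil, fmt.Errorf("parsing value in %q: %w", line, err)
		}
		out = append(out, PodMetric{Pod: fields[0], Value: v})
	}
	return out, nil
}
```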

serathius avatar Jul 12 '22 20:07 serathius

We expect abnormal entries to be skipped, do not affect the data scrape of other nodes/pods

I don't agree: if we find an abnormal entry, it means something went wrong on the network/kubelet side. If so, trying to continue parsing can only corrupt the current state.

It's better to fail and inform the user than silently fail and try to utilize corrupted data.

All right. It seems we need to handle the case where the kubelet reports a negative timestamp value.
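As a rough illustration of the timestamp check mentioned here, the sketch below rejects samples whose timestamp is non-positive before they are used. The sample type and field names are assumptions for the example, not metrics-server's real decoder; the negative value in main is just an example of a zero time.Time serialized as Unix milliseconds.

```go
// Sketch only: validate the kubelet-reported timestamp before accepting a sample.
package main

import (
	"fmt"
	"time"
)

// sample is a hypothetical decoded metric sample.
type sample struct {
	Name        string
	Value       float64
	TimestampMs int64 // milliseconds since the Unix epoch, as in the Prometheus text format
}

// validateSample returns an error for values the kubelet should never emit,
// such as a negative timestamp, so the caller can decide to skip or fail.
func validateSample(s sample) error {
	if s.TimestampMs <= 0 {
		return fmt.Errorf("sample %q has invalid timestamp %d", s.Name, s.TimestampMs)
	}
	return nil
}

func main() {
	// -62135596800000 ms is the Go zero time.Time expressed as Unix milliseconds.
	bad := sample{Name: "container_cpu_usage_seconds_total", Value: 1.2, TimestampMs: -62135596800000}
	if err := validateSample(bad); err != nil {
		fmt.Println("rejecting:", err)
		return
	}
	fmt.Println("scrape time:", time.UnixMilli(bad.TimestampMs))
}
```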

yangjunmyfm192085 avatar Jul 13 '22 12:07 yangjunmyfm192085

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Oct 11 '22 13:10 k8s-triage-robot

/assign

logicalhan avatar Nov 03 '22 16:11 logicalhan

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Dec 03 '22 17:12 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

k8s-triage-robot avatar Jan 02 '23 18:01 k8s-triage-robot

@k8s-triage-robot: Closed this PR.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Jan 02 '23 18:01 k8s-ci-robot