server,tests: add additional lease metrics and test

Open vivekpatani opened this issue 1 year ago • 3 comments

  • metrics to capture leases attached and detached
  • metrics to capture duration to grant, revoke, and renew leases
  • metric to capture initial lease count at startup

Help

  • Primarily, I'd like to get feedback. I think these metrics can be useful, especially under heavy load.
  • I need help with the last part of the test, where I try to capture the lease count at initial startup. Please let me know if my understanding is off in this case. More specifically, this.
    • Currently that part of the test does not pass.
    • My understanding is that when the cluster recovers, the lease should still exist in the database and should be reflected in both the metric and the LeaseLeases response.
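
To make the expectation in the last bullet concrete, here is a minimal stdlib-only sketch of the behaviour being tested: lease IDs are read back from a persistent store on recovery, and a gauge-like value records how many were found. The names `backend`, `lessor`, `recoverLessor`, and `initialLeaseCount` are hypothetical and are not taken from etcd's `server/lease` package or from this PR.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// backend is a hypothetical stand-in for the persistent store holding lease IDs.
type backend struct {
	leaseIDs []int64
}

// lessor is a toy model for illustration, not etcd's lessor.
type lessor struct {
	leases map[int64]struct{}
	// initialLeaseCount plays the role of a gauge that is set once at startup.
	initialLeaseCount atomic.Int64
}

// recoverLessor rebuilds the in-memory lease set from the backend and
// records how many leases were found at startup.
func recoverLessor(b *backend) *lessor {
	l := &lessor{leases: make(map[int64]struct{})}
	for _, id := range b.leaseIDs {
		l.leases[id] = struct{}{}
	}
	l.initialLeaseCount.Store(int64(len(l.leases)))
	return l
}

func main() {
	// Two leases were persisted before the (simulated) restart.
	b := &backend{leaseIDs: []int64{101, 102}}
	l := recoverLessor(b)
	fmt.Println("leases after recovery:", len(l.leases))
	fmt.Println("initial lease count gauge:", l.initialLeaseCount.Load())
}
```

In this toy model the gauge and the lease listing agree after recovery by construction. Whether the real lessor restores leases before the metric is read is exactly the open question in the failing test.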

vivekpatani avatar Oct 09 '24 22:10 vivekpatani

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: vivekpatani. Once this PR has been reviewed and has the lgtm label, please assign jmhbnz for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment. Approvers can cancel approval by writing /approve cancel in a comment.

k8s-ci-robot avatar Oct 09 '24 22:10 k8s-ci-robot

Hi @vivekpatani. Thanks for your PR.

I'm waiting for an etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot avatar Oct 09 '24 22:10 k8s-ci-robot

Codecov Report

Attention: Patch coverage is 80.00000% with 3 lines in your changes missing coverage. Please review.

Project coverage is 68.80%. Comparing base (995027f) to head (97e634a). Report is 713 commits behind head on main.

:exclamation: Current head 97e634a differs from pull request most recent head ba35d47

Please upload reports for the commit ba35d47 to get more accurate results.

Files with missing lines     Patch %    Lines
server/lease/lessor.go       70.00%     3 Missing :warning:

Additional details and impacted files

Files with missing lines     Coverage Δ
server/lease/metrics.go      100.00% <100.00%> (ø)
server/lease/lessor.go       88.83% <70.00%> (-0.50%) :arrow_down:

... and 19 files with indirect coverage changes

@@            Coverage Diff             @@
##             main   #18711      +/-   ##
==========================================
+ Coverage   68.79%   68.80%   +0.01%     
==========================================
  Files         420      420              
  Lines       35523    35538      +15     
==========================================
+ Hits        24437    24453      +16     
+ Misses       9658     9655       -3     
- Partials     1428     1430       +2     

Continue to review full report in Codecov by Sentry.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data.
Powered by Codecov. Last update 995027f...ba35d47.

codecov-commenter avatar Oct 09 '24 23:10 codecov-commenter

I'm really confused about the motivation behind those metrics. Instead of trying to add a metric for everything just in case, can you describe what you want to achieve? We can discuss and design a metric for each use case.

serathius avatar Nov 21 '24 08:11 serathius

Sorry for the confusion.

We observed a lot of lease churn in the recent past, and metrics would help us get to the bottom of it.

For intuition

  • error based metrics - Trying to see what kind of errors we see when the lease churn is high, and what exactly are we hitting would be helpful.
  • leaseAttachAndDetach - Creating a lease and attaching it to a resource are separate operations; fine grained metrics are helpful to see if the lease got attached to the resource that it was intended for.
  • Duration based metrics - as you said, these are available from the existing gRPC metrics, so they are not needed.

I'm open to ideas on how to implement this better or how to reuse existing metrics to derive these. Thanks for taking a look @serathius.

vivekpatani avatar Nov 21 '24 23:11 vivekpatani

Bump @serathius or @ahrtr, thanks.

vivekpatani avatar Dec 02 '24 20:12 vivekpatani

> Trying to see what kind of errors we see when the lease churn is high, and what exactly are we hitting would be helpful.

Error metrics for LeaseGrant, LeaseRevoke, and LeaseRefresh should also be available in the QPS metrics.

> fine grained metrics are helpful to see if the lease got attached to the resource that it was intended for.

How would you know that by metric?

serathius avatar Dec 04 '24 19:12 serathius

@serathius

> How would you know that by metric?

I think there are two ways to go about it:

  1. What @ahrtr suggested here
  2. Or we can just have a simpler metric that shows how many leases are attached/detached, as suggested here
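
As a rough illustration of the second option, the sketch below keeps two monotonically increasing counters, one for attach operations and one for detach operations. It uses only the standard library in place of real Prometheus counters, and every name in it (`leaseItemMetrics`, `attach`, `detach`, `snapshot`) is hypothetical rather than taken from etcd or this PR.

```go
package main

import (
	"fmt"
	"sync"
)

// leaseItemMetrics is a hypothetical stand-in for a pair of counters.
type leaseItemMetrics struct {
	mu       sync.Mutex
	attached uint64 // total successful attach operations
	detached uint64 // total successful detach operations
	items    map[int64]map[string]struct{}
}

func newLeaseItemMetrics() *leaseItemMetrics {
	return &leaseItemMetrics{items: make(map[int64]map[string]struct{})}
}

// attach records key under leaseID and counts it only if it was not already attached.
func (m *leaseItemMetrics) attach(leaseID int64, key string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	keys, ok := m.items[leaseID]
	if !ok {
		keys = make(map[string]struct{})
		m.items[leaseID] = keys
	}
	if _, dup := keys[key]; dup {
		return
	}
	keys[key] = struct{}{}
	m.attached++
}

// detach removes key from leaseID and counts it only if it was attached.
func (m *leaseItemMetrics) detach(leaseID int64, key string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	keys := m.items[leaseID] // a nil map is safe to read
	if _, ok := keys[key]; !ok {
		return
	}
	delete(keys, key)
	m.detached++
}

// snapshot returns both counters under the lock.
func (m *leaseItemMetrics) snapshot() (attached, detached uint64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.attached, m.detached
}

func main() {
	m := newLeaseItemMetrics()
	m.attach(1, "/a")
	m.attach(1, "/b")
	m.detach(1, "/a")
	a, d := m.snapshot()
	fmt.Printf("attached=%d detached=%d currently_attached=%d\n", a, d, a-d)
}
```

Counters like these show churn (the rate of attach and detach operations) and, by subtraction, how many items are currently attached. They do not show whether a lease was attached to the resource it was intended for, which is the question raised above.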

vivekpatani avatar Dec 09 '24 22:12 vivekpatani

PR needs rebase.

k8s-ci-robot avatar Jan 30 '25 01:01 k8s-ci-robot

Cleaning up: this seems not to be needed, and I did not hear back.

vivekpatani avatar Mar 14 '25 18:03 vivekpatani