stats/otel: Add subchannel metrics (A94)
Addresses : https://github.com/grpc/proposal/blob/master/A94-subchannel-otel-metrics.md
this PR adds sub channel metrics with applicable labels as per the RFC proposal.
RELEASE NOTES:
- stats/otel: add subchannel metrics to eventually replace the pickfirst metrics
Codecov Report
:white_check_mark: All modified and coverable lines are covered by tests.
:white_check_mark: Project coverage is 83.37%. Comparing base (432bda3) to head (cc6c6e3).
:warning: Report is 8 commits behind head on master.
Additional details and impacted files
@@ Coverage Diff @@
## master #8738 +/- ##
==========================================
+ Coverage 83.22% 83.37% +0.15%
==========================================
Files 419 418 -1
Lines 32454 32425 -29
==========================================
+ Hits 27009 27034 +25
+ Misses 4057 4010 -47
+ Partials 1388 1381 -7
| Files with missing lines | Coverage Δ | |
|---|---|---|
| clientconn.go | 90.44% <100.00%> (-0.06%) |
:arrow_down: |
| internal/internal.go | 100.00% <ø> (ø) |
|
| internal/transport/http2_client.go | 92.80% <100.00%> (+0.18%) |
:arrow_up: |
| internal/transport/transport.go | 90.86% <ø> (+2.15%) |
:arrow_up: |
| internal/xds/xds.go | 80.55% <100.00%> (+5.55%) |
:arrow_up: |
| stream.go | 81.88% <100.00%> (+0.04%) |
:arrow_up: |
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
@arjan-bal
A94 states the following:
If we end up discarding connection attempts as we do with the “happy eyeballs” algorithm
(as per A61), we should not record the connection attempt or the disconnection.
How are we currently handling this in the pickfirst metrics?
@mbissa
A94 states the following:
Implementations that have already implemented the pick-first metrics should give
enough time for users to transition to the new metrics. For example, implementations
should report both the old pick-first metrics and the new subchannel metrics for
2 releases, and then remove the old pick-first metrics.
Can you please ensure that we have an issue filed to track the removal of the old metrics and that it captures the correct release where it needs to be removed.
@arjan-bal
A94 states the following:
If we end up discarding connection attempts as we do with the “happy eyeballs” algorithm (as per A61), we should not record the connection attempt or the disconnection.How are we currently handling this in the pickfirst metrics?
Here is how pickfirst handles this:
- When a subchannel connects, all remaining subchannels are removed from pickfirst's subchannel map. https://github.com/grpc/grpc-go/blob/647162c379ddeb85f45c6c43e9274662d2d1e2d6/balancer/pickfirst/pickfirst.go#L603-L605
- The updateSubConnState method ignores any updates from subchannels that are not in the subchannel map. https://github.com/grpc/grpc-go/blob/647162c379ddeb85f45c6c43e9274662d2d1e2d6/balancer/pickfirst/pickfirst.go#L585-L591
@mbissa
A94 states the following:
Implementations that have already implemented the pick-first metrics should give enough time for users to transition to the new metrics. For example, implementations should report both the old pick-first metrics and the new subchannel metrics for 2 releases, and then remove the old pick-first metrics.Can you please ensure that we have an issue filed to track the removal of the old metrics and that it captures the correct release where it needs to be removed.
Issue filed - https://github.com/grpc/grpc-go/issues/8752
@arjan-bal
A94 states the following:
If we end up discarding connection attempts as we do with the “happy eyeballs” algorithm (as per A61), we should not record the connection attempt or the disconnection.How are we currently handling this in the pickfirst metrics?
Handled as per PR
Looks good mostly. Just some minor nits.
Can you also please update the PR description to note that the disconnection_reason will be plumbed in a follow-up PR. Thanks.
done.