Merging implementations of Strict and Logical DNS
Commit Message: Additional Description:
The main idea here is tackling #37145 (which is a follow up from #36353).
This approach simplifies the code merging the two current implementation of DNS cluster (logical and strict) by removing unnecessary logical DNS, as it can be treated as a special case of strict DNS.
According to the documentation:
When using strict DNS service discovery, Envoy will continuously and asynchronously resolve the specified DNS targets. Each returned IP address in the DNS result will be considered an explicit host in the upstream cluster. This means that if the query returns three IP addresses, Envoy will assume the cluster has three hosts, and all three should be load balanced to. If a host is removed from the result Envoy assumes it no longer exists and will drain traffic from any existing connection pools
Logical DNS uses a similar asynchronous resolution mechanism to strict DNS. However, instead of strictly taking the results of the DNS query and assuming that they comprise the entire upstream cluster, a logical DNS cluster only uses the first IP address returned when a new connection needs to be initiated. Thus, a single logical connection pool may contain physical connections to a variety of different upstream hosts. Connections are never drained, including on a successful DNS resolution that returns 0 hosts.
Risk Level: Testing: Docs Changes: Release Notes: Platform Specific Features: [Optional Runtime guard:] [Optional Fixes #Issue] [Optional Fixes commit #PR or SHA] [Optional Deprecated:] [Optional API Considerations:]
As a reminder, PRs marked as draft will not be automatically assigned reviewers, or be handled by maintainer-oncall triage.
Please mark your PR as ready when you want it to be reviewed!
@markdroth if you're interested, as this is a follow up from #36353.
/wait Looks like CI is failing?
I'm taking over this temporarily, will take a look when I have time
A high level comment first: How does this change impact Envoy Happy-Eyeball support: https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/upstream/connection_pooling#arch-overview-happy-eyeballs
@yanjunxiang-google I'm moving this to my backlog, so it my be a while before I'll take a look. Sorry about that!
/wait
Code coverage for source/extensions/clusters/common is lower than limit of 96.6 (88.7)
DnsCluster wasn't using LogicalHost, which impacted not only coverage, but also the use of Happy Eyeballs (@yanjunxiang-google). Hopefully with these most recent changes we shouldn't have these problems. I'm still testing it locally to make sure Happy Eyeballs is Happy 🙃
Is this ready for maintainer review?
@jmarantz yes, this is ready for maintainer review.
LGTM modulo CI failure
/wait
/retest
LGTM modulo CI failure
I think it was just some hiccups with integration tests. Should be good now.
LGTM
/assign-from @envoyproxy/senior-maintainers
@envoyproxy/senior-maintainers assignee is @mattklein123
@wbpcode Thanks for the thorough review. I tried to address your suggestions the best I could. I totally see the problem with logical hosts and the updates based on the first address returned in the response. I've created a unit test for that which can reproduce the issue and now is of course passing after implementing your suggestion.
@wbpcode Thanks for the thorough review. I tried to address your suggestions the best I could. I totally see the problem with logical hosts and the updates based on the first address returned in the response. I've created a unit test for that which can reproduce the issue and now is of course passing after implementing your suggestion.
Thanks for the update. And I will take a look again at weekend or next monday.
PTAL when you get a chance @wbpcode
/retest
/wait
I'm working now on including the feature flag and restoring the previous implementations.
CC @envoyproxy/runtime-guard-changes: FYI only for changes made to (source/common/runtime/runtime_features.cc).
I'll be unavailable for the next few days. There's still some details to be sorted in the tests mainly. I'll be back from the 30th of June.
/wait-any
Waiting for addressing comments and conflicts
Thanks for the update. There are lots of new commit? Is this ready for review? And we may prefer to land this after 1.35 is released (in next one week ideally).
Thanks for the update. There are lots of new commit? Is this ready for review? And we may prefer to land this after 1.35 is released (in next one week ideally).
Yes, it was a significant change. I had to re-introduce the old implementations. So there were lots of changes mostly in tests.
@wbpcode I had some issues with coverage (as I mistakenly hadn't parametrized some of the unit tests). Now CI is passing, we should be good for a review again. Thank you.
gentle ping @wbpcode