Updating LbEndpoint metadata causes connection churn

Open thejas-stripe opened this issue 1 year ago • 6 comments

Title: Updating LbEndpoint metadata causes connection churn

Description: When Envoy receives a ClusterLoadAssignment with the same endpoints but updated metadata, it tears down the existing connections and re-establishes new ones to all of the endpoints. Is this expected? Or is there some other configuration that causes this connection churn?

thejas-stripe avatar Jul 02 '24 19:07 thejas-stripe

Is this LbEndpoint metadata or LocalityLbEndpoints?

htuch avatar Jul 02 '24 22:07 htuch

@htuch It's LbEndpoint metadata - https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/endpoint/v3/endpoint_components.proto#config-endpoint-v3-lbendpoint

I'm updating the filter_metadata in the metadata structure, which causes connection churn.

thejas-stripe avatar Jul 03 '24 15:07 thejas-stripe

The metadata value is similar to this:

old

"metadata": {
  "filter_metadata": {
    "envoy.lb": {
      "field1": "value1"
    }
  }
}

The updated metadata looks like this:

"metadata": {
  "filter_metadata": {
    "envoy.lb": {
      "field1": "value1",
      "field2": "value2"
    }
  }
}

thejas-stripe avatar Jul 03 '24 15:07 thejas-stripe

@cpakulski @adisuissa

htuch avatar Jul 04 '24 03:07 htuch

Envoy version: 1.28.3

I removed the modification to metadata in the EDS update, but that did not resolve the issue.

Digging further, it looks like the issue is caused by a CDS update. Below are some Envoy trace logs:

[993365][debug][upstream] [external/envoy/source/common/upstream/cluster_manager_impl.cc:774] add/update cluster echo-srv starting warming

[993365][debug][upstream] [external/envoy/source/common/upstream/cds_api_helper.cc:51] cds: add/update cluster 'echo-srv'

[993365][debug][upstream] [external/envoy/source/common/upstream/upstream_impl.cc:1579] initializing Secondary cluster echo-srv completed

[993931][trace][upstream] [external/envoy/source/common/upstream/upstream_impl.cc:1454] Schedule destroy cluster info echo-srv
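To pull out every log entry that touches one cluster, lines in this layout can be parsed with a short script. The following is only a sketch: the helper names `parse_log_line` and `cluster_events` are hypothetical, and it assumes the first bracketed number is a thread id and that the timestamp prefix has already been stripped, as in the excerpt above.

```python
import re

# Layout assumed from the excerpt: [id][level][component] [file:line] message
# Treating the first field as a thread id is an assumption, not confirmed here.
LOG_LINE = re.compile(
    r"^\[(?P<thread_id>\d+)\]\[(?P<level>\w+)\]\[(?P<component>\w+)\] "
    r"\[(?P<source>[^\]:]+):(?P<line>\d+)\] (?P<message>.*)$"
)


def parse_log_line(text):
    """Return a dict of fields, or None if the line does not match the layout."""
    match = LOG_LINE.match(text.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["line"] = int(fields["line"])
    return fields


def cluster_events(lines, cluster_name):
    """Keep only the parsed entries whose message mentions the given cluster."""
    parsed = (parse_log_line(line) for line in lines)
    return [entry for entry in parsed if entry and cluster_name in entry["message"]]


sample = [
    "[993365][debug][upstream] [external/envoy/source/common/upstream/cluster_manager_impl.cc:774] add/update cluster echo-srv starting warming",
    "[993365][debug][upstream] [external/envoy/source/common/upstream/cds_api_helper.cc:51] cds: add/update cluster 'echo-srv'",
    "[993365][debug][upstream] [external/envoy/source/common/upstream/upstream_impl.cc:1579] initializing Secondary cluster echo-srv completed",
    "[993931][trace][upstream] [external/envoy/source/common/upstream/upstream_impl.cc:1454] Schedule destroy cluster info echo-srv",
]

for event in cluster_events(sample, "echo-srv"):
    # Print the short file name, line number, and message for each entry.
    print(event["source"].rsplit("/", 1)[-1], event["line"], event["message"])
```

Filtering on the cluster name makes the warming of the new instance and the scheduled destruction of the old one easy to line up against CDS updates.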

There is a CDS update in this time window.

The only change between the previous CDS update and the new one for the echo-srv cluster is shown below.

old CDS

lb_subset_config: {
    fallback_policy: ANY_ENDPOINT
    subset_selectors: { keys: "field1" }
}

The new CDS update has additional subset selectors:

lb_subset_config: {
    fallback_policy: ANY_ENDPOINT
    subset_selectors: [
        { keys: "field1" },
        { keys: "field2", fallback_policy: NO_FALLBACK },
        { keys: ["field1", "field2"], fallback_policy: NO_FALLBACK },
        { keys: "field3", fallback_policy: NO_FALLBACK },
        { keys: ["field1", "field3"], fallback_policy: NO_FALLBACK }
    ]
}
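One way to double-check that lb_subset_config really is the only difference between two CDS snapshots is a recursive diff over their JSON form. This is a minimal sketch, not Envoy's own comparison logic: `diff_config` is a hypothetical helper, and the two dicts are hand-written JSON-style renderings of the configs quoted above, reduced to the fields shown in this thread.

```python
def diff_config(old, new, path=""):
    """Yield (path, old_value, new_value) for every leaf that differs.

    A missing key or list item is reported with None on the side where it is absent.
    """
    if isinstance(old, dict) and isinstance(new, dict):
        for key in sorted(set(old) | set(new)):
            child = f"{path}.{key}" if path else key
            yield from diff_config(old.get(key), new.get(key), child)
    elif isinstance(old, list) and isinstance(new, list):
        for i in range(max(len(old), len(new))):
            old_item = old[i] if i < len(old) else None
            new_item = new[i] if i < len(new) else None
            yield from diff_config(old_item, new_item, f"{path}[{i}]")
    elif old != new:
        yield path, old, new


# Hand-written renderings of the two cluster configs (assumed shape).
old_cluster = {
    "name": "echo-srv",
    "lb_subset_config": {
        "fallback_policy": "ANY_ENDPOINT",
        "subset_selectors": [{"keys": ["field1"]}],
    },
}
new_cluster = {
    "name": "echo-srv",
    "lb_subset_config": {
        "fallback_policy": "ANY_ENDPOINT",
        "subset_selectors": [
            {"keys": ["field1"]},
            {"keys": ["field2"], "fallback_policy": "NO_FALLBACK"},
            {"keys": ["field1", "field2"], "fallback_policy": "NO_FALLBACK"},
            {"keys": ["field3"], "fallback_policy": "NO_FALLBACK"},
            {"keys": ["field1", "field3"], "fallback_policy": "NO_FALLBACK"},
        ],
    },
}

changes = list(diff_config(old_cluster, new_cluster))
for changed_path, before, after in changes:
    print(changed_path, before, "->", after)
```

If every reported path starts with lb_subset_config, the subset selector change is the only candidate for what triggered the cluster update.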

This causes a new cluster instance to be created for echo-srv. The new instance issues a separate EDS query to the control plane and establishes new connections, while tearing down the old cluster destroys the old connections.

Is it expected that a change to lb_subset_config creates a new cluster and tears down the old one?

thejas-stripe avatar Jul 23 '24 06:07 thejas-stripe

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

github-actions[bot] avatar Aug 22 '24 16:08 github-actions[bot]

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.

github-actions[bot] avatar Aug 29 '24 20:08 github-actions[bot]