admiral
[BUG] The GlobalTrafficPolicy doesn't fail over when weights are declared
Describe the bug: If weights are declared, 10 consecutive 5xx errors (consecutive5xxErrors) do not trigger failover to the other region.
Steps To Reproduce
apiVersion: admiral.io/v1alpha1
kind: GlobalTrafficPolicy
metadata:
  name: gtp-admiral-sample
  namespace: sample-admiral
  labels:
    env: default
    identity: webapp-sample-admiral
spec:
  policy:
    - dns: default.webapp-sample-admiral.global
      lbType: 1 # 0 represents TOPOLOGY, 1 represents FAILOVER
      target:
        - region: us-west-2
          weight: 10
        - region: us-east-1
          weight: 90
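For reference, here is a minimal Go sketch of how the targets above could map onto an Istio locality `distribute` entry for proxies in a given region. It is a hypothetical illustration (`Target`, `Distribute` and `buildDistribute` are names made up for this sketch, not Admiral's actual code), modeled on the `From: us-east-1/*` / `To:` shape that appears in the generated destination rule later in this thread.

```go
package main

import (
	"fmt"
	"sort"
)

// Target mirrors one entry of spec.policy[].target in the GTP.
type Target struct {
	Region string
	Weight uint32
}

// Distribute mirrors the from/to shape of an Istio locality distribute entry.
type Distribute struct {
	From string
	To   map[string]uint32
}

// buildDistribute (hypothetical) builds one distribute entry for proxies in
// localRegion, sending traffic to each target region by its GTP weight.
// Zero weights are skipped, since a locality absent from "to" gets no traffic.
func buildDistribute(localRegion string, targets []Target) Distribute {
	to := make(map[string]uint32, len(targets))
	for _, t := range targets {
		if t.Weight == 0 {
			continue
		}
		to[t.Region] = t.Weight
	}
	return Distribute{From: localRegion + "/*", To: to}
}

func main() {
	// Weights from the GTP above: us-west-2 = 10, us-east-1 = 90.
	d := buildDistribute("us-east-1", []Target{
		{Region: "us-west-2", Weight: 10},
		{Region: "us-east-1", Weight: 90},
	})
	regions := make([]string, 0, len(d.To))
	for r := range d.To {
		regions = append(regions, r)
	}
	sort.Strings(regions)
	fmt.Println("From:", d.From)
	for _, r := range regions {
		fmt.Printf("  %s: %d\n", r, d.To[r])
	}
}
```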
Expected behavior: When the GTP weights (90/10) are applied, a service that returns 500 ten times in a row is not ejected.
Without the GTP, failover works after 10 consecutive 500 errors.
@wenxian This could be an Istio issue, but I remember this being tested with at least Istio 1.5.x.
i) What Istio version are you using?
ii) Can you paste the destination rule generated after applying this GTP?
We are using Istio 1.6.
Namespace:    admiral-sync
Labels:       <none>
Annotations:  <none>
API Version:  networking.istio.io/v1beta1
Kind:         DestinationRule
Metadata:
  Creation Timestamp:  2020-08-10T21:22:01Z
  Generation:          10
  Resource Version:    170254904
  Self Link:           /apis/networking.istio.io/v1beta1/namespaces/admiral-sync/destinationrules/default.greeting-sample-showgtp.global-default-dr
  UID:                 83651bea-145a-4fdc-8efb-92601c695c76
Spec:
  Host:  default.greeting-sample-showgtp.global
  Traffic Policy:
    Load Balancer:
      Locality Lb Setting:
        Distribute:
          From:  us-east-1/*
          To:
            us-east-1:  99 #50
            us-west-2:  1 #50
      Simple:  ROUND_ROBIN
    Outlier Detection:
      Base Ejection Time:    120s
      consecutive5xxErrors:  10
      Interval:              5s
    Tls:
      Mode:  ISTIO_MUTUAL
Events:  <none>
I am calling from us-east-1. I found that as long as the local weight (us-east-1) is >= 50, the calls always stay local (us-east-1), which means:
1. the weight is not applied (10 of 10 calls land in east), and
2. traffic does not fail over to the remote region (west).
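To compare an observed split such as 10 of 10 calls in east against the configured weights, a rough reference point is the count each locality would get under an exact proportional split. This Go sketch assumes that simple proportional model (Envoy's actual locality scheduling may spread a short run of requests differently) and also checks that the `to` weights sum to 100, as the Istio documentation for `LocalityLoadBalancerSetting` asks. The helper names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// sumWeights adds up the "to" weights of one distribute entry.
func sumWeights(to map[string]uint32) uint32 {
	var sum uint32
	for _, w := range to {
		sum += w
	}
	return sum
}

// checkDistribute reports an error unless the weights sum to 100.
func checkDistribute(to map[string]uint32) error {
	if s := sumWeights(to); s != 100 {
		return fmt.Errorf("distribute weights sum to %d, want 100", s)
	}
	return nil
}

// expectedRequests returns how many of n requests each locality would get
// under an exact proportional split (a simplifying assumption).
func expectedRequests(to map[string]uint32, n int) map[string]float64 {
	out := make(map[string]float64, len(to))
	sum := sumWeights(to)
	if sum == 0 {
		return out
	}
	for region, w := range to {
		out[region] = float64(n) * float64(w) / float64(sum)
	}
	return out
}

func main() {
	// 50/50 locality weights, as in the Envoy cluster dumps in this thread,
	// with 10 calls made from us-east-1.
	to := map[string]uint32{"us-east-1": 50, "us-west-2": 50}
	if err := checkDistribute(to); err != nil {
		fmt.Println(err)
		return
	}
	exp := expectedRequests(to, 10)
	regions := make([]string, 0, len(exp))
	for r := range exp {
		regions = append(regions, r)
	}
	sort.Strings(regions)
	for _, r := range regions {
		fmt.Printf("%s: %.1f of 10\n", r, exp[r])
	}
}
```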
@wenxian The destination rule appears to have been generated with the correct weights per the spec; apparently the distribute setting is what applies the weights, so outlier detection might not be used here.
Looking at the Envoy clusters might help. Can you share the output of the following command:
istioctl proxy-config clusters <pod_name_of_source_workload> -o json
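If the output is large, a small Go program can pull out just the per-locality weights for the `.global` cluster. This is a sketch that assumes the command prints a JSON array of cluster objects carrying the `loadAssignment.endpoints[].locality.region` and `loadBalancingWeight` fields seen in the excerpts in this thread; the struct and function names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
	"sort"
)

// cluster models only the fields of an Envoy cluster dump needed here.
type cluster struct {
	Name           string `json:"name"`
	LoadAssignment struct {
		Endpoints []struct {
			Locality struct {
				Region string `json:"region"`
			} `json:"locality"`
			LoadBalancingWeight uint32 `json:"loadBalancingWeight"`
		} `json:"endpoints"`
	} `json:"loadAssignment"`
}

// localityWeights returns region -> locality weight for the named cluster.
func localityWeights(data []byte, name string) (map[string]uint32, error) {
	var clusters []cluster
	if err := json.Unmarshal(data, &clusters); err != nil {
		return nil, err
	}
	for _, c := range clusters {
		if c.Name != name {
			continue
		}
		weights := make(map[string]uint32)
		for _, ep := range c.LoadAssignment.Endpoints {
			weights[ep.Locality.Region] = ep.LoadBalancingWeight
		}
		return weights, nil
	}
	return nil, fmt.Errorf("cluster %q not found", name)
}

func main() {
	// Reads the istioctl JSON output from stdin.
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	weights, err := localityWeights(data, "outbound|80||default.greeting-sample-showgtp.global")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	regions := make([]string, 0, len(weights))
	for r := range weights {
		regions = append(regions, r)
	}
	sort.Strings(regions)
	for _, r := range regions {
		fmt.Printf("%s: %d\n", r, weights[r])
	}
}
```

Usage would be along the lines of `istioctl proxy-config clusters <pod_name_of_source_workload> -o json | go run localityweights.go`, where the file name is only an example.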
us-east-1
  "name": "outbound|80||default.greeting-sample-showgtp.global",
  "type": "STRICT_DNS",
  "connectTimeout": "10s",
  "loadAssignment": {
    "clusterName": "outbound|80||default.greeting-sample-showgtp.global",
    "endpoints": [
      {
        "locality": {
          "region": "us-east-1"
        },
        "lbEndpoints": [
          {
            "endpoint": {
              "address": {
                "socketAddress": {
                  "address": "greeting.sample-showgtp.svc.cluster.local",
                  "portValue": 80
                }
              }
            },
            "loadBalancingWeight": 1
          }
        ],
        "loadBalancingWeight": 50
      },
      {
        "locality": {
          "region": "us-west-2"
        },
        "lbEndpoints": [
          {
            "endpoint": {
              "address": {
                "socketAddress": {
                  "address": "a5020c7e4380642f09c42334f5d06314-b30f0b24ce995299.elb.us-west-2.amazonaws.com",
                  "portValue": 15443
                }
              }
            },
            "loadBalancingWeight": 1
          }
        ],
        "loadBalancingWeight": 50
      }
    ]
  },
  "circuitBreakers": {
    "thresholds": [
      {
        "maxConnections": 4294967295,
        "maxPendingRequests": 4294967295,
        "maxRequests": 4294967295,
        "maxRetries": 4294967295
      }
    ]
  },
  "dnsRefreshRate": "5s",
  "respectDnsTtl": true,
  "dnsLookupFamily": "V4_ONLY",
  "outlierDetection": {
    "consecutive5xx": 10,
    "interval": "5s",
    "baseEjectionTime": "120s",
    "enforcingConsecutive5xx": 100
  },
  "commonLbConfig": {
    "healthyPanicThreshold": {},
    "localityWeightedLbConfig": {}
  },
  "transportSocket": {
    "name": "envoy.transport_sockets.tls",
    "typedConfig": {
      "@type": "type.googleapis.com/envoy.api.v2.auth.UpstreamTlsContext",
      "commonTlsContext": {
        "tlsCertificateSdsSecretConfigs": [
          {
            "name": "default",
            "sdsConfig": {
              "apiConfigSource": {
                "apiType": "GRPC",
                "grpcServices": [
                  {
                    "envoyGrpc": {
                      "clusterName": "sds-grpc"
                    }
                  }
                ]
              }
            }
          }
        ],
        "combinedValidationContext": {
          "defaultValidationContext": {},
          "validationContextSdsSecretConfig": {
            "name": "ROOTCA",
            "sdsConfig": {
              "apiConfigSource": {
                "apiType": "GRPC",
                "grpcServices": [
                  {
                    "envoyGrpc": {
                      "clusterName": "sds-grpc"
                    }
                  }
                ]
--
      "sni": "outbound_.80_._.default.greeting-sample-showgtp.global"
    }
  },
  "metadata": {
    "filterMetadata": {
      "istio": {
        "config": "/apis/networking.istio.io/v1alpha3/namespaces/admiral-sync/destination-rule/default.greeting-sample-showgtp.global-default-dr"
      }
    }
  },
  "filters": [
    {
      "name": "istio.metadata_exchange",
      "typedConfig": {
        "@type": "type.googleapis.com/udpa.type.v1.TypedStruct",
        "typeUrl": "type.googleapis.com/envoy.tcp.metadataexchange.config.MetadataExchange",
        "value": {
          "protocol": "istio-peer-exchange"
        }
      }
    }
  ]
},
My setup is us-east-1 (Admiral server and Admiral remote) and us-west-2 (Admiral remote). I see the 50/50 distribution working in west but not in east.
Requests from the east cluster do go to west (through the west LB), but the west LB appears to return the east response, so in the end every request seems to be served by east.
us-west-2
  "name": "outbound|80||default.greeting-sample-showgtp.global",
  "type": "STRICT_DNS",
  "connectTimeout": "10s",
  "loadAssignment": {
    "clusterName": "outbound|80||default.greeting-sample-showgtp.global",
    "endpoints": [
      {
        "locality": {
          "region": "us-east-1"
        },
        "lbEndpoints": [
          {
            "endpoint": {
              "address": {
                "socketAddress": {
                  "address": "a4e692a23991b478ca62ea84881d79da-53c356a7441bc499.elb.us-east-1.amazonaws.com",
                  "portValue": 15443
                }
              }
            },
            "loadBalancingWeight": 1
          }
        ],
        "loadBalancingWeight": 50
      },
      {
        "locality": {
          "region": "us-west-2"
        },
        "lbEndpoints": [
          {
            "endpoint": {
              "address": {
                "socketAddress": {
                  "address": "greeting.sample-showgtp.svc.cluster.local",
                  "portValue": 80
                }
              }
            },
            "loadBalancingWeight": 1
          }
        ],
        "loadBalancingWeight": 50
      }
    ]
  },
  "circuitBreakers": {
    "thresholds": [
      {
        "maxConnections": 4294967295,
        "maxPendingRequests": 4294967295,
        "maxRequests": 4294967295,
        "maxRetries": 4294967295
      }
    ]
  },
  "dnsRefreshRate": "5s",
  "respectDnsTtl": true,
  "dnsLookupFamily": "V4_ONLY",
  "outlierDetection": {
    "consecutive5xx": 10,
    "interval": "5s",
    "baseEjectionTime": "120s",
    "enforcingConsecutive5xx": 100
  },
  "commonLbConfig": {
    "healthyPanicThreshold": {},
    "localityWeightedLbConfig": {}
  },
  "transportSocket": {
    "name": "envoy.transport_sockets.tls",
    "typedConfig": {
      "@type": "type.googleapis.com/envoy.api.v2.auth.UpstreamTlsContext",
      "commonTlsContext": {
        "tlsCertificateSdsSecretConfigs": [
          {
            "name": "default",
            "sdsConfig": {
              "apiConfigSource": {
                "apiType": "GRPC",
                "grpcServices": [
                  {
                    "envoyGrpc": {
                      "clusterName": "sds-grpc"
                    }
                  }
                ]
              }
            }
          }
        ],
        "combinedValidationContext": {
          "defaultValidationContext": {},
          "validationContextSdsSecretConfig": {
            "name": "ROOTCA",
            "sdsConfig": {
              "apiConfigSource": {
                "apiType": "GRPC",
                "grpcServices": [
                  {
                    "envoyGrpc": {
                      "clusterName": "sds-grpc"
                    }
                  }
                ]
--
      "sni": "outbound_.80_._.default.greeting-sample-showgtp.global"
    }
  },
  "metadata": {
    "filterMetadata": {
      "istio": {
        "config": "/apis/networking.istio.io/v1alpha3/namespaces/admiral-sync/destination-rule/default.greeting-sample-showgtp.global-default-dr"
      }
    }
  },
  "filters": [
    {
      "name": "istio.metadata_exchange",
      "typedConfig": {
        "@type": "type.googleapis.com/udpa.type.v1.TypedStruct",
        "typeUrl": "type.googleapis.com/envoy.tcp.metadataexchange.config.MetadataExchange",
        "value": {
          "protocol": "istio-peer-exchange"
        }
      }
    }
  ]
},
curl -HHost:default.greeting-sample-showgtp.global a5020c7e4380642f09c42334f5d06314-b30f0b24ce995299.elb.us-west-2.amazonaws.com
This always returns the east response.
--- UPDATE --- I found that if the west cluster has the larger weight, requests from west are always served by the west cluster.
(US-EAST-1 >= 50, US-WEST-2) -> Requests from East always return East; West is good
(US-WEST-2 > 50, US-EAST-1) -> Requests from West always return West; East is good
This means that if a cluster (locality) has the larger weight, requests originating in that cluster may always stay in that cluster, because the LB always resolves to its own cluster.