SCG: failing to get routing SERVICE UNAVAILABLE 503 initially after first time deployment/restart of SCG
Environemnt Spring boot: 2.6.4 spring cloud : 2021.0.1 discovery : consul
running in reactive mode.
**Load test ** We deployed upgraded SCG and trying to perform the Load test with 100 users , and initial time after first time deployment/restart of SCG we see below exceptions and users dropping . In dyna we are capturing these . And these are coming after 30s timeout ... seems like spring cloud gateway is not able to get route within 30s of response-time so it is timing out and seeing 5XX service unavailable.
100 % likely failed due to an org.springframework.cloud.gateway.support.NotFoundException 100 % likely failed due to a reactor.core.publisher.FluxOnAssembly$OnAssemblyException 33.3 % likely failed due to a com.ecwid.consul.v1.OperationException 33.3 % likely failed due to failed service calls to consul on port 8500 with HTTP 500 18.2 % likely failed due to an org.springframework.core.env.MissingRequiredPropertiesException 6.06 % likely failed due to a reactor.core.Exceptions$StaticThrowable 6.06 % likely failed due to a reactor.core.Exceptions$StaticRejectedExecutionException
further 2nd load test runs after 10 or 15 mins with same 100 users are not showing these errors...
@spencergibb could you pls help here.
thanks Karnam
I'd need quite a bit more information than that. You configuration, the full exceptions and how to reproduce to begin with
@spencergibb Thanks for the reply .
Here is the bootstrap.yml configuration:
server:
port: 8443
ssl:
key-store: ${application.ssl.keystore}
key-store-password: ${application.ssl.keystore.password}
key-store-type: ${application.ssl.keystore.type}
client-auth: ${application.ssl.keystore.type.one.way}
enabled: ${application.ssl.enabled}
key-store-provider: ${application.ssl.keystore.provider}
trust-store: ${application.ssl.truststore}
trust-store-password: ${application.ssl.truststore.password}
enabled-protocols: [TLSv1.2]
protocol: TLS
management:
trace:
http:
include: REQUEST_HEADERS, RESPONSE_HEADERS, PRINCIPAL, REMOTE_ADDRESS, TIME_TAKEN
health:
db:
enabled: false
endpoints:
enabled-by-default: false
web:
exposure:
include: gateway
endpoint:
health:
enabled: true
threaddump:
enabled: true
metrics:
enabled: true
gateway:
enabled: true
hystrix:
command:
default:
execution:
timeout:
enabled: false
logging:
level:
reactor:
netty: TRACE
org:
springframework:
cloud:
gateway: TRACE
spring:
cloud:
gateway:
httpclient:
wiretap: true
httpserver:
wiretap: true
---
spring:
application:
name: test-internal-edgeproxy
profiles:
active: commons
main:
allow-bean-definition-overriding: true
cloud:
gateway:
discovery:
locator:
enabled: true
httpclient:
connect-timeout: 90000
response-timeout: 90000
consul:
host: localhost
port: 8500
discovery:
enabled: true
health-check-interval: 30s
scheme: https
register-health-check: true
config:
enabled: true
format: ${config.server.format}
failFast: true
uri: https://testserver:8003
As I mentioned , we have hundreds of microservices behind the edge proxy (SCG) , and all requests goes thru edge proxy . with latest scg upgrade , we have deployed in SIT and we dont see this issue. But in PT load test we see some discovery issues when service requested , with multiple users (100,1000 users). This is occurring mostly after first time deployment and further tests are not generating more 5xx errors. below are exceptions few exceptions .
1.
ERROR Error getting instances from Consul.
com.ecwid.consul.v1.OperationException: OperationException(statusCode=500, statusMessage='Internal Server Error', statusContent='rpc error making call: unexpected EOF')
at com.ecwid.consul.v1.health.HealthConsulClient.getHealthServices(HealthConsulClient.java:128)
at com.ecwid.consul.v1.ConsulClient.getHealthServices(ConsulClient.java:571)
at org.springframework.cloud.consul.discovery.reactive.ConsulReactiveDiscoveryClient.getHealthServices(ConsulReactiveDiscoveryClient.java:88)
at org.springframework.cloud.consul.discovery.reactive.ConsulReactiveDiscoveryClient.lambda$getInstances$0(ConsulReactiveDiscoveryClient.java:68)
at reactor.core.publisher.FluxDefer.subscribe(FluxDefer.java:46)
at reactor.core.publisher.InternalFluxOperator.subscribe(InternalFluxOperator.java:62)
at reactor.core.publisher.FluxSubscribeOn$SubscribeOnSubscriber.run(FluxSubscribeOn.java:194)
at reactor.core.scheduler.WorkerTask.call(WorkerTask.java:84)
at reactor.core.scheduler.WorkerTask.call(WorkerTask.java:37)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
2.
DEBUG [3ca46235-491] Resolved [NotFoundException: 503 SERVICE_UNAVAILABLE "Unable to find instance
Exception:
org.springframework.cloud.gateway.support.NotFoundException
Message:
503 SERVICE_UNAVAILABLE "Unable to find instance for test-security"
Stacktrace:
org.springframework.cloud.gateway.filter.ReactiveLoadBalancerClientFilter.lambda$
org.springframework.cloud.gateway.filter.ReactiveLoadBalancerClientFilter$$Lambda$.accept
reactor.core.publisher.FluxPeek$PeekSubscriber.onNext(FluxPeek.java:185)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.lambda$
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber$$Lambda$.run
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.withActiveSubpath(TracingReactorSubscriber.java:102)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.onNext(TracingReactorSubscriber.java:70)
reactor.core.publisher.FluxMap$MapSubscriber.onNext(FluxMap.java:120)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.lambda$
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber$$Lambda$.run
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.withActiveSubpath(TracingReactorSubscriber.java:102)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.onNext(TracingReactorSubscriber.java:70)
reactor.core.publisher.MonoNext$NextSubscriber.onNext(MonoNext.java:82)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.lambda$
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber$$Lambda$.run
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.withActiveSubpath(TracingReactorSubscriber.java:102)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.onNext(TracingReactorSubscriber.java:70)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.lambda$
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber$$Lambda$.run
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.withActiveSubpath(TracingReactorSubscriber.java:102)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.onNext(TracingReactorSubscriber.java:70)
reactor.core.publisher.FluxDematerialize$DematerializeSubscriber.onNext(FluxDematerialize.java:98)
reactor.core.publisher.FluxDematerialize$DematerializeSubscriber.onNext(FluxDematerialize.java:44)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.lambda$
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber$$Lambda$.run
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.withActiveSubpath(TracingReactorSubscriber.java:102)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.onNext(TracingReactorSubscriber.java:70)
reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber.drainAsync(FluxFlattenIterable.java:421)
reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber.drain(FluxFlattenIterable.java:686)
reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber.onNext(FluxFlattenIterable.java:250)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.lambda$
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber$$Lambda$.run
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.withActiveSubpath(TracingReactorSubscriber.java:102)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.onNext(TracingReactorSubscriber.java:70)
reactor.core.publisher.FluxSwitchIfEmpty$SwitchIfEmptySubscriber.onNext(FluxSwitchIfEmpty.java:74)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.lambda$
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber$$Lambda$.run
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.withActiveSubpath(TracingReactorSubscriber.java:102)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.onNext(TracingReactorSubscriber.java:70)
reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1816)
reactor.core.publisher.MonoFlatMap$FlatMapInner.onNext(MonoFlatMap.java:249)
Exception:org.springframework.web.client.HttpServerErrorException$ServiceUnavailableMessage:503 Service Unavailable: [{"timestamp":"*****T*****+00:00","path":"/test1/v1/checkProcess","status":503,"error":"Service Unavailable","requestId":"*****"}]Show affected entry in 'Code level' tabStacktrace:
org.springframework.web.client.HttpServerErrorException.create(HttpServerErrorException.java:112)
org.springframework.web.client.DefaultResponseErrorHandler.handleError(DefaultResponseErrorHandler.java:172)
org.springframework.web.client.DefaultResponseErrorHandler.handleError(DefaultResponseErrorHandler.java:112)
3 .Exception:
reactor.core.publisher.FluxOnAssembly$OnAssemblyException
Stacktrace:
reactor.core.publisher.FluxOnAssembly$OnAssemblySubscriber.fail(FluxOnAssembly.java:578)
reactor.core.publisher.FluxOnAssembly$OnAssemblySubscriber.onError(FluxOnAssembly.java:544)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.lambda$
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber$$Lambda$.run
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.withActiveSubpath(TracingReactorSubscriber.java:102)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.onError(TracingReactorSubscriber.java:75)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.lambda$
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber$$Lambda$.run
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.withActiveSubpath(TracingReactorSubscriber.java:102)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.onError(TracingReactorSubscriber.java:75)
reactor.core.publisher.MonoFlatMap$FlatMapMain.onError(MonoFlatMap.java:172)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.lambda$
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber$$Lambda$.run
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.withActiveSubpath(TracingReactorSubscriber.java:102)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.onError(TracingReactorSubscriber.java:75)
reactor.core.publisher.FluxHide$SuppressFuseableSubscriber.onError(FluxHide.java:142)
reactor.core.publisher.MonoFlatMap$FlatMapMain.secondError(MonoFlatMap.java:192)
reactor.core.publisher.MonoFlatMap$FlatMapInner.onError(MonoFlatMap.java:259)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.lambda$
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber$$Lambda$.run
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.withActiveSubpath(TracingReactorSubscriber.java:102)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.onError(TracingReactorSubscriber.java:75)
reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.onError(MonoIgnoreThen.java:270)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.lambda$
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber$$Lambda$.run
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.withActiveSubpath(TracingReactorSubscriber.java:102)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.onError(TracingReactorSubscriber.java:75)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.lambda$
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber$$Lambda$.run
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.withActiveSubpath(TracingReactorSubscriber.java:102)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.onError(TracingReactorSubscriber.java:75)
reactor.core.publisher.FluxDoFinally$DoFinallySubscriber.onError(FluxDoFinally.java:136)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.lambda$
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber$$Lambda$.run
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.withActiveSubpath(TracingReactorSubscriber.java:102)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.onError(TracingReactorSubscriber.java:75)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.lambda$
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber$$Lambda$.run
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.withActiveSubpath(TracingReactorSubscriber.java:102)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.onError(TracingReactorSubscriber.java:75)
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber.lambda$
com.dynatrace.agent.introspection.reactor3.TracingReactorSubscriber$$Lambda$.run
trying to replicate the same scenarios in local but not able to .
My suspect is that , gateway is waiting up to 30s from consul agent response and timed out , and leads to 503 service unavailable but not all the time. but if you see we have kept it for 90 mins .
cloud:
gateway:
discovery:
locator:
enabled: true
httpclient:
connect-timeout: 90000
response-timeout: 90s
and also we see 404 service not found errors .
pls do let me know if any.
Thanks, Karnam