Under high request concurrency, Gateway causes the backend server exception "Unexpected EOF read on the socket"
Describe the bug
My production environment:
2 Gateway servers (8 CPU cores, 16 GB memory), versions:
- JDK: 21
- spring boot: 3.2.7
- spring cloud: 2023.0.1
6 API servers based on Tomcat (4 CPU cores, 8 GB memory), versions:
- jdk 8
- spring cloud:Finchley.SR2
- spring boot: 2.0.6
The 6 API servers receive requests from the Gateway servers, hand each request off for asynchronous processing, and return "success" immediately.
@RestController
@RestResponseBody
public class ApiController {

    @Autowired
    private AsyncHandleComponent asyncHandleComponent;

    @PostMapping("/api")
    public String api(@RequestBody ReportModel reportModel, @RequestHeader("IGG-PROXY-IP") String ip) {
        asyncHandleComponent.handleApi(reportModel, ip); // async pool
        return "success";
    }
}
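The immediate-return pattern of the controller above can be sketched with the JDK alone. This is a minimal illustration, not the actual `AsyncHandleComponent`: the class name `AsyncHandoffSketch`, the `Runnable` parameter, and the pool size are assumptions made for the example. It shows that the reply is produced while the queued work is still pending, which means the only part of the request that can keep the server thread busy is reading the request body.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class AsyncHandoffSketch {

    // Hypothetical stand-in for the async pool behind handleApi.
    private final ExecutorService pool;

    public AsyncHandoffSketch(ExecutorService pool) {
        this.pool = pool;
    }

    /** Queues the work and replies at once; the caller never waits for the work. */
    public String api(Runnable work) {
        pool.submit(work);
        return "success";
    }

    /** Shows that the reply is produced while the queued work is still pending. */
    public static String demo() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        CountDownLatch gate = new CountDownLatch(1); // keeps the work blocked until released
        AtomicBoolean done = new AtomicBoolean(false);
        try {
            String reply = new AsyncHandoffSketch(pool).api(() -> {
                try {
                    gate.await();
                    done.set(true);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            boolean doneAtReply = done.get(); // work is still blocked on the gate here
            gate.countDown();
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
            return reply + " doneAtReply=" + doneAtReply + " doneAtEnd=" + done.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```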
The 2 Gateway servers receive requests from the Nginx server, apply a filter that modifies the request body, and then route the requests to the API servers.
Gateway config:
spring:
  codec:
  cloud:
    gateway:
      default-filters:
        - StripPrefix=1
      httpclient:
        pool:
          max-idle-time: 10s
          eviction-interval: 20s
          type: fixed
          max-connections: 2048
        connect-timeout: 2000
        response-timeout: 3s
      routes:
        - id: igg-report-st
          uri: lb://igg-report-common
          predicates:
            - Path=/st/**
Code of the filter that modifies the request body:
@Override
public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
    ServerRequest serverRequest = ServerRequest.create(exchange,
            codecConfigurer.getReaders());
    Mono<String> modifiedBody = serverRequest.bodyToMono(String.class)
            .flatMap(o -> {
                // log.info("before modification: " + o);
                return checkParam(o);
            });
    BodyInserter bodyInserter = BodyInserters.fromPublisher(modifiedBody,
            String.class);
    HttpHeaders headers = new HttpHeaders();
    headers.putAll(exchange.getRequest().getHeaders());
    // the new content type will be computed by bodyInserter
    // and then set in the request decorator
    headers.remove(HttpHeaders.CONTENT_LENGTH);
    // if the body is changing content types, set it here, to the bodyInserter
    // will know about it
    CachedBodyOutputMessage outputMessage = new CachedBodyOutputMessage(
            exchange, headers);
    return bodyInserter.insert(outputMessage, new BodyInserterContext())
            .then(Mono.defer(() -> {
                ServerHttpRequest decorator = decorate(exchange, headers,
                        outputMessage);
                return chain
                        .filter(exchange.mutate().request(decorator).build());
            })).onErrorResume((throwable) -> {
                return release(exchange, outputMessage, (Throwable) throwable);
            });
}

protected Mono<Void> release(ServerWebExchange exchange, CachedBodyOutputMessage outputMessage, Throwable throwable) {
    // log.info(throwable.getMessage());
    return outputMessage.getBody().onErrorResume(throwable1 -> {
        return Mono.error(throwable);
    }).map(DataBufferUtils::release)
            .then(Mono.error(throwable));
}

private Mono<String> checkParam(String bodyStr) {
    try {
        HashMap<String, Object> param = objectMapper.readValue(bodyStr, HashMap.class);
        ParamProperties.KeyMap keyMapProperties = paramProperties.getKeyMap().get(param.get("v"));
        Map<String, String> keyMap;
        if (keyMapProperties == null) {
            keyMap = AppInfoSchedular.paramKeyMap.get(param.get("v") + "");
        } else {
            keyMap = keyMapProperties.getKey();
        }
        if (keyMap == null || keyMap.size() == 0) {
            return Mono.error(new GatewayException(GatewayExceptionCode.BODY_EMPTY));
        }
        for (String key : keyMap.keySet()) {
            bodyStr = bodyStr.replace("\"" + key + "\"", "\"" + keyMap.get(key) + "\"");
        }
        return Mono.just(bodyStr);
    } catch (IOException e) {
        return Mono.error(new GatewayException(GatewayExceptionCode.BODY_EMPTY, "parse"));
    }
}
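To make the key-renaming step in `checkParam` easier to follow, here is a minimal standalone sketch of the same `String.replace` loop using only the JDK. The class name `KeyRenameSketch` and the sample body and key map are assumptions made for illustration. Because the replacement works on the raw text, it rewrites every quoted occurrence of a key, including a string value that happens to equal the key.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class KeyRenameSketch {

    /** Replaces every quoted occurrence of each key, as the loop in checkParam does. */
    public static String renameKeys(String body, Map<String, String> keyMap) {
        for (Map.Entry<String, String> e : keyMap.entrySet()) {
            body = body.replace("\"" + e.getKey() + "\"", "\"" + e.getValue() + "\"");
        }
        return body;
    }

    public static void main(String[] args) {
        Map<String, String> keyMap = new LinkedHashMap<>();
        keyMap.put("a", "alpha"); // hypothetical mapping, not taken from the real config
        // The value "a" under key "b" is rewritten too, not only the key "a".
        System.out.println(renameKeys("{\"a\":1,\"b\":\"a\"}", keyMap));
        // prints {"alpha":1,"b":"alpha"}
    }
}
```

The rewritten body usually has a different length than the original, which is consistent with the filter removing the `Content-Length` header before forwarding.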
My problem
When a burst of highly concurrent requests arrives, suddenly rising from 10K to 50K+ and staying at that level for about 10-20 seconds, Nginx logs a large number of 502 responses. I then checked the Gateway log, the API server log, and the CPU usage of both.
Gateway log
2024-07-09 05:00:09.828 ERROR {[]} [ parallel-4] i.r.g.s.e.IggExceptionHandler: 504 GATEWAY_TIMEOUT "Response took longer than timeout: PT3S"; nested exception is org.springframework.cloud.gateway.support.TimeoutException: Response took longer than timeout: PT3S
org.springframework.web.server.ResponseStatusException: 504 GATEWAY_TIMEOUT "Response took longer than timeout: PT3S"; nested exception is org.springframework.cloud.gateway.support.TimeoutException: Response took longer than timeout: PT3S
at org.springframework.cloud.gateway.filter.NettyRoutingFilter.lambda$filter$5(NettyRoutingFilter.java:195)
Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
Error has been observed at the following site(s):
*__checkpoint ⇢ org.springframework.cloud.gateway.filter.WeightCalculatorWebFilter [DefaultWebFilterChain]
*__checkpoint ⇢ org.springframework.boot.actuate.metrics.web.reactive.server.MetricsWebFilter [DefaultWebFilterChain]
*__checkpoint ⇢ HTTP POST "/kbrowser/api" [ExceptionHandlingWebHandler]
Original Stack Trace:
at org.springframework.cloud.gateway.filter.NettyRoutingFilter.lambda$filter$5(NettyRoutingFilter.java:195)
at reactor.core.publisher.Flux.lambda$onErrorMap$29(Flux.java:6946)
at reactor.core.publisher.Flux.lambda$onErrorResume$30(Flux.java:6999)
at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onError(FluxOnErrorResume.java:94)
at reactor.core.publisher.SerializedSubscriber.onError(SerializedSubscriber.java:124)
at reactor.core.publisher.FluxTimeout$TimeoutOtherSubscriber.onError(FluxTimeout.java:341)
at reactor.core.publisher.Operators.error(Operators.java:198)
at reactor.core.publisher.MonoError.subscribe(MonoError.java:53)
at reactor.core.publisher.Mono.subscribe(Mono.java:4400)
at reactor.core.publisher.FluxTimeout$TimeoutMainSubscriber.handleTimeout(FluxTimeout.java:301)
at reactor.core.publisher.FluxTimeout$TimeoutMainSubscriber.doTimeout(FluxTimeout.java:280)
at reactor.core.publisher.FluxTimeout$TimeoutTimeoutSubscriber.onNext(FluxTimeout.java:419)
at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onNext(FluxOnErrorResume.java:79)
at reactor.core.publisher.MonoDelay$MonoDelayRunnable.propagateDelay(MonoDelay.java:271)
at reactor.core.publisher.MonoDelay$MonoDelayRunnable.run(MonoDelay.java:286)
at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:68)
at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:28)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.springframework.cloud.gateway.support.TimeoutException: Response took longer than timeout: PT3S
Gateway cpu
API-SERVER logs
API-SERVER CPU
If I scale the Gateway servers out from 2 to 4, these errors disappear. This suggests that when the Gateway CPU is under high load, the Gateway sends the request body very slowly, so all threads of the backend API server end up occupied reading request bodies. As a result, the backend API server becomes completely unable to serve requests. I don't think this should cause "Unexpected EOF read on the socket" on the API server. Is this a bug?
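One way such an EOF can arise on the receiving side, offered here as a hypothesis and not a confirmed diagnosis, is that the sender closes the connection before it has delivered the whole body it announced (for example after its own timeout fires). The following JDK-only sketch reproduces that situation on a loopback socket. The class name `TruncatedBodySketch`, the method names, and the byte counts are assumptions for illustration; it does not use Tomcat or the Gateway.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class TruncatedBodySketch {

    /** Reads exactly `expected` bytes, like a server honoring a declared body length. */
    public static int readBody(InputStream in, int expected) throws IOException {
        byte[] buf = new byte[expected];
        int total = 0;
        while (total < expected) {
            int n = in.read(buf, total, expected - total);
            if (n < 0) { // peer closed before the full body arrived
                throw new EOFException("Unexpected EOF after " + total + " of " + expected + " bytes");
            }
            total += n;
        }
        return total;
    }

    /** The client promises `declared` bytes but sends only `sent`, then closes. */
    public static String simulate(int declared, int sent) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread client = new Thread(() -> {
                try (Socket s = new Socket(server.getInetAddress(), server.getLocalPort());
                     OutputStream out = s.getOutputStream()) {
                    out.write(new byte[sent]);
                    out.flush();
                } catch (IOException ignored) {
                    // not relevant for this sketch
                }
            });
            client.start();
            try (Socket accepted = server.accept();
                 InputStream in = accepted.getInputStream()) {
                return "read " + readBody(in, declared) + " bytes";
            } catch (EOFException e) {
                return e.getMessage();
            } finally {
                client.join();
            }
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(simulate(10, 10)); // full body delivered
        System.out.println(simulate(10, 4));  // sender gives up early
    }
}
```

In this sketch the reading thread stays blocked until either the remaining bytes or the close arrives, which matches the observation that backend threads are tied up while request bodies trickle in.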