
Under high request concurrency, Gateway causes the backend server exception "Unexpected EOF read on the socket."

Open · Maplejw opened this issue 1 year ago · 0 comments

Describe the bug

My production environment:

2 Gateway servers (8 CPU cores, 16 GB memory), versions:

  • JDK: 21
  • Spring Boot: 3.2.7
  • Spring Cloud: 2023.0.1

6 API servers based on Tomcat (4 CPU cores, 8 GB memory), versions:

  • JDK: 8
  • Spring Cloud: Finchley.SR2
  • Spring Boot: 2.0.6

The 6 API servers receive requests from the Gateway servers, hand each request off to an async pool, and immediately return "success":

@RestController
public class ApiController {
    @Autowired
    private AsyncHandleComponent asyncHandleComponent;

    @PostMapping("/api")
    public String api(@RequestBody ReportModel reportModel, @RequestHeader("IGG-PROXY-IP") String ip) {
        asyncHandleComponent.handleApi(reportModel, ip); // handled on an async pool
        return "success";
    }
}
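The AsyncHandleComponent itself is not shown in this issue; below is a minimal sketch of what such a component might look like, assuming a bounded @Async executor. The method name handleApi comes from the controller above, but the executor bean name and pool sizes are purely illustrative.

import java.util.concurrent.Executor;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.Async;
import org.springframework.scheduling.annotation.EnableAsync;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;
import org.springframework.stereotype.Component;

@Component
public class AsyncHandleComponent {
    // Hypothetical sketch: the real implementation is not shown in the issue.
    // @Async hands the work to a bounded thread pool so the controller can
    // return "success" immediately.
    @Async("apiExecutor")
    public void handleApi(ReportModel reportModel, String ip) {
        // ... actual report processing ...
    }
}

@Configuration
@EnableAsync
class AsyncPoolConfig {
    @Bean("apiExecutor")
    public Executor apiExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(8);        // illustrative sizes
        executor.setMaxPoolSize(32);
        executor.setQueueCapacity(10_000);
        executor.initialize();
        return executor;
    }
}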

The 2 Gateway servers receive requests from an Nginx server, apply a modify-request-body filter, and then route to the API servers.

Gateway config:

spring:
  cloud:
    gateway:
      default-filters:
      - StripPrefix=1
      httpclient:
        pool:
          max-idle-time: 10s
          eviction-interval: 20s
          type: fixed
          max-connections: 2048
        connect-timeout: 2000
        response-timeout: 3s
      routes:
      - id: igg-report-st
        uri: lb://igg-report-common
        predicates:
        - Path=/st/**
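For reference, these pool settings correspond roughly to the Reactor Netty ConnectionProvider that the gateway's HttpClient is built from. The sketch below is not the gateway's actual internal wiring, just the equivalent calls; note that response-timeout: 3s is exactly the PT3S that surfaces in the 504 log further down.

import java.time.Duration;

import io.netty.channel.ChannelOption;
import reactor.netty.http.client.HttpClient;
import reactor.netty.resources.ConnectionProvider;

public class PoolSettingsSketch {
    public static void main(String[] args) {
        // Roughly what the "fixed" pool above asks for: at most 2048 pooled
        // connections, idle connections closed after 10s, and a background
        // eviction sweep every 20s.
        ConnectionProvider provider = ConnectionProvider.builder("fixed")
                .maxConnections(2048)
                .maxIdleTime(Duration.ofSeconds(10))
                .evictInBackground(Duration.ofSeconds(20))
                .build();

        // connect-timeout: 2000 (ms) and response-timeout: 3s map to these.
        HttpClient client = HttpClient.create(provider)
                .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 2000)
                .responseTimeout(Duration.ofSeconds(3));
    }
}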

Modify-request-body filter code:

@Override
public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
    ServerRequest serverRequest = ServerRequest.create(exchange,
            codecConfigurer.getReaders());
    Mono<String> modifiedBody = serverRequest.bodyToMono(String.class)
            .flatMap(o -> {
                // log.info("before modification: " + o);
                return checkParam(o);
            });

    BodyInserter<Mono<String>, ReactiveHttpOutputMessage> bodyInserter =
            BodyInserters.fromPublisher(modifiedBody, String.class);
    HttpHeaders headers = new HttpHeaders();
    headers.putAll(exchange.getRequest().getHeaders());
    // the new content length will be computed by the bodyInserter
    // and then set in the request decorator
    headers.remove(HttpHeaders.CONTENT_LENGTH);
    // if the body is changing content types, set it here, so the bodyInserter
    // will know about it
    CachedBodyOutputMessage outputMessage = new CachedBodyOutputMessage(
            exchange, headers);

    return bodyInserter.insert(outputMessage, new BodyInserterContext())
            .then(Mono.defer(() -> {
                ServerHttpRequest decorator = decorate(exchange, headers,
                        outputMessage);
                return chain
                        .filter(exchange.mutate().request(decorator).build());
            })).onErrorResume(throwable -> release(exchange, outputMessage, throwable));
}

protected Mono<Void> release(ServerWebExchange exchange, CachedBodyOutputMessage outputMessage, Throwable throwable) {
    // drain and release the cached body buffers, then propagate the original error
    return outputMessage.getBody()
            .onErrorResume(t -> Mono.error(throwable))
            .map(DataBufferUtils::release)
            .then(Mono.error(throwable));
}
private Mono<String> checkParam(String bodyStr) {
    try {
        HashMap<String, Object> param = objectMapper.readValue(bodyStr, HashMap.class);
        ParamProperties.KeyMap keyMapProperties = paramProperties.getKeyMap().get(param.get("v"));
        Map<String, String> keyMap;
        if (keyMapProperties == null) {
            keyMap = AppInfoSchedular.paramKeyMap.get(String.valueOf(param.get("v")));
        } else {
            keyMap = keyMapProperties.getKey();
        }
        if (keyMap == null || keyMap.isEmpty()) {
            return Mono.error(new GatewayException(GatewayExceptionCode.BODY_EMPTY));
        }
        for (String key : keyMap.keySet()) {
            bodyStr = bodyStr.replace("\"" + key + "\"", "\"" + keyMap.get(key) + "\"");
        }
        return Mono.just(bodyStr);
    } catch (IOException e) {
        return Mono.error(new GatewayException(GatewayExceptionCode.BODY_EMPTY, "parse"));
    }
}
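As a side note, Spring Cloud Gateway ships a built-in ModifyRequestBodyGatewayFilterFactory that implements this same cache-and-reinsert pattern. Below is a sketch of the route above expressed through it; the configuration class and bean names are illustrative, and checkParam is the key-rewriting method shown above.

import org.springframework.cloud.gateway.route.RouteLocator;
import org.springframework.cloud.gateway.route.builder.RouteLocatorBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class RouteConfig {
    // Illustrative alternative to the hand-written filter: modifyRequestBody
    // caches the body, applies the rewrite function, and lets the framework
    // recompute Content-Length, much like the custom filter above.
    @Bean
    public RouteLocator routes(RouteLocatorBuilder builder) {
        return builder.routes()
                .route("igg-report-st", r -> r.path("/st/**")
                        .filters(f -> f.stripPrefix(1)
                                .modifyRequestBody(String.class, String.class,
                                        // checkParam is the method shown above
                                        (exchange, body) -> checkParam(body)))
                        .uri("lb://igg-report-common"))
                .build();
    }
}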

My problem: when a high-concurrency burst arrives, traffic suddenly increases from 10K to 50K+ requests and stays at that level for about 10-20 seconds, and Nginx logs many 502s. I checked the Gateway log, the api-server log, and CPU usage on both.

Gateway log

2024-07-09 05:00:09.828 ERROR {[]} [              parallel-4]    i.r.g.s.e.IggExceptionHandler: 504 GATEWAY_TIMEOUT "Response took longer than timeout: PT3S"; nested exception is org.springframework.cloud.gateway.support.TimeoutException: Response took longer than timeout: PT3S 
org.springframework.web.server.ResponseStatusException: 504 GATEWAY_TIMEOUT "Response took longer than timeout: PT3S"; nested exception is org.springframework.cloud.gateway.support.TimeoutException: Response took longer than timeout: PT3S
	at org.springframework.cloud.gateway.filter.NettyRoutingFilter.lambda$filter$5(NettyRoutingFilter.java:195)
	Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException: 
Error has been observed at the following site(s):
	*__checkpoint ⇢ org.springframework.cloud.gateway.filter.WeightCalculatorWebFilter [DefaultWebFilterChain]
	*__checkpoint ⇢ org.springframework.boot.actuate.metrics.web.reactive.server.MetricsWebFilter [DefaultWebFilterChain]
	*__checkpoint ⇢ HTTP POST "/kbrowser/api" [ExceptionHandlingWebHandler]
Original Stack Trace:
		at org.springframework.cloud.gateway.filter.NettyRoutingFilter.lambda$filter$5(NettyRoutingFilter.java:195)
		at reactor.core.publisher.Flux.lambda$onErrorMap$29(Flux.java:6946)
		at reactor.core.publisher.Flux.lambda$onErrorResume$30(Flux.java:6999)
		at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onError(FluxOnErrorResume.java:94)
		at reactor.core.publisher.SerializedSubscriber.onError(SerializedSubscriber.java:124)
		at reactor.core.publisher.FluxTimeout$TimeoutOtherSubscriber.onError(FluxTimeout.java:341)
		at reactor.core.publisher.Operators.error(Operators.java:198)
		at reactor.core.publisher.MonoError.subscribe(MonoError.java:53)
		at reactor.core.publisher.Mono.subscribe(Mono.java:4400)
		at reactor.core.publisher.FluxTimeout$TimeoutMainSubscriber.handleTimeout(FluxTimeout.java:301)
		at reactor.core.publisher.FluxTimeout$TimeoutMainSubscriber.doTimeout(FluxTimeout.java:280)
		at reactor.core.publisher.FluxTimeout$TimeoutTimeoutSubscriber.onNext(FluxTimeout.java:419)
		at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onNext(FluxOnErrorResume.java:79)
		at reactor.core.publisher.MonoDelay$MonoDelayRunnable.propagateDelay(MonoDelay.java:271)
		at reactor.core.publisher.MonoDelay$MonoDelayRunnable.run(MonoDelay.java:286)
		at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:68)
		at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:28)
		at java.util.concurrent.FutureTask.run(FutureTask.java:266)
		at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
		at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
		at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
		at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
		at java.lang.Thread.run(Thread.java:748)
Caused by: org.springframework.cloud.gateway.support.TimeoutException: Response took longer than timeout: PT3S

Gateway CPU (image)

API-SERVER logs (image)

API-SERVER CPU (image)

If I expand the number of Gateway servers from 2 to 4, these errors disappear. This suggests that when the Gateway CPU is under high load, the Gateway sends the request body very slowly, so all threads of the backend api-server become occupied reading request bodies, and the backend api-server becomes completely unable to serve traffic. I don't think this should cause the api-server's "Unexpected EOF read on the socket" error. Is it a bug?
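One possible mitigation on the api-server side, offered only as a sketch and not a confirmed fix: since the symptom is Tomcat worker threads blocking on a slowly-arriving request body, bounding that read time keeps a stalled gateway from pinning every worker indefinitely. In Spring Boot 2.0.x this can be set via the server.connection-timeout property, or programmatically as below (the class name and the 5000 ms value are illustrative):

import org.apache.coyote.AbstractProtocol;
import org.apache.coyote.ProtocolHandler;
import org.springframework.boot.web.embedded.tomcat.TomcatServletWebServerFactory;
import org.springframework.boot.web.server.WebServerFactoryCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class TomcatTimeoutConfig {
    // Illustrative mitigation sketch, not a confirmed fix: cap how long a
    // Tomcat worker thread blocks waiting for more request-body bytes.
    @Bean
    public WebServerFactoryCustomizer<TomcatServletWebServerFactory> tomcatCustomizer() {
        return factory -> factory.addConnectorCustomizers(connector -> {
            ProtocolHandler handler = connector.getProtocolHandler();
            if (handler instanceof AbstractProtocol) {
                // With Tomcat's default disableUploadTimeout=true, this timeout
                // also applies while reading the body; 5000 ms is illustrative.
                ((AbstractProtocol<?>) handler).setConnectionTimeout(5000);
            }
        });
    }
}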

Maplejw · Jul 11 '24 08:07