Unknown reason triggers .doOnCancel(() -> cleanup(exchange)) in NettyWriteResponseFilter occasionally

Open renjie6666 opened this issue 1 year ago • 6 comments

Describe the bug

Spring Cloud Gateway version: 2.2.5, reactor-netty version: 0.9.15

  • Using ServerHttpResponse.writeWith(Mono<DataBuffer>) instead of ServerHttpResponse.writeWith(Flux<DataBuffer>) in NettyWriteResponseFilter.java occasionally triggers doOnCancel(() -> cleanup(connection)) in high-concurrency scenarios.
  • doOnCancel(() -> cleanup(connection)) closes the long-lived connection between the gateway and the downstream service. The connection pool does not observe this closure, so subsequent requests fail with reactor.netty.channel.AbortedException: Connection has been closed BEFORE response, while sending request body

Describe the solution you'd like

  • Understand the root cause of the Mono variant triggering doOnCancel
  • The connection pool should detect when a connection has been closed

Sample

public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
    return chain.filter(exchange)
        .doOnError(throwable -> cleanup(exchange))
        .then(Mono.defer(() -> {
            Connection connection = exchange.getAttribute(CLIENT_RESPONSE_CONN_ATTR);
            if (connection == null) {
                return Mono.empty();
            }
            ServerHttpResponse response = exchange.getResponse();

            final Flux<DataBuffer> body = connection
                .inbound()
                .receive()
                .retain()
                .map(byteBuf -> wrap(byteBuf, response));

            // My changes are here: collapse the Flux into a Mono.
            // single() expects the Flux to emit exactly one DataBuffer.
            Mono<DataBuffer> newBody = body.single();
            MediaType contentType = null;
            try {
                contentType = response.getHeaders().getContentType();
            }
            catch (Exception e) {
                // An invalid Content-Type header is ignored; contentType stays null.
            }

            return (isStreamingMediaType(contentType)
                ? response.writeAndFlushWith(body.map(Flux::just))
                : response.writeWith(newBody));
        })).doOnCancel(() -> cleanup(exchange));
}

renjie6666 avatar Mar 15 '24 07:03 renjie6666

Is this still an issue with the supported version of spring cloud 4.1.1?

spencergibb avatar Mar 15 '24 12:03 spencergibb

Is this still an issue with the supported version of spring cloud 4.1.1?

Yes, the issue still exists. Please focus mainly on the impact of my changes to NettyWriteResponseFilter; I marked them with a comment in the sample code.

renjie6666 avatar Mar 16 '24 07:03 renjie6666

Can you tell me how to reproduce it?

spencergibb avatar Mar 16 '24 11:03 spencergibb

Test topology:

  1. JMeter (sending HTTP POST requests; 100 QPS is enough) ->
  2. Spring Cloud Gateway (version 4.1.1 reproduces it, with NettyWriteResponseFilter modified as described in my previous comment) ->
  3. Downstream service (a Netty HTTP server or a Spring Boot server)

Notes:

  • JMeter runs on Windows 10. The number of threads in the thread group should be greater than 1.
  • Every other service runs on a 2-core, 4 GB Ubuntu virtual machine.
  • If the number of errors is positively correlated with QPS, the issue has been reproduced.
  • The error is Connection has been closed BEFORE response, while sending request body. Alternatively, you can add logging in doOnCancel(() -> cleanup(exchange)); seeing that log output also counts as a reproduction.
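
The logging suggested in the last note could be sketched as follows. This is only a minimal illustration: CancelTrace and its traced method are hypothetical names, not part of Spring Cloud Gateway or Reactor. The helper wraps a Runnable so that each invocation first reports the current thread and call stack, which may help show where a cancel signal comes from.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/** Hypothetical diagnostic helper; not part of Spring Cloud Gateway or Reactor. */
public final class CancelTrace {

    private CancelTrace() {
    }

    /**
     * Wraps a callback so that each invocation first reports the thread name
     * and the call stack that invoked it, then runs the original callback.
     */
    public static Runnable traced(String label, Runnable delegate, Consumer<String> sink) {
        return () -> {
            StringBuilder report = new StringBuilder(label)
                .append(" on thread ")
                .append(Thread.currentThread().getName());
            // The stack of a fresh Throwable shows who invoked this callback.
            for (StackTraceElement frame : new Throwable().getStackTrace()) {
                report.append(System.lineSeparator()).append("    at ").append(frame);
            }
            sink.accept(report.toString());
            delegate.run();
        };
    }

    public static void main(String[] args) {
        AtomicInteger cleanups = new AtomicInteger();
        Runnable onCancel = traced("cancel observed", cleanups::incrementAndGet, System.out::println);
        onCancel.run();
        System.out.println("cleanup calls: " + cleanups.get());
    }
}
```

In the modified filter this might be wired in as .doOnCancel(CancelTrace.traced("NettyWriteResponseFilter cancel", () -> cleanup(exchange), log::warn)), assuming a logger field named log whose warn method accepts a String. The stack trace only shows the immediate caller chain of the cancel callback, so it narrows down the source of the cancel rather than proving a root cause.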

renjie6666 avatar Mar 18 '24 01:03 renjie6666

Is the issue I raised related to this one? https://github.com/reactor/reactor-netty/issues/741

renjie6666 avatar Mar 18 '24 01:03 renjie6666

Is there a solution to this issue? It has also been troubling my project for a long time.

haoran1221 avatar Apr 07 '24 03:04 haoran1221