Connection Pool Exhaustion Leading to 504/500 Errors in Spring Cloud Gateway
Problem Description: Our setup is Client -> Spring Cloud Gateway -> Backend Service (host: 122). There is only one upstream service, and all requests are routed to host 122.
Starting at 2025-10-16 11:01, all client requests began failing with 504 timeouts. The client's timeout is set to 30 seconds, while the gateway and backend service timeouts are configured at 300 seconds.
Then, starting at 2025-10-16 13:47, all client requests began receiving 500 errors, indicating the gateway failed to forward the requests. The exception stack trace is as follows:
```
reactor.netty.internal.shaded.reactor.pool.PoolAcquireTimeoutException: Pool#acquire(Duration) has been pending for more than the configured timeout of 1000ms
	at reactor.netty.internal.shaded.reactor.pool.AbstractPool$Borrower.run(AbstractPool.java:418)
	Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
Error has been observed at the following site(s):
	*__checkpoint ⇢ org.springframework.cloud.gateway.filter.WeightCalculatorWebFilter [DefaultWebFilterChain]
	*__checkpoint ⇢ com.alibaba.csp.sentinel.adapter.spring.webflux.SentinelWebFluxFilter [DefaultWebFilterChain]
	*__checkpoint ⇢ org.springframework.web.filter.reactive.ServerHttpObservationFilter [DefaultWebFilterChain]
	*__checkpoint ⇢ HTTP POST "/life/xxx" [ExceptionHandlingWebHandler]
Original Stack Trace:
	at reactor.netty.internal.shaded.reactor.pool.AbstractPool$Borrower.run(AbstractPool.java:418)
	at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:68)
	at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:28)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:840)
```
Our Investigation:
1. Gateway configuration (a pool-eviction sketch follows this list):

```properties
spring.cloud.gateway.httpclient.pool.type=FIXED
spring.cloud.gateway.httpclient.pool.acquire-timeout=1000
spring.cloud.gateway.httpclient.pool.max-connections=10000
spring.cloud.gateway.httpclient.pool.max-idle-time=10000
spring.cloud.gateway.httpclient.response-timeout=PT300S
spring.cloud.gateway.httpclient.connect-timeout=5000
spring.codec.max-in-memory-size=5MB
```
2. The metric `http_client_requests_active_seconds_active_count` started rising at 11:01, reached the configured maximum of 10000 at 13:47, and the 500 errors (request-forwarding failures) began at that point.
3. TCP packet captures show that the gateway sent its last packet at 11:05. The backend service (122) then sent an RST packet at 11:10 (exactly 300 seconds later). After this RST, the gateway sent no further TCP packets to host 122, even though clients continued to send requests (all of which timed out).
4. We have an internal HTTP endpoint exposed via Spring Web in the gateway for operational tasks (e.g., cache updates by DevOps). Around the time the issue started (approximately 11:02), we observed warnings suggesting a SkyWalking version incompatibility when this internal endpoint was called. We have not conclusively identified this as the cause of a connection leak, but the timing correlation is notable.
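If the root cause turns out to be a slow connection leak, one mitigation we are evaluating is letting the pool evict stale connections proactively instead of only on acquire. This is a sketch only: the property names come from Spring Cloud Gateway's HttpClientProperties and should be verified against the version in use, and the values are illustrative rather than recommendations.

```properties
# Illustrative values only (assumption: these pool properties exist in the
# Spring Cloud Gateway version in use; verify before applying).
# Run periodic background eviction so idle/stale connections are closed
# even when no new acquire happens.
spring.cloud.gateway.httpclient.pool.eviction-interval=PT30S
# Cap the total lifetime of a pooled connection so a leaked connection is
# eventually evicted rather than held against max-connections forever.
spring.cloud.gateway.httpclient.pool.max-life-time=PT120S
# Register Reactor Netty pool metrics (acquired/pending/idle) with Micrometer
# so leak growth is visible before the pool is exhausted.
spring.cloud.gateway.httpclient.pool.metrics=true
```

With pool metrics enabled, a steadily growing acquired count that never drops, much like the `http_client_requests_active_seconds_active_count` curve we observed, would point at connections being acquired but never released.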
Relevant Log Snippet (2025-10-16 11:02:30.898):
```
[reactor-http-epoll-2] [WARN ] ... java.lang.NoSuchMethodError: 'org.springframework.http.HttpStatus org.springframework.http.server.reactive.ServerHttpResponse.getStatusCode()'
	at org.apache.skywalking.apm.plugin.spring.mvc.v5.InvokeInterceptor.lambda$afterMethod$0(InvokeInterceptor.java:73)
	... (full stack trace provided in the original description)
```
Request for Assistance: Our investigation is currently stalled. We would welcome help from the community in identifying the root cause; any guidance on potential causes or further troubleshooting steps would be greatly appreciated.
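One further debugging step we are considering (a sketch only; it assumes Spring Cloud Gateway's standard HttpClientCustomizer hook, and wiretap logging is far too verbose for production use):

```java
import org.springframework.cloud.gateway.config.HttpClientCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import reactor.netty.http.client.HttpClient;

@Configuration
public class GatewayWiretapConfig {

    // Enables Reactor Netty wiretap logging on the gateway's outbound
    // HttpClient, so we can see whether connections to host 122 are still
    // being acquired and written to after the RST, or whether all requests
    // are stuck waiting in Pool#acquire.
    @Bean
    public HttpClientCustomizer wiretapCustomizer() {
        return httpClient -> httpClient.wiretap(true);
    }
}
```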
It appears that the method signature referenced in this issue no longer matches the current implementation.
The reported method signature was:
`org.springframework.http.HttpStatus org.springframework.http.server.reactive.ServerHttpResponse.getStatusCode()`
However, the current signature is:
`org.springframework.http.HttpStatusCode org.springframework.http.server.reactive.ServerHttpResponse.getStatusCode()`
This change reflects the Spring Framework 6 transition from the concrete HttpStatus enum to the more general HttpStatusCode abstraction. It would be worth verifying whether the Apache SkyWalking version in use is compatible with this updated API.
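To make the failure mode concrete, here is a minimal sketch (the class and method are illustrative, not SkyWalking's actual code). The JVM links a call site by its full method descriptor, return type included, so agent bytecode compiled against the Spring 5 descriptor fails to link on Spring 6 even though the call looks identical at the source level:

```java
import org.springframework.http.HttpStatusCode;
import org.springframework.http.server.reactive.ServerHttpResponse;

// Illustrative only. An agent plugin compiled against Spring 5 carries the
// descriptor:
//     getStatusCode()Lorg/springframework/http/HttpStatus;
// whereas Spring 6 ships:
//     getStatusCode()Lorg/springframework/http/HttpStatusCode;
// Because method resolution uses the full descriptor, the stale call site
// throws NoSuchMethodError at runtime, exactly as seen in the log snippet.
final class StatusCodeAccess {

    // Links cleanly only when compiled against the Spring 6 signature.
    static int statusValue(ServerHttpResponse response) {
        HttpStatusCode status = response.getStatusCode();
        return status != null ? status.value() : -1;
    }
}
```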
Note that although the declared return type is now HttpStatusCode, ServerHttpResponse.getStatusCode() typically returns an HttpStatus enum constant; when the response status code is not one defined in HttpStatus, a DefaultHttpStatusCode is returned instead.
For reference, please see:
- https://github.com/spring-projects/spring-framework/blob/main/spring-web/src/main/java/org/springframework/http/HttpStatusCode.java#L98-L108
- https://github.com/spring-projects/spring-framework/blob/main/spring-web/src/main/java/org/springframework/http/HttpStatus.java
It would also be advisable to confirm whether the error occurred while handling a status code that HttpStatus does not define in the current Spring Framework version.
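The resolution behavior is easy to observe through the public HttpStatusCode.valueOf factory (a minimal sketch):

```java
import org.springframework.http.HttpStatus;
import org.springframework.http.HttpStatusCode;

public class StatusCodeResolutionDemo {

    public static void main(String[] args) {
        // A standard code resolves to the HttpStatus enum constant.
        HttpStatusCode ok = HttpStatusCode.valueOf(200);
        System.out.println(ok instanceof HttpStatus); // true (HttpStatus.OK)

        // A non-standard code has no enum constant, so Spring falls back to
        // its internal DefaultHttpStatusCode implementation.
        HttpStatusCode custom = HttpStatusCode.valueOf(599);
        System.out.println(custom instanceof HttpStatus); // false
        System.out.println(custom.value());               // 599
    }
}
```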
The above analysis pertains to the relevant log snippet (2025-10-16 11:02:30.898); the phase of the incident in which requests were never delivered to the backend still requires separate investigation.
We are facing the same issue with the Apache HTTP client: 504s.
I think I've been able to reproduce this, but only when a circuit breaker is also configured (it can be on different routes).
Do either of you have a circuit breaker configured?
I'll prepare a minimal reproducer, but it would be useful to know whether this is a different issue, so I can decide whether to raise a new issue or join this one.
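For anyone trying to reproduce alongside me, this is the shape of configuration I mean (a hedged sketch; the route id, URI, path, and circuit breaker name are illustrative placeholders, not taken from the original report):

```properties
# Illustrative route with the Spring Cloud CircuitBreaker gateway filter
# applied; all names here are placeholders.
spring.cloud.gateway.routes[0].id=backend-route
spring.cloud.gateway.routes[0].uri=http://backend:8080
spring.cloud.gateway.routes[0].predicates[0]=Path=/life/**
spring.cloud.gateway.routes[0].filters[0]=CircuitBreaker=backendCircuitBreaker
```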
Thanks for checking. To confirm, we do not have any circuit breakers configured on our routes.
Yes, we are using Resilience4j.
Okay @sachinbhapkar, given that we seem to have a different setup from the original issue reported by @shuxingzfw, I have created a new issue: #3963.
If the Spring team decides they are the same issue, I'm happy for them to be merged.