nats.java
NATS timeout errors when calling NatsJetStreamManagement
Observed behavior
As described in the Slack channel https://natsio.slack.com/archives/CM3T6T7JQ/p1692790156946559, we're sometimes seeing timeout errors when we try to fetch information from JetStream.
We see that the request reaches the NATS server, but the response is never received. We're not 100% sure whether this is a server or a client issue. Please move this to another repo if this one doesn't match.
We see these timeouts even if we increase the timeout threshold to 10 or even 30 seconds. As this is reproducible in small tests without high load on the system, system resources shouldn't be the problem here.
Expected behavior
All requests against NatsJetStreamManagement should be answered.
Server and client version
Experienced the same behavior on different server (2.8.x, 2.9.x, and 2.10.0) and client (jnats 2.14.x, 2.16.x) versions.
Host environment
Windows; NATS server running in Docker (Docker Desktop for Windows); GKE cluster
Steps to reproduce
The attached ZIP contains a simple reproducer:
container-starter: This module creates a Testcontainer that starts the NATS server.
reproducer: A stripped-down version of our code that creates a stream, registers subjects on it, and adds message consumers to them.
How to build
Simply run mvn clean install -DskipTests in the parent folder.
After that, go into reproducer and execute TimeoutReproducerTest.bat, which runs the supplied test 25 times. A couple of executions will succeed; others will fail with the timeout error mentioned above.
The log files of each test-run are placed under the ./logs folder.
@joachimglink Please see my DM in Slack.
I cannot run your example, but my guess is that you are making a management call inside the handler for another message on the same connection. We've seen this before. The dispatcher makes a blocking call to deliver a message to a handler. The handler then makes an API call, such as a JetStreamManagement call, which is a request under the covers. The request is sent and the reply comes in from the server, but the dispatcher is still busy delivering the first message, so the second request runs out of time.
The fix is to add the following to the connection options:
.useDispatcherWithExecutor()
This tells the dispatcher to run the delivery as a task from the Options executor service, which you can also supply with the option
.executor(ExecutorService)
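To make the mechanism concrete, here is a minimal sketch that imitates it using only JDK classes. It is a toy model built on assumptions, not the jnats implementation: `ToyDispatcher`, `deliverToHandler`, `request`, and `handlerRequestSucceeds` are hypothetical names, and the "reply" is simply an event that only the single dispatcher thread can process.

```java
import java.util.concurrent.*;

public class DispatcherBlockingSketch {

    /** Hypothetical stand-in for a dispatcher: one thread drains one queue of events. */
    static final class ToyDispatcher implements AutoCloseable {
        private final BlockingQueue<Runnable> incoming = new LinkedBlockingQueue<>();
        private final ExecutorService handlerPool; // null = run handlers on the dispatcher thread
        private final Thread loop;

        ToyDispatcher(ExecutorService handlerPool) {
            this.handlerPool = handlerPool;
            this.loop = new Thread(this::drain, "toy-dispatcher");
            this.loop.setDaemon(true);
            this.loop.start();
        }

        private void drain() {
            try {
                while (true) {
                    incoming.take().run(); // blocks this thread until the event returns
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // close() was called, stop draining
            }
        }

        /** Queues a message for the handler, either inline or as an executor task. */
        void deliverToHandler(Runnable handler) {
            if (handlerPool == null) {
                incoming.add(handler);
            } else {
                incoming.add(() -> handlerPool.execute(handler));
            }
        }

        /** Toy request: the reply completes only when the dispatcher thread processes it. */
        String request(long timeoutMillis) throws Exception {
            CompletableFuture<String> reply = new CompletableFuture<>();
            incoming.add(() -> reply.complete("stream-info"));
            return reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
        }

        @Override
        public void close() {
            loop.interrupt();
        }
    }

    /** Returns true if a request made from inside a handler gets its reply in time. */
    static boolean handlerRequestSucceeds(ExecutorService handlerPool) throws Exception {
        CompletableFuture<Boolean> outcome = new CompletableFuture<>();
        try (ToyDispatcher dispatcher = new ToyDispatcher(handlerPool)) {
            dispatcher.deliverToHandler(() -> {
                try {
                    dispatcher.request(200);
                    outcome.complete(true);
                } catch (TimeoutException e) {
                    outcome.complete(false); // reply was queued behind the busy handler
                } catch (Exception e) {
                    outcome.completeExceptionally(e);
                }
            });
            return outcome.get(5, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            System.out.println("handler on dispatcher thread, request succeeded: "
                    + handlerRequestSucceeds(null));
            System.out.println("handler on executor, request succeeded: "
                    + handlerRequestSucceeds(pool));
        } finally {
            pool.shutdownNow();
        }
    }
}
```

In this model the first call should time out, because the handler blocks the only thread that could deliver its reply. The second should succeed, because the dispatcher thread hands the handler off to the pool and is free to process the reply. Raising the timeout does not help the first case, which would be consistent with timeouts persisting at 10 or 30 seconds.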
Please let me know if this solves your issue. Otherwise, I will try again with your project, but I'm going to need to run against a local server (not Docker) with a more simplified project.