nats.java

NATS timeout errors when calling NatsJetStreamManagement

Open · joachimglink opened this issue 2 years ago · 1 comment

Observed behavior

As described in the Slack channel https://natsio.slack.com/archives/CM3T6T7JQ/p1692790156946559, we're sometimes seeing timeout errors when we try to fetch information from JetStream.

We can see that the request reaches the NATS server, but the response is never received. We're not 100% sure whether this is a server or a client issue; please move this to another repo if this one doesn't match.

We see these timeouts even if we increase the timeout threshold to 10 or even 30 seconds. Since this is reproducible in small tests without any significant load on the system, system resources shouldn't be the problem here.

Expected behavior

All requests against NatsJetStreamManagement should be answered.

Server and client version

We experienced the same behavior with different server versions (2.8.x, 2.9.x, and 2.10.0) and client versions (jnats 2.14.x and 2.16.x).

Host environment

Windows; NATS server running in Docker (Docker Desktop for Windows) and in a GKE cluster

Steps to reproduce

nats-timeout-reproducer.zip

The attached ZIP contains a simple reproducer:

container-starter: This module creates a Testcontainer that starts the NATS server.

reproducer: A stripped-down version of our code that creates a stream, registers subjects to it, and adds message consumers to them (a rough sketch of this setup follows below).

How to build: Run mvn clean install -DskipTests in the parent folder. After that, go into reproducer and execute TimeoutReproducerTest.bat, which runs the supplied test 25 times. Some executions will succeed, others will fail with the mentioned timeout error. The log files of each test run are placed under the ./logs folder.
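For context, a rough sketch of what the reproducer setup described above could look like with the jnats API; the actual code is in the attached ZIP, so the stream name, subjects, durable names, and server URL here are made up:

```java
import io.nats.client.*;
import io.nats.client.api.StorageType;
import io.nats.client.api.StreamConfiguration;

public class ReproducerSketch {
    public static void main(String[] args) throws Exception {
        // In the real setup the URL comes from the Testcontainer started by container-starter
        try (Connection nc = Nats.connect("nats://localhost:4222")) {
            JetStreamManagement jsm = nc.jetStreamManagement();

            // Create a stream and register subjects to it
            StreamConfiguration sc = StreamConfiguration.builder()
                .name("EVENTS")                                   // hypothetical stream name
                .subjects("events.created", "events.updated")     // hypothetical subjects
                .storageType(StorageType.File)
                .build();
            jsm.addStream(sc);

            // Add a message consumer for each subject
            JetStream js = nc.jetStream();
            Dispatcher dispatcher = nc.createDispatcher();
            for (String subject : new String[] {"events.created", "events.updated"}) {
                PushSubscribeOptions pso = PushSubscribeOptions.builder()
                    .durable(subject.replace('.', '-'))           // hypothetical durable name
                    .build();
                js.subscribe(subject, dispatcher, Message::ack, false, pso);
            }
        }
    }
}
```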

joachimglink · Sep 21 '23 09:09

@joachimglink Please see the DM in Slack.

scottf · Oct 19 '23 23:10

I cannot run your example, but my guess is that this is an issue where you are making a management call in the handler for another message on the same connection. We've seen this before: the dispatcher makes a blocking call to post a message to a handler, and the handler then makes an API call, such as a JetStreamManagement call, which is a request under the covers. The request is sent and the reply comes in from the server, but the dispatcher is still busy delivering the first message, so the second request runs out of time.
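A minimal sketch of the pattern being described, assuming a push consumer on a dispatcher; the subject, stream name, and durable name are hypothetical:

```java
import io.nats.client.*;
import io.nats.client.api.StreamInfo;

import java.io.IOException;

public class HandlerDeadlockSketch {
    public static void main(String[] args) throws Exception {
        Connection nc = Nats.connect("nats://localhost:4222");
        JetStream js = nc.jetStream();
        JetStreamManagement jsm = nc.jetStreamManagement();

        Dispatcher dispatcher = nc.createDispatcher();
        PushSubscribeOptions pso = PushSubscribeOptions.builder()
            .durable("worker")                 // hypothetical durable name
            .build();

        // The handler runs on the dispatcher thread. getStreamInfo is a request/reply
        // over the same connection, and its reply has to be delivered while the
        // dispatcher is still blocked in this handler, so the request can time out.
        js.subscribe("orders.>", dispatcher, msg -> {
            try {
                StreamInfo info = jsm.getStreamInfo("ORDERS"); // hypothetical stream name
                System.out.println("messages in stream: " + info.getStreamState().getMsgCount());
            } catch (IOException | JetStreamApiException e) {
                e.printStackTrace();
            }
            msg.ack();
        }, false, pso);
    }
}
```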

The fix for this is the following: in the connection options, add

.useDispatcherWithExecutor()

This tells the dispatcher to run the delivery as a task from the Options executor service, which you can also supply with the option

.executor(ExecutorService)
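A minimal sketch of connection options using these two settings; the server URL and the thread pool choice are placeholders:

```java
import io.nats.client.Connection;
import io.nats.client.Nats;
import io.nats.client.Options;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class DispatcherExecutorOptionsSketch {
    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newCachedThreadPool(); // placeholder pool

        Options options = new Options.Builder()
            .server("nats://localhost:4222")   // placeholder server URL
            .useDispatcherWithExecutor()       // run handler delivery as tasks on the executor
            .executor(executor)                // optional: supply your own executor service
            .build();

        try (Connection nc = Nats.connect(options)) {
            // Create dispatchers and subscriptions as before; management calls made
            // from inside handlers should no longer block the dispatcher's delivery.
        }
    }
}
```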

Please let me know if this solves your issue. Otherwise I will try again with your project, but I'm going to need it to run against a local server (not Docker) with a more simplified project.

scottf · Jul 08 '24 19:07