NATS Streaming connection gets degraded over time

Open wutkemtt opened this issue 3 years ago • 2 comments

We are currently having problems with one of our services, which connects to our cloud NATS cluster via a leafnode. We are using NATS Streaming with Spring Boot microservices. The service runs well for a long time, but then the connection to the NATS cluster is occasionally lost, resulting in a "Timedout on heartbeats" message in the NATS Streaming server log. We have checked everything multiple times: configs, NATS servers, the AKS cluster, network connections and latencies, and finally our service implementation, but we can't find the problem. No changes are being made to any server or cluster component or to the service itself. At this point we suspect some kind of performance leak inside the stan or nats library, but we don't know for sure.

Our service uses this line of code to configure its NATS Streaming subscription:

new SubscriptionOptions.Builder().durableName(name).startWithLastReceived().maxInFlight(1).subscriptionTimeout(Duration.ofSeconds(10)).ackWait(Duration.ofSeconds(acknowledgeTimeout))

The acknowledgeTimeout is set to 30s.

On the server side we are using the following heartbeat config:

#Heartbeat
hb_fail_count: 20
hb_interval: "30s"
hb_timeout: "10s"
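To put those heartbeat values in perspective, here is a rough back-of-the-envelope sketch of how long a client would have to stay unresponsive before the server gives up on it. It assumes the server counts one failure per heartbeat cycle and closes the client after hb_fail_count failures; the exact counting rule in the server may differ slightly, so treat the numbers as an estimate. The class and method names (HeartbeatWindow, lowerBound, upperBound) are made up for illustration.

```java
import java.time.Duration;

// Rough estimate only: assumes one failed heartbeat per cycle and that the
// server closes the client after hbFailCount failures. This is not the
// server's actual algorithm.
class HeartbeatWindow {

    // Lower bound: only the intervals between heartbeats are counted.
    static Duration lowerBound(Duration hbInterval, int hbFailCount) {
        return hbInterval.multipliedBy(hbFailCount);
    }

    // Upper bound: each cycle also waits the full hbTimeout for a reply.
    static Duration upperBound(Duration hbInterval, Duration hbTimeout, int hbFailCount) {
        return hbInterval.plus(hbTimeout).multipliedBy(hbFailCount);
    }

    public static void main(String[] args) {
        // Values taken from the server config above.
        Duration hbInterval = Duration.ofSeconds(30);
        Duration hbTimeout = Duration.ofSeconds(10);
        int hbFailCount = 20;

        Duration low = lowerBound(hbInterval, hbFailCount);
        Duration high = upperBound(hbInterval, hbTimeout, hbFailCount);

        System.out.println("lower bound: " + low.getSeconds() + "s");  // 600s
        System.out.println("upper bound: " + high.getSeconds() + "s"); // 800s
    }
}
```

Under these assumptions the client would have to miss heartbeats for roughly 10 to 13 minutes in a row before "Timedout on heartbeats" is logged, which points to a sustained stall rather than a short network blip.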

The microservice runs on Windows Server 2016 and connects via TLS to a NATS leafnode server within the same network. The leafnode server itself is connected to our NATS cluster running on Azure AKS.

wutkemtt, Sep 03 '20 10:09