NATS Service Failure During the Product Upgrade Process

Open Inkathu opened this issue 11 months ago • 4 comments

Observed behavior

During an attempt to install the new version of the product, the NATS service failed and the upgrade attempts were unsuccessful. The failures occurred while a large number of running product jobs were being stopped.

Symptoms:

  • NATS service failed to respond to multiple StopJobRequest messages.
  • Performance warnings indicated delayed internal subscriptions on various streams (e.g., $JS.API.STREAM.PURGE.SessionEventsStream).
  • The product service failed to start due to no API response from NATS.
  • The NATS service was found to be down during the second upgrade attempt (with an "incorrect function" error), and manual restart attempts were initially unsuccessful.

Example of the performance warnings in the log:

2025/05/07 14:13:59.418304 [WRN] Internal subscription on "$JS.API.STREAM.PURGE.SessionEventsStream" took too long: 3m13.0468125s
2025/05/07 14:13:59.739927 [WRN] Internal subscription on "$JS.API.STREAM.PURGE.SessionEventsStream" took too long: 3m13.3086614s
2025/05/07 14:13:59.739927 [WRN] Internal subscription on "$JS.API.STREAM.PURGE.SessionEventsStream" took too long: 3m13.3689963s
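For triage, warnings like the ones above can be extracted from a larger log and ranked by delay. The following is a minimal sketch in Python that assumes only the line format shown in this report. The function names `duration_seconds` and `slow_subscriptions` are illustrative and not part of any NATS tooling.

```python
import re
from datetime import datetime

# Matches warnings in the format shown in the report:
# <date> <time> [WRN] Internal subscription on "<subject>" took too long: <duration>
WARNING_RE = re.compile(
    r'(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d+) \[WRN\] '
    r'Internal subscription on "([^"]+)" took too long: (\S+)'
)

# Components of a Go-style duration string such as "3m13.0468125s" or "250ms".
UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0,
                "ms": 1e-3, "us": 1e-6, "µs": 1e-6, "ns": 1e-9}
COMPONENT_RE = re.compile(r'(\d+(?:\.\d+)?)(h|ms|us|µs|ns|m|s)')


def duration_seconds(text):
    """Convert a Go-style duration string to seconds."""
    parts = COMPONENT_RE.findall(text)
    # Reject input that is not made up entirely of duration components.
    if not parts or "".join(value + unit for value, unit in parts) != text:
        raise ValueError(f"unrecognized duration: {text!r}")
    return sum(float(value) * UNIT_SECONDS[unit] for value, unit in parts)


def slow_subscriptions(log_text):
    """Return (timestamp, subject, delay_in_seconds) for each matching warning."""
    rows = []
    for match in WARNING_RE.finditer(log_text):
        stamp, subject, duration = match.groups()
        when = datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S.%f")
        rows.append((when, subject, duration_seconds(duration)))
    return rows
```

Sorting the returned rows by the third field puts the longest delays first, and grouping by subject shows which API subjects are affected.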

Resolution Attempts: Cleaning up the NATS folder and restarting the service resolved the issue and allowed a successful upgrade to the new version of the product.

Screenshot of the service failure between unsuccessful restarts:

Image

Expected behavior

The NATS server works without problems, and there is no need to clean up streams.

Server and client version

Server version: 2.10.22
Client: .NET client, nats.net v2.2.3

Host environment

No response

Steps to reproduce

No response

Inkathu, May 14 '25 10:05

It is difficult to say much without more complete logs and more information about what your application does on a restart. What exactly was restarted here? Were the NATS servers included in the restart jobs?

Also, you are a bit out of date on 2.10.22, which is 8 months old at this point. I recommend staying up to date with NATS Server releases, as we regularly fix bugs and improve performance.

neilalexander, May 14 '25 12:05

It is difficult to provide more details. I have attached the NATS log file. About the update: when the product is updated, its services are restarted. In short, stopping the product's internal tasks while it is actively working puts a high load on NATS. As for updating NATS: our product is large, and we can update NATS only after testing the product against the new NATS version (in progress).

nats-server_111124.log

Inkathu, May 20 '25 10:05

Ping

Inkathu, Jun 13 '25 12:06

How come you are purging the stream so often? There are lots of cases in your log where you are purging it several times a second.

neilalexander, Jun 13 '25 12:06
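On the purge frequency raised in the last comment: if the purges come from per-job cleanup, one possible mitigation is to coalesce them on the client side so that each stream is purged at most once per interval. The sketch below is only an illustration under that assumption. `PurgeCoalescer`, `min_interval`, and the `purge` callable are hypothetical names, and `purge` stands in for whatever client call the product uses to purge a stream. The thread does not confirm whether the product can tolerate deferred purges.

```python
import time


class PurgeCoalescer:
    """Limit purges to at most one per stream per min_interval seconds.

    `purge` is a hypothetical callable taking a stream name; it is a
    placeholder for the real client call that purges the stream.
    """

    def __init__(self, purge, min_interval=5.0, clock=time.monotonic):
        self._purge = purge
        self._min_interval = min_interval
        self._clock = clock
        self._last = {}        # stream name -> time the last purge was issued
        self._pending = set()  # streams with requests deferred since then

    def request(self, stream):
        """Purge now if the interval has elapsed; otherwise defer.

        Returns True if a purge was issued, False if it was deferred.
        """
        now = self._clock()
        last = self._last.get(stream)
        if last is None or now - last >= self._min_interval:
            self._issue(stream, now)
            return True
        self._pending.add(stream)
        return False

    def flush(self):
        """Issue one purge for each stream that has deferred requests."""
        now = self._clock()
        flushed = sorted(self._pending)
        for stream in flushed:
            self._issue(stream, now)
        return flushed

    def _issue(self, stream, now):
        self._pending.discard(stream)
        self._last[stream] = now
        self._purge(stream)
```

In such a setup, `request` would be called wherever the product currently purges, and `flush` once at the end of a batch of job stops, so that a burst of stop requests results in a small number of purge calls instead of several per second.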