nats-server
Memory leak of NATS cluster
Observed behavior
We have a NATS cluster of three nodes (NATS version is 2.10.16).
host: 127.0.0.1
port: 4222
server_name: nats-02-cluster
accounts {
  $SYS { users = [ { user: "nats", pass: "PASS" } ] }
}
jetstream {
  store_dir: /var/lib/nats
  max_memory_store: 1024Mb
  max_file_store: 819200Mb
}
cluster {
  name: cluster
  listen: 127.0.0.1:6222
  routes: [
    nats-route://nats-00-cluster:6226
    nats-route://nats-01-cluster:6226
    nats-route://nats-02-cluster:6226
  ]
  compression: {
    mode: s2_auto
    rtt_thresholds: [10ms, 50ms, 100ms]
  }
}
http_port: 8222
max_connections: 64K
max_control_line: 4KB
max_payload: 8MB
max_pending: 64MB
max_subscriptions: 0
log_file: /var/log/nats/nats-server.log
Cluster interaction occurs via nginx:
upstream nats {
  server 127.0.0.1:4222;
}
server {
  listen 127.0.0.1:4224 so_keepalive=1m:5s:2;
  listen 192.168.1.2:4224 so_keepalive=1m:5s:2;
  access_log off;
  tcp_nodelay on;
  preread_buffer_size 64k;
  proxy_pass nats;
}
upstream nats-cluster {
  server 127.0.0.1:6222;
}
server {
  listen 127.0.0.1:6226 so_keepalive=1m:5s:2;
  listen 192.168.1.2:6226 so_keepalive=1m:5s:2;
  access_log off;
  tcp_nodelay on;
  preread_buffer_size 64k;
  proxy_pass nats-cluster;
}
Events are forwarded to NATS by the Vector service. The average throughput is 80k events per second (about 90 MB/s).
nats:
  type: "nats"
  inputs:
    - "upstreams.other"
  url: "nats://127.0.0.1:4222"
  request:
    rate_limit_num: 70000
  buffer:
    type: memory
    max_events: 2000
  subject: "{{ type }}"
  acknowledgements:
    enabled: true
  encoding:
    codec: json
Memory usage increases continuously until it reaches the host limit (60 GB), at which point the OOM killer terminates the NATS service. NATS profiles can be found in the attached profiles.tar.gz.
Expected behavior
Service memory should not leak
Server and client version
nats-server: 2.10.16 nats: 0.1.4
Host environment
No response
Steps to reproduce
No response
Thanks for providing the memory profiles!
Can you please try disabling route compression by changing mode from s2_auto to off and see if there's an improvement?
We did that; nothing changed.
I removed nginx and now the nodes communicate with each other directly. Memory continues to leak.
profiles.zip contains the current profiles.
Your latest profile suggests there are still a lot of allocations in the route S2 writer, are you sure route compression was disabled properly? You may need to do a rolling restart of the cluster nodes to ensure it's taken effect.
You are right: I forgot to turn off compression on one server
OK, this latest profile shows a different type of memory build-up than before (this one shows Raft append entries, which weren't evident last time).
Can you please post more details about your cluster? What spec of machines are the cluster nodes running on? Are all of the cluster nodes the same CPU/RAM/disk-wise? Do you see these build-ups on a single node or multiple?
The NATS cluster is running on servers with the following specifications: 64 GB RAM, Intel(R) Xeon(R) E-2236 CPU @ 3.40GHz, 890 GB SSD. All servers are identical. We use 10GbE network cards. The operating system is Arch Linux. Memory usage across the servers is uneven: the node hosting the most primary replicas consumes the most memory.
Do you use async publish for JetStream?
Honestly, I'm not sure how this is implemented in vector.dev. Here is a link to the module: https://vector.dev/docs/reference/configuration/sinks/nats/ https://github.com/vectordotdev/vector/tree/master/src/sinks/nats
I reviewed the source code of the NATS module and saw that it calls into async_nats.
Maybe we can have @Jarema take a look, since it's using the Rust client.
@derekcollison A quick glance shows that vector is using Core NATS publish, so not even JetStream async publish.
OK, it is very easy to overload the system in that case. This will balloon the internal append entries, since that pipeline needs to internally queue and then write to the store.
Will you fix this? Or do we need to make changes on our end?
The issue needs to be rectified in Vector by switching from Core NATS publishes to JetStream publishes, as currently the Core NATS publishes can potentially send data into JetStream faster than it can be processed. This explains the build-up of append entries in memory that you are seeing.
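For illustration, here is a minimal Go sketch of the difference between the two publish modes, assuming the nats.go client and a hypothetical subject (a stream covering that subject must already exist):

package main

import (
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect("nats://127.0.0.1:4222")
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	// Core NATS publish: fire-and-forget. There is no acknowledgement from
	// the stream, so the publisher can outrun JetStream's storage pipeline.
	if err := nc.Publish("events.other", []byte(`{"type":"other"}`)); err != nil {
		log.Fatal(err)
	}

	// JetStream publish: waits for the stream's ack, which naturally applies
	// backpressure to the publisher.
	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}
	if _, err := js.Publish("events.other", []byte(`{"type":"other"}`)); err != nil {
		log.Fatal(err)
	}
}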
It looks like there's already an issue tracking this on their repository: https://github.com/vectordotdev/vector/issues/10534
We will die before the task above gets completed :) (it was created in 2021).
Is there any chance you could add some kind of setting to limit the rate of JetStream forwarding?
We would not approach it that way; we should not slow down normal Core NATS publishers because of a misconfiguration.
We are considering a way to protect the server by dropping AppendEntry msgs from the NRG (raft) layer. That would avoid memory bloat but would cause the system to thrash a bit catching up the NRG followers when they detect gaps from the dropped messages.
@Steel551454 I plan to contribute to the issue mentioned above sometime in Q3 and introduce JetStream support. The current sink does not actually support acks, despite what the docs say.
Let's say we turn off JetStream. Where in the NATS settings can we specify where events are stored?
If you turn off JetStream, messages will not be stored anywhere; delivery becomes at-most-once. You need a subscriber application that processes them.
In JetStream, you can define the store directory here:
jetstream {
store_dir: /path
}
or by providing the -sd flag.
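For example, a server could be started with JetStream enabled and an explicit store directory (same path as in the config above):

nats-server -js -sd /var/lib/nats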
Do you happen to have a simple proxy written in Go that transforms Core NATS publishes into JetStream publishes?
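A minimal sketch of what such a bridge could look like with the nats.go client (the subject names, the "bridge" queue group, and the "js." prefix are all hypothetical; a stream covering the prefixed subjects is assumed to exist):

package main

import (
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect("nats://127.0.0.1:4222")
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	// Bound the number of outstanding async publishes so the bridge itself
	// cannot balloon in memory when the stream falls behind.
	js, err := nc.JetStream(nats.PublishAsyncMaxPending(1024))
	if err != nil {
		log.Fatal(err)
	}

	// Subscribe to the core subjects the producers publish on and re-publish
	// into JetStream under a different prefix, so the bridge does not pick up
	// its own republished messages.
	_, err = nc.QueueSubscribe("events.>", "bridge", func(m *nats.Msg) {
		if _, err := js.PublishAsync("js."+m.Subject, m.Data); err != nil {
			log.Printf("publish failed: %v", err)
		}
	})
	if err != nil {
		log.Fatal(err)
	}

	select {} // keep the bridge running
}

The pending limit only bounds the bridge's own memory; if the stream cannot keep up, the pressure moves back onto the core subscription, where NATS will eventually drop messages for a slow consumer, so this trades memory growth for potential message loss.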
Today we replaced the vector.dev pipeline with redpanda-connect, which has a plugin for working with NATS JetStream. The memory leak issue has not been resolved. Attached is an archive with profiles: profiles.zip
@derekcollison, I'm sorry to bother you, but switching our pipeline to JetStream did not solve the memory leak issue. Maybe we should add some explicit limiter? The situation where a cluster node crashes due to OOM cannot be considered good.
Agreed, we could simply drop messages and not place them into the stream. The system will complain about high lag getting messages into the stream; those warnings should be in the log.
However, in this case I would imagine you want the system to store the messages. So you either need to slow down the publisher or speed up the storage mechanism, meaning running multiple parallel streams and having the NATS system transparently partition the subject space across them.
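As a sketch of that idea (the subject names here are hypothetical), the subject space could be partitioned deterministically with a server-side subject mapping, with one stream per partition:

mappings: {
  # "events.<type>" is rewritten to "events.<type>.<partition 0..2>"
  "events.*": "events.{{wildcard(1)}}.{{partition(3, 1)}}"
}

Three streams, each bound to one partition suffix (events.*.0, events.*.1, events.*.2), could then be placed on different servers and written to in parallel.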
Do I understand correctly that in our case, for faster message storage in streams, we should launch multiple instances (preferably on different servers) and distribute the streams among several instances?
Or do you have something else in mind?
And another question: would the memory leak situation change if we used NVMe disks instead of traditional SSDs to store the events?
Yes that is correct. @jnmoyne can help with how that gets put together.
The memory leak is not really a leak: since the publishing layer does not wait and publishes as fast as Core NATS allows (Core NATS can do >10M msgs/s vs. roughly 250k/s for JetStream), the system is simply holding onto all the staged messages waiting to be stored into the stream.
NVMe probably would not make a difference in this case IMO.