nats-server
JetStream KV cluster losing data after node restart
Observed behavior
We start up a 5-node cluster with the following configuration and start to continuously put data into the KV:
port: 4222
http_port: 8222

cluster {
  name: js_kv
  listen: 0.0.0.0:6222
  connect_retries: -1
  pool_size: 9
  authorization {
    user: user
    password: password
    timeout: 0.5
  }
  routes = [
    nats-route://user:password@js_kv_node01:6222,
    nats-route://user:password@js_kv_node02:6222,
    nats-route://user:password@js_kv_node03:6222,
    nats-route://user:password@js_kv_node04:6222,
    nats-route://user:password@js_kv_node05:6222
  ]
  compression: {
    mode: s2_auto
    rtt_thresholds: [10ms, 50ms, 100ms]
  }
}

jetstream {
  store_dir: /data/jetstream
  max_file_store: 10737418240
}
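The writer used for the continuous puts is not included in this report. As a rough sketch only, a single writer could keep track of the last acknowledged revision and flag the moment a new revision is not greater than the previous one, which is how lost sequences would look from the client side. The `putter` interface, `writeAndCheck`, the key naming, and the in-memory `fakeKV` are all hypothetical names for this illustration; the only assumption about nats.go is that `nats.KeyValue` has a `Put(key string, value []byte) (uint64, error)` method, so a real bucket handle could be passed in place of the fake.

```go
package main

import "fmt"

// putter is the subset of a KV handle this sketch needs (hypothetical name).
// A nats.KeyValue from nats.go has a Put method with this signature.
type putter interface {
	Put(key string, value []byte) (uint64, error)
}

// writeAndCheck performs n puts and returns an error as soon as an
// acknowledged revision is not greater than the previously acknowledged one.
// It assumes it is the only writer to the bucket.
func writeAndCheck(kv putter, n int) (uint64, error) {
	var last uint64
	for i := 0; i < n; i++ {
		key := fmt.Sprintf("key.%d", i%100)
		rev, err := kv.Put(key, []byte(fmt.Sprint(i)))
		if err != nil {
			// Failed puts are expected while nodes are down; skip and go on.
			continue
		}
		if rev <= last {
			return last, fmt.Errorf("revision went backwards: got %d after %d", rev, last)
		}
		last = rev
	}
	return last, nil
}

// fakeKV is an in-memory stand-in so the sketch runs without a server.
// On call number loseAt it drops 5 sequences to imitate a leader that is behind.
type fakeKV struct {
	seq    uint64
	calls  int
	loseAt int // 0 disables the simulated loss
}

func (f *fakeKV) Put(key string, value []byte) (uint64, error) {
	f.calls++
	if f.calls == f.loseAt {
		f.seq -= 5
	}
	f.seq++
	return f.seq, nil
}

func main() {
	last, err := writeAndCheck(&fakeKV{}, 20)
	fmt.Println("healthy run:", last, err)

	last, err = writeAndCheck(&fakeKV{loseAt: 10}, 20)
	fmt.Println("run with simulated loss:", last, err)
}
```

This check only covers what the writer itself was acknowledged; it says nothing about which node holds which sequence.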
To test resilience, we shut down two random nodes and wait 5-10 minutes before turning them back on.
After the restart, the last sequence on the nodes that were shut down differs from the rest of the cluster, and the difference grows over time.
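One way to compare the last sequence across nodes is the HTTP monitoring endpoint that the configuration above exposes on port 8222. The sketch below reads `/jsz?accounts=true&streams=true` from each node and lists the nodes whose `last_seq` is below the highest value seen. The stream name `KV_test` is a placeholder (KV buckets are backed by a stream named `KV_<bucket>`), and `lastSeqs` and `lagging` are hypothetical helper names. A node reports only the streams it holds a replica of, and small differences are normal while writes are in flight, so the comparison is most meaningful with the writer paused.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"sort"
	"time"
)

// jsz mirrors only the fields of the /jsz response that this sketch reads.
type jsz struct {
	AccountDetails []struct {
		StreamDetail []struct {
			Name  string `json:"name"`
			State struct {
				LastSeq uint64 `json:"last_seq"`
			} `json:"state"`
		} `json:"stream_detail"`
	} `json:"account_details"`
}

// lastSeqs extracts stream name -> last sequence from a /jsz response body.
func lastSeqs(body []byte) (map[string]uint64, error) {
	var doc jsz
	if err := json.Unmarshal(body, &doc); err != nil {
		return nil, err
	}
	out := make(map[string]uint64)
	for _, acc := range doc.AccountDetails {
		for _, s := range acc.StreamDetail {
			out[s.Name] = s.State.LastSeq
		}
	}
	return out, nil
}

// lagging returns, sorted, the nodes whose sequence is below the highest seen.
func lagging(perNode map[string]uint64) []string {
	var highest uint64
	for _, seq := range perNode {
		if seq > highest {
			highest = seq
		}
	}
	var behind []string
	for node, seq := range perNode {
		if seq < highest {
			behind = append(behind, node)
		}
	}
	sort.Strings(behind)
	return behind
}

func main() {
	// Host names taken from the routes in the configuration.
	nodes := []string{"js_kv_node01", "js_kv_node02", "js_kv_node03", "js_kv_node04", "js_kv_node05"}
	const stream = "KV_test" // placeholder: replace with KV_<your bucket>

	client := &http.Client{Timeout: 2 * time.Second}
	perNode := make(map[string]uint64)
	for _, n := range nodes {
		resp, err := client.Get("http://" + n + ":8222/jsz?accounts=true&streams=true")
		if err != nil {
			fmt.Println(n, "unreachable:", err)
			continue
		}
		body, err := io.ReadAll(resp.Body)
		resp.Body.Close()
		if err != nil {
			fmt.Println(n, "read error:", err)
			continue
		}
		seqs, err := lastSeqs(body)
		if err != nil {
			fmt.Println(n, "decode error:", err)
			continue
		}
		if seq, ok := seqs[stream]; ok {
			perNode[n] = seq
		}
	}
	fmt.Println("last_seq per node:", perNode)
	fmt.Println("nodes behind:", lagging(perNode))
}
```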
We also tried running a durable consumer on the test cluster. If the consumer leader moves from a healthy node to a node with sequence loss, the consumer stops delivering data because of the unexpected sequence difference.
This testing shows that in an unstable server environment the KV cannot guarantee the safety of data, and reads from it can return unreliable results depending on which node is the cluster leader at that moment.
Expected behavior
We expect that after the nodes are turned back on, all 5 nodes report the same last sequence and consumers continue delivering data, or at least alert that they have stopped doing so.
Server and client version
Tested on both v2.10.14 and nightly-20240513, with nats.go v1.34.1.
Host environment
No response
Steps to reproduce
No response