nats-server
JetStream KV corruption on Windows with reboot after KV creation [v2.10.24]
Observed behavior
[6836] 2025/01/21 13:08:26.450317 [WRN] Filestore [KV_foo] indexCacheBuf corrupt record state: dlen 1166566284 slen 54266 index 0 rl 1166566306 lbuf 508
[6836] 2025/01/21 13:08:55.031990 [WRN] Filestore [KV_foo] indexCacheBuf corrupt state: mb.first 10 mb.last 0
[6836] 2025/01/21 13:08:55.031990 [WRN] Filestore [KV_foo] indexCacheBuf corrupt state: mb.first 10 mb.last 0
[6836] 2025/01/21 13:08:55.032560 [WRN] Filestore [KV_foo] indexCacheBuf corrupt state: mb.first 10 mb.last 0
[6836] 2025/01/21 13:08:55.032560 [WRN] Filestore [KV_foo] indexCacheBuf corrupt state: mb.first 10 mb.last 0
[6836] 2025/01/21 13:08:55.033180 [WRN] Filestore [KV_foo] indexCacheBuf corrupt state: mb.first 10 mb.last 0
[6836] 2025/01/21 13:08:55.033180 [WRN] Filestore [KV_foo] indexCacheBuf corrupt state: mb.first 10 mb.last 0
[6836] 2025/01/21 13:08:55.033751 [WRN] Filestore [KV_foo] indexCacheBuf corrupt state: mb.first 10 mb.last 0
[6836] 2025/01/21 13:08:55.033751 [WRN] Filestore [KV_foo] indexCacheBuf corrupt state: mb.first 10 mb.last 0
[6836] 2025/01/21 13:08:55.039885 [ERR] JetStream failed to store a msg on stream '$G > KV_foo': corrupt state file
Corruption is corrected on next startup but with data loss. Restore loses values.
Expected behavior
No corruption
Server and client version
nats-server: v2.10.24
nats-cli: 0.1.6
Host environment
Windows Server 2019 Standard, Windows Server 2022 Standard
Steps to reproduce
nats-server config:
host: 0.0.0.0
port: 4222
debug: true
jetstream {
store_dir: "C:\\nats_test"
cipher: "chachapoly",
key: "ScxCcMDcemw8COVUtPVsdfLMRLG1PGpj"
}
- nats-server configured as a windows service or running as an app
- jetstream folder not created (first run)
- start nats-server
- nats kv add foo
- reboot (if service) or <ctrl>c nats-server and reboot (if app)
> nats kv put foo test 1
1
> nats kv put foo test 2
2
> nats kv put foo test 3
3
> nats kv put foo test 4
4
> nats kv put foo test 5
5
> nats kv put foo test 6
6
> nats kv put foo test 7
7
> nats kv put foo test 8
8
> nats kv put foo test 9
9
> nats kv put foo test 10
10
> nats kv del foo test
? Delete key foo > test? Yes
> nats kv put foo test 10
nats: error: nats: corrupt state file
>
NOTE: if you write to KV before the reboot, corruption doesn't happen. This is our work-around. Repeatable.
This is an issue for jetstream work-queues too.
There is a setting sync: always that could potentially help here. This issue also looks similar https://github.com/nats-io/nats-server/issues/5412
Is there a function to force a JetStream filesystem flush?
There is a setting sync: always that could potentially help here. This issue also looks similar #5412
sync: always had no effect. Same corruption.
Looking at the configuration documentation, I think you want sync_interval: always, not sync: always.
I would expect sync: always to do nothing/have no effect since it is not a valid configuration key (or at least not documented).
Unfortunately, the server has many aliases for the same config item, and not all of them are shown in the docs or examples. In this case, both sync and sync_interval are valid.
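For anyone following along, here is a minimal sketch of where the documented key would sit in the reporter's jetstream block. It assumes the sync_interval option from the JetStream configuration documentation, where the value always requests a sync on every write. The cipher and key lines from the original config are left out for brevity. Since both names are reported to be accepted, this may well behave the same as sync: always, which had no effect on the corruption, so treat it as something to test rather than a confirmed fix.

```
jetstream {
  store_dir: "C:\\nats_test"
  # Assumption: documented spelling of the sync setting discussed in this thread;
  # "always" asks the filestore to sync on every write instead of on an interval.
  sync_interval: always
}
```

Syncing on every write is expected to reduce write throughput, so it is worth measuring before leaving it enabled outside of a reproduction.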