KV Watch iterator hangs on stale bucket connections.
Make sure that these boxes are checked before submitting your issue -- thank you!
- [x] Included below version and environment information
- [x] Included a [Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve)
NATS version (`grep 'name = "nats"' Cargo.lock -A 1`): 0.23.1
rustc version (`rustc --version` - we support Rust 1.41 and up): 1.66.0-nightly
OS/Container environment: macOS
Steps or code to reproduce the issue:
When one process/thread subscribes to watch events and another process deletes the bucket, the original watch subscriber hangs on calls to `next`. In my example you'll see "received entry" printed by the watch thread until the bucket is deleted; after that it is no longer printed, indicating the thread is hanging on the call to `next`. I even recreate the bucket after deleting it to check whether any events come through, but still no luck.
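For reference, here is a condensed sketch of that scenario, assuming the sync `nats` 0.23 KV API (`nats::jetstream::new`, `create_key_value`, `key_value`, `delete_key_value`, `Store::watch_all`) and a hypothetical `demo` bucket in place of the one configured via `bucket.yaml`; it only loosely mirrors the linked `main.rs`, and exact method names may differ:

```rust
use std::{thread, time::Duration};

fn main() -> std::io::Result<()> {
    let js = nats::jetstream::new(nats::connect("localhost:4222")?);
    let bucket = js.create_key_value(&nats::kv::Config {
        bucket: "demo".to_string(),
        ..Default::default()
    })?;

    // Separate connection standing in for the "other process": it writes an
    // entry, deletes the bucket, then recreates it.
    let deleter = thread::spawn(|| -> std::io::Result<()> {
        let js = nats::jetstream::new(nats::connect("localhost:4222")?);
        js.key_value("demo")?.put("foo", b"bar")?;
        thread::sleep(Duration::from_secs(2));
        js.delete_key_value("demo")?;
        // Recreating the bucket does not wake the hung watcher either.
        js.create_key_value(&nats::kv::Config {
            bucket: "demo".to_string(),
            ..Default::default()
        })?;
        Ok(())
    });

    // Prints "received entry" until the bucket is deleted; after that the
    // implicit call to `next` blocks forever instead of returning None.
    for entry in bucket.watch_all()? {
        println!("received entry: {:?}", entry);
    }
    println!("watch ended"); // never reached

    deleter.join().unwrap()
}
```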
We're experiencing this same issue in our prod cluster where we run NATS 2.7.4 and this client: https://github.com/segfaultdoc/nats.rs/tree/seg-v0.18.2
- Run this Docker Compose file in one terminal: https://github.com/segfaultdoc/nats_blocking/blob/seg/kv-bucket-stale-conns/docker-compose.yaml
- In another terminal, run this binary: https://github.com/segfaultdoc/nats_blocking/blob/seg/kv-bucket-stale-conns/src/main.rs with `RUST_LOG=info cargo run -- --bucket-config-path bucket.yaml --nats-url localhost:4222`
Expected result:
`next` should return `None`, since the bucket was deleted and the connection is stale.
Actual result:
`next` hangs indefinitely.
After debugging a bit, I'm seeing that the subscription does not get removed from the client's internal `State::ReadState::Subscriptions` map, even though the server stops sending messages for that subscription id.
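In case it helps with triage, here is a client-side mitigation sketch we've been experimenting with (a hypothetical helper, not part of the linked repro or the `nats` API): drive the blocking iterator on its own thread and forward entries over an mpsc channel, so the consumer can use `recv_timeout` to notice that entries have stopped and re-establish the watch. Note that the forwarding thread itself still leaks, because it stays stuck inside `next`:

```rust
use std::{sync::mpsc, thread};

/// Hypothetical helper: forward entries from a blocking watch iterator to a
/// channel so the consumer can time out instead of blocking in `next` forever.
fn forward_watch<I, T>(watch: I) -> mpsc::Receiver<T>
where
    I: Iterator<Item = T> + Send + 'static,
    T: Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        for entry in watch {
            if tx.send(entry).is_err() {
                break; // receiver dropped, stop forwarding
            }
        }
        // NOTE: with the bug above, this thread never leaves the loop after the
        // bucket is deleted -- it stays parked inside `next` and leaks.
    });
    rx
}

// Consumer side: treat prolonged silence as a dead watch and rebuild it.
// let rx = forward_watch(bucket.watch_all()?);
// loop {
//     match rx.recv_timeout(std::time::Duration::from_secs(30)) {
//         Ok(entry) => println!("received entry: {:?}", entry),
//         Err(_) => break, // timed out or disconnected: recreate the watch
//     }
// }
```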
NOTE: The same issue occurs in the context of a supercluster. If Cluster A loses its connection to Cluster B (for example, the load balancer in B is brought down), then all calls to `watch` in A hang.