nats.go
Ordered consumer subscription can become invalid
Defect
This bug is unlikely to manifest under normal circumstances, but it occurs almost certainly in the presence of failures (e.g., bouncing the cluster that hosts the stream).
The sequence of events is:
- Ordered subscription delivers some messages
- Ordered subscription resets the internal ephemeral consumer
- During reset, the `nc.Request` call (js.go:1956) fails (due to cluster being down)
- Reset fails without recourse
- The subscription is now in an unrecoverable state
From this moment on, all calls to `NextMsg()` return `ErrBadSubscription`. The application using this consumer is stuck.
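Until this is fixed in the library, an application could work around the stuck state by recreating the subscription itself. The following is only a minimal sketch of that idea using the standard library: `nextFunc`, `consume`, `resubscribe`, and `errBadSubscription` are hypothetical stand-ins (for `sub.NextMsg`, the application's read loop, a fresh ordered consumer subscription starting at a given stream sequence, and `nats.ErrBadSubscription` respectively), not part of nats.go.

```go
package main

import "errors"

// errBadSubscription is a stand-in for nats.ErrBadSubscription (assumption:
// real code would compare against the library's error instead).
var errBadSubscription = errors.New("invalid subscription")

// nextFunc is a hypothetical stand-in for sub.NextMsg: it returns the stream
// sequence of the next delivered message.
type nextFunc func() (uint64, error)

// consume reads n messages. If the subscription reports errBadSubscription,
// it asks resubscribe for a new subscription that starts right after the
// last stream sequence seen, so ordering is preserved and nothing is replayed.
func consume(next nextFunc, n int, resubscribe func(startSeq uint64) (nextFunc, error)) ([]uint64, error) {
	var seqs []uint64
	var last uint64 // last stream sequence delivered to the application
	for len(seqs) < n {
		seq, err := next()
		if errors.Is(err, errBadSubscription) {
			// The subscription is unrecoverable: replace it and try again.
			next, err = resubscribe(last + 1)
			if err != nil {
				return seqs, err
			}
			continue
		}
		if err != nil {
			return seqs, err
		}
		last = seq
		seqs = append(seqs, seq)
	}
	return seqs, nil
}
```

In a real application, `resubscribe` would presumably create a new ordered consumer subscription with a start sequence, and would need its own retry policy while the cluster is still down.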
Make sure that these boxes are checked before submitting your issue -- thank you!
- [x] Included nats.go version
- [x] Included a [Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve)
Versions of `nats.go` and the `nats-server` if one was involved:
- nats.go: v1.16.0
- nats-server: 2.9.0-beta.20 (probably not relevant)
OS/Container environment:
Not relevant, but reproduces on macOS and Linux
Steps or code to reproduce the issue:
You can reproduce this behavior by running `TestJetStreamChaosConsumerOrdered` from: https://github.com/nats-io/nats-server/pull/3334
Expected result:
Ordered consumer subscription delivers messages in stream sequence order
Actual result:
Ordered consumer subscription can get stuck: any calls to `NextMsg()` will return `ErrBadSubscription`.
@derekcollison Do you think we should retry a few times here (https://github.com/nats-io/nats.go/blob/c157d64783d91fd2084064bd5c07a10dc4bba2d7/js.go#L1956) if we get an error (other than connection closed I guess)? Or is it ok to fail here and the user needs to recreate the ordered consumer subscription when they get "invalid subscription"?
Not sure.. Would need to dig in more context wise..