
Ordered consumer subscription can become invalid

Open mprimi opened this issue 1 year ago • 2 comments

Defect

This bug is unlikely to manifest under normal circumstances, but it almost certainly occurs in the presence of failures (e.g., bouncing the cluster that hosts the stream).

The sequence of events is:

  • Ordered subscription delivers some messages
  • Ordered subscription resets the internal ephemeral consumer
  • During reset, the nc.Request call (js.go:1956) fails (due to cluster being down)
  • Reset fails without recourse
  • The subscription is now in an unrecoverable state

From this moment on, all calls to NextMsg() will return ErrBadSubscription. The application using this consumer is stuck.
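Until the client handles this internally, an application can work around the stuck state by recreating the subscription itself. The following is a minimal sketch of that idea, not the library's behavior: `subscription`, `subscribeFunc`, `msg`, `flakySub`, and `errBadSubscription` are hypothetical stand-ins so the example runs without a server. With nats.go, the subscribe function would presumably wrap `js.SubscribeSync` with `nats.OrderedConsumer()` and `nats.StartSequence(...)`, and the sentinel would be `nats.ErrBadSubscription`.

```go
package main

import (
	"errors"
	"fmt"
)

// errBadSubscription is a stand-in for nats.ErrBadSubscription (assumption).
var errBadSubscription = errors.New("invalid subscription")

// msg is a hypothetical, simplified message that carries its stream sequence.
type msg struct {
	StreamSeq uint64
	Data      string
}

// subscription is a hypothetical stand-in for the part of the API used here.
type subscription interface {
	NextMsg() (msg, error)
}

// subscribeFunc creates a new ordered subscription starting at startSeq.
type subscribeFunc func(startSeq uint64) (subscription, error)

// consume reads n messages in stream order. When the subscription becomes
// invalid, it is recreated at the next undelivered stream sequence, at most
// maxResubscribes times.
func consume(subscribe subscribeFunc, n, maxResubscribes int) ([]msg, error) {
	var out []msg
	next := uint64(1) // next stream sequence we expect to receive
	sub, err := subscribe(next)
	if err != nil {
		return nil, err
	}
	resubs := 0
	for len(out) < n {
		m, err := sub.NextMsg()
		if errors.Is(err, errBadSubscription) {
			if resubs >= maxResubscribes {
				return out, fmt.Errorf("giving up after %d resubscribes: %w", resubs, err)
			}
			resubs++
			if sub, err = subscribe(next); err != nil {
				return out, err
			}
			continue
		}
		if err != nil {
			return out, err
		}
		out = append(out, m)
		next = m.StreamSeq + 1
	}
	return out, nil
}

// flakySub is a fake subscription that turns invalid after `good` deliveries.
type flakySub struct {
	stream []string
	pos    uint64 // next stream sequence to deliver (1-based)
	good   int    // deliveries left before the subscription breaks
}

func (s *flakySub) NextMsg() (msg, error) {
	if s.good == 0 {
		return msg{}, errBadSubscription
	}
	s.good--
	m := msg{StreamSeq: s.pos, Data: s.stream[s.pos-1]}
	s.pos++
	return m, nil
}

func main() {
	stream := []string{"a", "b", "c", "d", "e"}
	subscribe := func(start uint64) (subscription, error) {
		return &flakySub{stream: stream, pos: start, good: 2}, nil
	}
	msgs, err := consume(subscribe, len(stream), 5)
	fmt.Println(len(msgs), err)
}
```

Tracking the last delivered stream sequence is what preserves ordering across the resubscribe; the cap on resubscribes keeps the loop from spinning while the cluster is still down.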


Make sure that these boxes are checked before submitting your issue -- thank you!

  • [x] Included nats.go version
  • [x] Included a [Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve)

Versions of nats.go and the nats-server if one was involved:

nats.go: v1.16.0
nats-server: 2.9.0-beta.20 (probably not relevant)

OS/Container environment:

Not relevant, but reproduces on macOS and Linux

Steps or code to reproduce the issue:

You can reproduce this behavior by running TestJetStreamChaosConsumerOrdered from: https://github.com/nats-io/nats-server/pull/3334

Expected result:

The ordered consumer subscription delivers messages in stream sequence order.

Actual result:

Ordered consumer subscription can get stuck: any calls to NextMsg() will return ErrBadSubscription.

mprimi, Aug 04 '22 21:08

@derekcollison Do you think we should retry a few times here (https://github.com/nats-io/nats.go/blob/c157d64783d91fd2084064bd5c07a10dc4bba2d7/js.go#L1956) if we get an error (other than connection closed, I guess)? Or is it OK to fail here, so that the user needs to recreate the ordered consumer subscription when they get "invalid subscription"?
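For illustration, the retry idea could look roughly like the sketch below. It is not the nats.go implementation: `retryRequest`, `errConnectionClosed`, and `errTimeout` are hypothetical names (the first sentinel stands in for `nats.ErrConnectionClosed`), and the attempt count and doubling backoff are arbitrary assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Stand-in sentinel errors (assumptions, not the nats.go values).
var (
	errConnectionClosed = errors.New("connection closed")
	errTimeout          = errors.New("timeout")
)

// retryRequest calls do up to `attempts` times, doubling the backoff between
// tries. It stops immediately on errConnectionClosed, since retrying on a
// closed connection cannot succeed.
func retryRequest(do func() error, attempts int, backoff time.Duration) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 0; i < attempts; i++ {
		if err = do(); err == nil {
			return nil
		}
		if errors.Is(err, errConnectionClosed) {
			return err
		}
		if i < attempts-1 {
			time.Sleep(backoff)
			backoff *= 2
		}
	}
	return fmt.Errorf("request failed after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	// Simulated consumer-reset request: fails twice, then succeeds.
	err := retryRequest(func() error {
		calls++
		if calls < 3 {
			return errTimeout
		}
		return nil
	}, 5, 10*time.Millisecond)
	fmt.Println(calls, err)
}
```

A bounded retry only narrows the window: if the cluster stays down longer than the retry budget, the subscription would still end up invalid and the application would need to recreate it.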

kozlovic, Aug 04 '22 23:08

Not sure. I would need to dig into the context more.

derekcollison, Aug 04 '22 23:08