[watermill-nats] Calling Unsubscribe for nats.Subscriber can lead to unexpected behaviour
Our use case: Kubernetes pods running watermill with watermill-nats (JetStream support enabled), consuming through durable queue subscriptions.
When we started testing scale-down, we saw weird behaviour where the consumers in the containers were not processing messages.
Digging deeper, I found a comment in the nats.go repository source code saying:
...
// * If Durable() option is specified, the library will attempt to lookup a JetStream
// consumer with this name, and if found, will bind to it and not attempt to
// delete it. However, if not found, the library will send a request to create
// such durable JetStream consumer. The library will delete the JetStream consumer
// after an Unsubscribe() or Drain().
...
In the case of watermill-nats, I see that there is no explicit creation of the consumer, so we are in the case where the consumer will be deleted when Unsubscribe() or Drain() is called on the nats.Subscriber.
This silently deletes the consumer: the other clients continue to run, but no messages are forwarded to them.
A similar issue happened when we attempted a Kubernetes rolling update.
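To make the failure mode concrete, here is a toy in-memory model of the ownership rule from the quoted comment. It is only a sketch: `server`, `subscription`, `scaleDown` and the durable name `orders` are hypothetical names invented for illustration, and none of this is the nats.go or watermill-nats API.

```go
package main

import "fmt"

// server is a toy stand-in for JetStream: it only tracks which durable
// consumers currently exist. It is NOT the nats.go API.
type server struct {
	consumers map[string]bool
}

// subscription models one client's subscription to a durable consumer.
type subscription struct {
	srv     *server
	durable string
	// created records whether this subscription had to create the
	// consumer itself; per the quoted comment, that is the case in
	// which Unsubscribe() also deletes it.
	created bool
}

// subscribe binds to the durable if it already exists, otherwise creates it.
func (s *server) subscribe(durable string) *subscription {
	created := false
	if !s.consumers[durable] {
		s.consumers[durable] = true
		created = true
	}
	return &subscription{srv: s, durable: durable, created: created}
}

// unsubscribe deletes the consumer only if this subscription created it.
func (sub *subscription) unsubscribe() {
	if sub.created {
		delete(sub.srv.consumers, sub.durable)
	}
}

// scaleDown simulates two pods sharing one durable, then the first pod
// shutting down. It reports whether the consumer still exists afterwards.
func scaleDown(preProvisioned bool) bool {
	srv := &server{consumers: map[string]bool{}}
	if preProvisioned {
		srv.consumers["orders"] = true // created outside of any subscription
	}
	podA := srv.subscribe("orders")
	_ = srv.subscribe("orders") // pod B binds to the consumer that is already there
	podA.unsubscribe()
	return srv.consumers["orders"]
}

func main() {
	fmt.Println("implicit creation, consumer survives:", scaleDown(false))
	fmt.Println("pre-provisioned, consumer survives:", scaleDown(true))
}
```

In this model, pod B is left subscribed to a consumer that no longer exists when pod A created it implicitly, which matches what we observed. When the consumer exists before either pod subscribes, both pods only bind to it and the scale-down leaves it in place.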
The auto-provisioning behavior is something I've been meaning to try and get right. I think the best approach is to expose a callback that gives the user maximum flexibility in working with the underlying NATS client types. But there is a new dedicated JetStream API available in the NATS Go client that really simplifies a lot of this, so it might make more sense to focus efforts on that (this is what I have been planning to do in the coming weeks).
I think the best approach right now is probably to provision your stream and durable externally (even in your application code before handoff to watermill; most operations are idempotent on the NATS end if the config used is consistent). The existing implementation should connect to an existing durable correctly.
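As a rough sketch of what "provision before handoff" could look like, the snippet below runs an ensure step on startup that creates the stream and durable only when they are missing. The `jsAdmin` interface, `ensureDurable`, `fakeJS` and all stream, subject and consumer names are assumptions made up for this example; they are not part of watermill-nats or nats.go. With the real client, the interface would presumably be backed by the JetStream management calls (along the lines of `StreamInfo`/`AddStream` and `ConsumerInfo`/`AddConsumer`), so check the client documentation for exact behaviour.

```go
package main

import "fmt"

// jsAdmin is a HYPOTHETICAL, minimal view of a JetStream management client.
// A real implementation would wrap the NATS client's own management calls.
type jsAdmin interface {
	HasStream(stream string) bool
	AddStream(stream string, subjects []string) error
	HasConsumer(stream, durable string) bool
	AddConsumer(stream, durable, queueGroup string) error
}

// ensureDurable creates the stream and the durable consumer only when they
// are missing, so every pod can call it on startup before subscribing.
func ensureDurable(js jsAdmin, stream string, subjects []string, durable, queueGroup string) error {
	if !js.HasStream(stream) {
		if err := js.AddStream(stream, subjects); err != nil {
			return fmt.Errorf("add stream %q: %w", stream, err)
		}
	}
	if !js.HasConsumer(stream, durable) {
		if err := js.AddConsumer(stream, durable, queueGroup); err != nil {
			return fmt.Errorf("add consumer %q: %w", durable, err)
		}
	}
	return nil
}

// fakeJS is an in-memory jsAdmin used only to demonstrate idempotency.
type fakeJS struct {
	streams   map[string]bool
	consumers map[string]bool
	creates   int // number of create calls that actually ran
}

func newFakeJS() *fakeJS {
	return &fakeJS{streams: map[string]bool{}, consumers: map[string]bool{}}
}

func (f *fakeJS) HasStream(stream string) bool { return f.streams[stream] }

func (f *fakeJS) AddStream(stream string, _ []string) error {
	f.streams[stream] = true
	f.creates++
	return nil
}

func (f *fakeJS) HasConsumer(stream, durable string) bool {
	return f.consumers[stream+"/"+durable]
}

func (f *fakeJS) AddConsumer(stream, durable, _ string) error {
	f.consumers[stream+"/"+durable] = true
	f.creates++
	return nil
}

func main() {
	js := newFakeJS()
	// Three pods starting up all run the same provisioning step.
	for pod := 1; pod <= 3; pod++ {
		if err := ensureDurable(js, "ORDERS", []string{"orders.>"}, "orders-worker", "workers"); err != nil {
			panic(err)
		}
	}
	fmt.Println("create calls:", js.creates)
}
```

The check-then-create step is not atomic across pods, so two pods starting at the same moment may both attempt the create. That should be harmless as long as the create itself is idempotent for identical config, which is the property mentioned above.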