siddhi-operator
Events can be lost when NATS becomes unavailable in distributed mode
Description: The NATS deployment that the Siddhi operator creates automatically is a basic one, consisting of a NATS cluster and a NATS Streaming cluster. In a distributed Siddhi deployment, NATS is the primary messaging system that enables communication among the Siddhi apps. If NATS becomes unavailable for a while, some user events can be lost.
Given the above, it would be much better if the Siddhi operator provided an HA deployment of NATS during the automatic NATS deployment phase.
Suggested Labels: Feature improvement
Affected Product Version: 0.2.0-beta
Steps to reproduce:
- Deploy a stateful Siddhi app in the default distributed mode using the Siddhi operator.
- Send a sequence of events to NATS using a NATS client.
- Manually bring down the NATS streaming cluster created in your K8s cluster.
- You will see that some of the events are lost.
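One way to confirm the loss in the last step is to tag each event with a sequence number and compare what was sent against what the Siddhi app received. The snippet below is a minimal sketch of that comparison only. `find_lost_events` is a hypothetical helper, not part of Siddhi or any NATS client, and it assumes you have already collected both lists of sequence numbers yourself.

```python
from typing import Iterable, List


def find_lost_events(sent: Iterable[int], received: Iterable[int]) -> List[int]:
    """Return the sequence numbers that were sent but never received.

    Hypothetical helper for illustration: the caller is assumed to have
    gathered the sent and received sequence numbers (for example from the
    NATS client and from the Siddhi app's output).
    """
    seen = set(received)
    # Preserve the send order so the gap is easy to read.
    return [seq for seq in sent if seq not in seen]


if __name__ == "__main__":
    # Illustrative data: ten events sent, three never arrived.
    sent = range(1, 11)
    received = [1, 2, 3, 4, 8, 9, 10]
    print(find_lost_events(sent, received))
```

An empty result means every sent event arrived; any other result lists the events that went missing while NATS was down.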
Observations from the 2-node NATS/STAN cluster deployment:
- After one NATS/STAN pod is killed, events no longer pass through the NATS/STAN cluster. In other words, events go missing.
- Note that the reconciled NATS/STAN pod does not have the subscribed subject.
We will be able to resolve this issue once the nats-streaming-operator supports the fault tolerance mode, as stated in this issue.