Failure Scenarios

Open lautarodragan opened this issue 7 years ago • 1 comments

List and make tickets for the different failure scenarios.

Examples:

Application process is force-killed: the application doesn't receive a kill event that would let it finish what it's doing.
RabbitMQ goes down
MongoDB goes down
IPFS goes down
Po.et Node comes back up, but the overall state is unknown, since messages could have been lost. Review discrepancies in the information and run a self-healing process?
Po.et Node should stay up and running if a service it depends on goes down - how?
Bitcoin blockchain unavailable: reading from and writing to it is impossible. The node should report the outage but stay alive, and keep retrying until it can sync.
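
One way the "report on it but stay alive, and keep retrying" behaviour could look is sketched below. This is only an illustration: `backoffDelayMs`, `waitUntilAvailable`, and the `probe` callback are hypothetical names, not part of the Po.et Node codebase, and the delay values are arbitrary.

```typescript
// Hypothetical helper names; nothing here is taken from the Po.et Node code.
type Probe = () => Promise<void>

// Exponential backoff capped at maxMs: base, 2 * base, 4 * base, ...
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt)
}

const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms))

// Keep probing a dependency until it answers. Each failure is reported
// instead of being allowed to take the process down.
async function waitUntilAvailable(
  name: string,
  probe: Probe,
  report: (message: string) => void = console.error,
  delayFor: (attempt: number) => number = backoffDelayMs,
): Promise<number> {
  for (let attempt = 0; ; attempt++) {
    try {
      await probe()
      return attempt // number of failed attempts before the first success
    } catch (error) {
      report(`${name} unavailable (attempt ${attempt + 1}): ${String(error)}`)
      await sleep(delayFor(attempt))
    }
  }
}
```

The `probe` would be whatever cheap call proves the dependency is reachable (for example a ping or a block-height request); the rest of the node can keep serving what it can while this loop runs.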

Nov 24 '17 15:11 lautarodragan

If everything is in a queue, I recommend just force-crashing on any error. When the process (or the worker, if using cluster) restarts, the message can be reprocessed. Joyent talks a lot about crash-oriented design on their site.
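
A minimal sketch of that idea, using an in-memory stand-in for the queue: a message is acknowledged only after it has been handled, so a crash leaves it in place for the restarted worker. `DurableQueue`, `runWorker`, and `supervise` are hypothetical names for illustration; in a real deployment the broker would hold the messages and a process manager or the cluster primary would do the restarting.

```typescript
// In-memory stand-in for a durable queue (hypothetical): a message
// leaves the queue only when it is acknowledged.
class DurableQueue<T> {
  private pending: T[]
  constructor(messages: readonly T[]) {
    this.pending = [...messages]
  }
  peek(): T | undefined { return this.pending[0] }
  ack(): void { this.pending.shift() }
  size(): number { return this.pending.length }
}

// One worker lifetime: handle messages in order, ack only after success.
// Any error propagates; a real worker would exit here (for example with
// process.exit(1)), leaving the current message unacknowledged.
function runWorker<T>(queue: DurableQueue<T>, handle: (message: T) => void): void {
  for (let message = queue.peek(); message !== undefined; message = queue.peek()) {
    handle(message)
    queue.ack()
  }
}

// Stand-in for the restarting supervisor: start a fresh worker after
// every crash until the queue is drained. Returns the crash count.
function supervise<T>(
  queue: DurableQueue<T>,
  handle: (message: T) => void,
  maxRestarts = 10,
): number {
  let crashes = 0
  while (queue.size() > 0) {
    try {
      runWorker(queue, handle)
    } catch {
      crashes++
      if (crashes > maxRestarts) throw new Error('too many restarts')
    }
  }
  return crashes
}
```

Because a message can be handled more than once under this scheme, handlers need to be safe to repeat.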

servicebus handles this abstraction on top of RabbitMQ. It also sets up error queues and integrates with Redis to cache messages that are waiting to be retried.
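
The retry-then-error-queue pattern can be sketched generically as follows. This does not use the servicebus API; `Envelope`, `routeFailure`, `drain`, and the limit of three attempts are all assumptions made for illustration, and plain arrays stand in for the broker and cache.

```typescript
// Hypothetical message wrapper that tracks how often handling has failed.
interface Envelope<T> {
  body: T
  attempts: number
}

// Requeue a failed message until maxAttempts is reached, then park it
// on the error queue for later inspection.
function routeFailure<T>(
  envelope: Envelope<T>,
  retryQueue: Envelope<T>[],
  errorQueue: Envelope<T>[],
  maxAttempts = 3,
): void {
  const next: Envelope<T> = { body: envelope.body, attempts: envelope.attempts + 1 }
  if (next.attempts >= maxAttempts) {
    errorQueue.push(next)
  } else {
    retryQueue.push(next)
  }
}

// Process every message, retrying failures; returns the error queue.
function drain<T>(
  messages: readonly T[],
  handle: (body: T) => void,
  maxAttempts = 3,
): Envelope<T>[] {
  const work: Envelope<T>[] = messages.map(body => ({ body, attempts: 0 }))
  const errorQueue: Envelope<T>[] = []
  while (work.length > 0) {
    const envelope = work.shift() as Envelope<T>
    try {
      handle(envelope.body)
    } catch {
      routeFailure(envelope, work, errorQueue, maxAttempts)
    }
  }
  return errorQueue
}
```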

Dec 31 '17 10:12 patrickleet
