Persisting topologies upon MesosNimbus restart when using Marathon
We've got Mesos/Storm deployed via Marathon and it's been working quite nicely, but in the case where the scheduler fails, or is killed, Storm will be brought back up fresh, without any of its topologies.
How do others handle this? We ended up writing a Marathon app definition for each topology that checks with Storm whether the topology is running and submits it if it's not. That feels clunky, though. Are there simpler ways to bootstrap Mesos/Storm with topologies on launch?
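For reference, the check-and-submit approach described above can be sketched roughly as follows. This is only an illustration, not the actual app definitions: it assumes the Storm UI REST endpoint `/api/v1/topology/summary` is reachable, that the `storm` CLI is on the PATH, and that the topology's main class takes the topology name as its first argument. The function names, jar path, and class name are hypothetical.

```python
import json
import subprocess
import urllib.request


def active_topology_names(summary):
    """Return the topology names found in a parsed topology summary response."""
    return {t["name"] for t in summary.get("topologies", [])}


def fetch_summary(ui_url):
    """Fetch the topology summary from the Storm UI REST API (URL is an assumption)."""
    with urllib.request.urlopen(f"{ui_url}/api/v1/topology/summary", timeout=10) as resp:
        return json.load(resp)


def submit_command(jar_path, main_class, topology_name):
    """Build a `storm jar` command line.

    Assumes the main class accepts the topology name as its first argument.
    """
    return ["storm", "jar", jar_path, main_class, topology_name]


def ensure_topology(name, jar_path, main_class, summary, run=subprocess.check_call):
    """Submit the topology only if it is not already listed; return True if submitted."""
    if name in active_topology_names(summary):
        return False
    run(submit_command(jar_path, main_class, name))
    return True
```

A Marathon task wrapping this would call `ensure_topology(name, jar, cls, fetch_summary(ui_url))` once Nimbus is up. Passing `run` as a parameter keeps the check testable without a live cluster.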
Currently I deploy the Storm/Mesos Nimbus on a dedicated node. Some people use external volumes and reservations with Marathon instead. Either way, the reason is that the Nimbus stores some of its state locally (including the frameworkID, which is something we should really fix). With HA Nimbus in Storm 1.0.0, there might be better options to explore as we start looking at upgrading the framework.
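To make the volumes-and-reservations idea a bit more concrete, here is a rough sketch of a Marathon app definition that asks for a local persistent volume, built as a Python dict so it can be dumped to JSON. It is not a recommended or tested setup: the field layout follows my understanding of Marathon's persistent volume support (newer Marathon versions may not need the `residency` block), and the app id, sizes, volume name, and launch command are all placeholders. Nimbus would still need to be configured to keep its local state (for example `storm.local.dir`) under the volume path.

```python
import json


def nimbus_app(app_id, cmd, volume_path="nimbus-data", volume_mb=1024):
    """Build a Marathon app definition with a local persistent volume.

    All values are illustrative placeholders; `cmd` is whatever launches
    the MesosNimbus in your deployment.
    """
    return {
        "id": app_id,
        "cmd": cmd,
        "cpus": 1.0,
        "mem": 2048,
        "instances": 1,
        "container": {
            "type": "MESOS",
            "volumes": [
                {
                    # Path relative to the task sandbox where the volume is mounted.
                    "containerPath": volume_path,
                    "mode": "RW",
                    "persistent": {"size": volume_mb},
                }
            ],
        },
        # Older Marathon versions expect this for apps with persistent volumes.
        "residency": {"taskLostBehavior": "WAIT_FOREVER"},
        # Stop the old instance first so the restarted task can reuse the volume.
        "upgradeStrategy": {"minimumHealthCapacity": 0, "maximumOverCapacity": 0},
    }


if __name__ == "__main__":
    print(json.dumps(nimbus_app("/storm/nimbus", "<nimbus launch command>"), indent=2))
```

The resulting JSON could then be posted to Marathon's `/v2/apps` endpoint.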
This relates to #173 and #174. We need to document the recommended approach for using storm-mesos with Marathon. That will take a bit of time as @JessicaLHartog and I look into this.