ml-app-deployer
Add tasks for retiring and detaching forests
Currently there is no way to delete a forest using ml-gradle.
What interface do you have in mind here - i.e. what properties would you want to specify on the command line? Just -PforestName, or something else? How about deleting multiple forests at once, or deleting all forests for a database on a specific host or in a specific group?
Also, what's the use case? "As an ml-gradle user, I want to delete a forest, so that...."
I mainly need this task in association with removing a host from a cluster (mlRemoveHost). As you know, a host can only leave a cluster if no forests are assigned to it, which is why I want to remove all the forests of a given host.
So, back to your question, it would be nice to be able to:
- Remove a single forest
- Remove all the forests of a given database
- Remove all the forests of a given host
Thank you.
Btw, speaking of mlRemoveHost: would it be possible to extend that task so that it removes the assigned forests before the host leaves the cluster, instead of just failing?
If the use case is for removing a host, then you'd normally follow a procedure of (let's assume 3 hosts with 2 primary forests per host plus replicas, and data in every primary forest, and host 3 is the one you want to remove):
- Retire the forests on host 3
- After the rebalancer copies all the data to primary forests on hosts 1 and 2, detach the forests on host 3
After doing the above, you can safely remove host 3.
You've used the verbs "remove" and "delete" - are you really looking to automate the retire/detach process? Because at least for the process of removing a host, there's not a use case for deleting the forests - unless there's no data in them that needs to be rebalanced (you'd still need to detach them though).
Note that if you are looking to automate retire/detach, they'd be two separate Gradle tasks, as the rebalancing process could take hours to finish depending on the amount of data.
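To make the 3-host example concrete, here is a minimal Python sketch of working out which forests would be retired for a given host and which would stay behind as rebalancing targets. The `Forest` class, the `plan_host_removal` helper and the forest naming scheme are all hypothetical, not part of ml-gradle or ml-app-deployer, and replica forests are ignored for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Forest:
    name: str      # forest name (hypothetical naming scheme)
    host: str      # host the forest lives on
    database: str  # database the forest is attached to


def plan_host_removal(forests, database, host):
    """Split a database's forests into those to retire and those that stay."""
    attached = [f for f in forests if f.database == database]
    to_retire = [f.name for f in attached if f.host == host]
    remaining = [f.name for f in attached if f.host != host]
    if to_retire and not remaining:
        # With no forests left on other hosts, the rebalancer has no target.
        raise ValueError(f"no forests would remain for {database!r} off {host!r}")
    return to_retire, remaining


# 3 hosts with 2 primary forests each, as in the example above.
forests = [
    Forest(f"myDatabase-{h}-{i}", f"host{h}", "myDatabase")
    for h in (1, 2, 3)
    for i in (1, 2)
]
to_retire, remaining = plan_host_removal(forests, "myDatabase", "host3")
print(to_retire)
print(remaining)
```

The guard against an empty `remaining` list is a design choice for this sketch: retiring every forest of a database would leave the data with nowhere to go.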
What you described is exactly my use case: retiring and detaching forests so that a host can leave a cluster.
I tested that manually with a forest of around 10 GB, and the rebalancing didn't take more than a few minutes. Since MarkLogic recommends a maximum of 200 GB per forest, my guess was that automating the whole process, rebalancing included, should still be manageable.
If you think that in practice it doesn't make sense to automate the whole process, then disregard my request and I will fall back to the approach of multiple Gradle tasks.
I think there's value in tasks like this:
./gradlew mlRetireForests -PdatabaseName=myDatabase -PhostNames=host3
./gradlew mlDetachForests -PdatabaseName=myDatabase -PhostNames=host3
The catch is - how does ml-gradle, or any client, know when the rebalancer is finished? That isn't an ml-gradle problem, it's a problem for any client of the Manage API. If there's a program that can query the Manage API to know when the rebalancer is finished, that can be made an optional part of mlRetireForests or a new task itself - e.g. mlWaitForRebalancer -PdatabaseName=myDatabase.
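As a starting point for that program, here is a minimal Python sketch of the waiting part only: a polling loop with a timeout. How to decide that the rebalancer is finished is deliberately left open, so `is_finished` is a caller-supplied, hypothetical check (for example, one that reads database or forest status from the Manage API). The function name and its parameters are assumptions for illustration.

```python
import time


def wait_for_rebalancer(is_finished, timeout_s=3600.0, interval_s=10.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll is_finished() until it returns True or timeout_s elapses.

    is_finished is a hypothetical, caller-supplied check. Returns True when
    the check passed and False when the timeout was reached first.
    """
    deadline = clock() + timeout_s
    while True:
        if is_finished():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Demo with a stand-in check that reports "finished" on the third poll.
polls = iter([False, False, True])
print(wait_for_rebalancer(lambda: next(polls), timeout_s=5, interval_s=0))
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real delays. A real check would also need to decide how to treat request errors while polling.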
You could then do everything like this:
./gradlew mlRetireForests mlWaitForRebalancer mlDetachForests -PdatabaseName=myDatabase -PhostNames=host3
And we could then have an "aggregate" task that does everything:
./gradlew mlRemoveForests -PdatabaseName=myDatabase -PhostNames=host3
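One way to picture such an aggregate task is as a fixed sequence that refuses to detach if the wait step did not succeed. The Python sketch below uses placeholder callables for the three steps. `remove_forests` and its parameters are hypothetical names, not existing ml-gradle code.

```python
def remove_forests(retire, wait_for_rebalancer, detach):
    """Run retire, wait, detach in order; skip detach if the wait fails."""
    retire()
    if not wait_for_rebalancer():
        # Detaching before the data has moved could take it offline.
        raise RuntimeError("rebalancer did not finish; forests left attached")
    detach()


# Stand-in steps that only record the order in which they ran.
steps = []
remove_forests(
    retire=lambda: steps.append("retire"),
    wait_for_rebalancer=lambda: steps.append("wait") or True,
    detach=lambda: steps.append("detach"),
)
print(steps)
```

Failing loudly when the wait times out leaves the forests retired but still attached, so the operation can be resumed later.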
Want to take a crack first at writing the program to know when the rebalancer is done?
I like your idea on how to wait for the rebalancer to be done and the chaining of tasks, and it could also be used for the mlRemoveHost task!
Moved this to ml-app-deployer as most of the work will need to occur here first.