flink-remote-shuffle
Introduce REST APIs for dynamic ShuffleWorker online/offline
Motivation
With a REST API, we can remove or add a ShuffleWorker dynamically without restarting the cluster. For example, we can take a bad ShuffleWorker offline, or add a previously removed ShuffleWorker back. (Note that starting a new ShuffleWorker is already supported.)
Changes
New REST APIs need to be added together with their corresponding handlers. In the handler, the ShuffleWorker can be removed from the list of available ShuffleWorkers, which means that no new data will be written to it. Furthermore, we may choose to remove the data it has already produced at the same time. We can also offer a new API to kill a selected ShuffleWorker (note that a new ShuffleWorker may then be started by an external system such as K8s).
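The state change behind such a handler could look roughly like the sketch below. It is only an illustration of the online/offline bookkeeping, not the project's actual code: the class `WorkerAdmin`, its methods, and the worker IDs are all hypothetical names, and the REST routing, data cleanup, and kill operations are left out.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of the bookkeeping a REST handler might delegate to. */
class WorkerAdmin {
    enum State { ONLINE, OFFLINE }

    // Worker ID -> current state. The IDs are illustrative placeholders.
    private final Map<String, State> workers = new ConcurrentHashMap<>();

    /** Called when a ShuffleWorker registers; new workers start ONLINE. */
    void register(String workerId) {
        workers.put(workerId, State.ONLINE);
    }

    /** Takes a worker offline. Returns false if it is unknown or already offline. */
    boolean offline(String workerId) {
        return workers.replace(workerId, State.ONLINE, State.OFFLINE);
    }

    /** Brings an offline worker back. Returns false if it is unknown or already online. */
    boolean online(String workerId) {
        return workers.replace(workerId, State.OFFLINE, State.ONLINE);
    }

    /** Workers that may receive new data: only those currently ONLINE. */
    Set<String> writableWorkers() {
        Set<String> result = new TreeSet<>();
        workers.forEach((id, state) -> {
            if (state == State.ONLINE) {
                result.add(id);
            }
        });
        return result;
    }
}
```

Keeping an offline worker in the map, instead of deleting its entry, is one way to make the "add the removed ShuffleWorker back" case a simple state flip. Whether the real implementation should track state this way is an open design question.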
Test
- Unit test.
- Test manually on a cluster.