flink-remote-shuffle

Introduce REST APIs for taking ShuffleWorkers online/offline dynamically

Open · wsry opened this issue 3 years ago · 0 comments

Motivation

With the REST API, we can remove or add a ShuffleWorker dynamically without restarting the cluster. For example, we can remove a bad ShuffleWorker, or add a previously removed ShuffleWorker back. (Note that starting a new ShuffleWorker is already supported.)

Changes

New REST APIs need to be added together with the corresponding handlers. In the handler, the ShuffleWorker can be removed from the list of available workers, which means that no new data will be written to that ShuffleWorker. Furthermore, we may choose to remove the data it has already produced at the same time. We could also offer a new API to kill a selected ShuffleWorker (note that an external system such as K8s may then start a new ShuffleWorker).
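A minimal sketch of the online/offline idea, assuming an in-memory registry. Everything here is hypothetical and for illustration only: the class `WorkerRegistry`, its methods, and the paths `POST /workers/{id}/offline` and `POST /workers/{id}/online` are not the flink-remote-shuffle API. The handler is modeled as a pure function returning an HTTP status code so it can be exercised without starting a server. As an assumed design choice, an offline worker is flagged rather than deleted, so the same entry can be brought back online; removing produced data and killing the worker are left out.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch: tracks which ShuffleWorkers may receive new data.
 * Names and URL paths are illustrative assumptions, not the project's API.
 */
public class WorkerRegistry {
    // Insertion-ordered map of worker id -> whether it accepts new writes.
    private final Map<String, Boolean> workers = new LinkedHashMap<>();

    public synchronized void register(String workerId) {
        workers.put(workerId, true);
    }

    /** Takes a worker offline: it stays known but receives no new data. */
    public synchronized boolean markOffline(String workerId) {
        if (!workers.containsKey(workerId)) return false;
        workers.put(workerId, false);
        return true;
    }

    /** Brings a previously removed worker back online. */
    public synchronized boolean markOnline(String workerId) {
        if (!workers.containsKey(workerId)) return false;
        workers.put(workerId, true);
        return true;
    }

    /** Workers eligible to be chosen for new shuffle data, in registration order. */
    public synchronized List<String> writableWorkers() {
        return workers.entrySet().stream()
                .filter(Map.Entry::getValue)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    /**
     * Hypothetical REST routing for POST /workers/{id}/offline and
     * POST /workers/{id}/online; returns an HTTP status code.
     */
    public int handle(String method, String path) {
        if (!"POST".equals(method)) return 405;
        // "/workers/w1/offline".split("/") -> ["", "workers", "w1", "offline"]
        String[] parts = path.split("/");
        if (parts.length != 4 || !parts[0].isEmpty() || !"workers".equals(parts[1])) return 404;
        boolean found;
        switch (parts[3]) {
            case "offline": found = markOffline(parts[2]); break;
            case "online": found = markOnline(parts[2]); break;
            default: return 404;
        }
        // 404 when the worker id is unknown.
        return found ? 200 : 404;
    }

    public static void main(String[] args) {
        WorkerRegistry registry = new WorkerRegistry();
        registry.register("worker-1");
        registry.register("worker-2");
        System.out.println(registry.handle("POST", "/workers/worker-1/offline")); // 200
        System.out.println(registry.writableWorkers()); // [worker-2]
        System.out.println(registry.handle("POST", "/workers/worker-1/online")); // 200
        System.out.println(registry.writableWorkers()); // [worker-1, worker-2]
    }
}
```

In a real deployment the handler would be wired into the cluster manager's existing REST endpoint, and the write path would consult `writableWorkers()` (or its equivalent) when placing new data.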

Test

  • Unit test.
  • Test manually on a cluster.

wsry, Dec 06 '21 08:12