flink-remote-shuffle issues

SortBuffer supports reading data from the specified channel index

### Motivation SortBuffer can improve read performance significantly, but it doesn't support reading data from a specific channel. The development of some new features, for example the ReducePartition implementation, depends on...

TanYuxin-tyx

Implement ReducePartition

1

### Motivation As described in the document, ReducePartition is a good supplement to the current MapPartition. It has several good features; for example, it can benefit streaming and hybrid shuffle...

wsry

Support standby ShuffleManager

### Motivation Currently, the high availability of ShuffleManager depends on external services to recover it when it goes down. In essence, ShuffleManager is a single point of failure. We can introduce...

TanYuxin-tyx

Handle network issues gracefully

### Motivation Currently, network issues such as an unstable network may cause task failover, which may in turn lead to data being reproduced. In fact, we can improve the behavior by reconnecting and...

wsry

Support data replication

### Motivation For some jobs, data loss and reproduction are not acceptable; data replication is needed to handle data loss in this scenario. ### Changes Allow configuring the replication...

wsry

Support dynamic log level

2

### Motivation Support for changing the log level dynamically can help debug the shuffle system. ### Changes Add a REST API to both ShuffleManager and ShuffleWorker together with the corresponding...

wsry

Introduce rest APIs for dynamic disk online/offline

### Motivation Based on the REST API, we can remove or add disks dynamically without restarting the cluster. For example, we can remove a bad disk or we can add...

wsry

support pod templates in k8s deployment mode

1

We should support pod templates in k8s deployment mode, which will bring great convenience.

jlon

The shuffle manager should restore the previously managed workers when re-election

6

The shuffle manager should restore the previously managed workers when re-electing the master. Otherwise, in the next heartbeat cycle, the job will not be available when the worker is requested,...

jlon

add PrometheusMetricReporterFactory

1

I want to contribute this feature

jlon
