paimon [Bug] After the flink job written to Paimon modifies `sink.parallelism`, the job will not be able to recover from the checkpoint

[Bug] After the flink job written to Paimon modifies `sink.parallelism`, the job will not be able to recover from the checkpoint

Open huyuanfeng2018 opened this issue 10 months ago • 3 comments

Search before asking

[X] I searched in the issues and found nothing similar.

Paimon version

master

Compute Engine

flink

Minimal reproduce step

First set the parallelism degree of 1 to write Paimon
Stop passing the current task
Modify the parallelism to be greater than 1
Restore from the last checkpoint

What doesn't meet your expectations?

The job should be able to resume normally from the last checkpoint or savepoint, even if I change the parallelism of the sink

Anything else?

No response

Are you willing to submit a PR?

[ ] I'm willing to submit a PR!

Apr 18 '24 07:04 huyuanfeng2018

        SingleOutputStreamOperator<?> committed =
                written.rebalance().transform(
                                GLOBAL_COMMITTER_NAME,
                                new MultiTableCommittableTypeInfo(),
                                new CommitterOperator<>(
                                        streamingCheckpointEnabled,
                                        commitUser,
                                        createCommitterFactory(),
                                        createCommittableStateManager()))
                        .setParallelism(1)
                        .setMaxParallelism(1);

This can be avoided by adding rebalance before the commit operator, because the parallelism of commit is always 1. When the write parallelism is 1, flink will chain the two operators. When the parallelism of writing increases, will be split

Apr 18 '24 07:04 huyuanfeng2018

@huyuanfeng2018 this is a good issue, but a chain committer can reduce resource cost.

maybe we can have an option to control this.

Apr 30 '24 11:04 JingsongLi

@huyuanfeng2018 this is a good issue, but a chain committer can reduce resource cost.

maybe we can have an option to control this.

+1. Agree

May 06 '24 02:05 huyuanfeng2018

paimon paimon copied to clipboard

[Bug] After the flink job written to Paimon modifies `sink.parallelism`, the job will not be able to recover from the checkpoint

Search before asking

Paimon version

Compute Engine

Minimal reproduce step

What doesn't meet your expectations?

Anything else?

Are you willing to submit a PR?

paimon
paimon copied to clipboard