[flink] support combined mode for orphan files clean

Open XiaoHongbo-Hope opened this issue 2 months ago • 4 comments

This PR

Supports combined-mode processing for orphan file cleaning:

  • Processes multiple tables within a single DataStream during job graph construction instead of creating one DataStream per table. This significantly reduces JobGraph construction time and complexity, and avoids timeouts, stack overflows, and resource allocation failures
  • Only applies when --mode combined is specified
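
As a rough illustration of why the mode matters for job graph size, here is a toy model in plain Java with no Flink dependency. It is not code from this PR: the class name, method names, and the fixed operators-per-pipeline figure are all hypothetical. It only counts how many graph nodes each mode would create under that assumption.

```java
import java.util.List;

// Toy model only: names and numbers are hypothetical, not taken from the PR.
class ModeSketch {
    // Assumption: every DataStream pipeline contributes a fixed number of operators.
    static final int OPERATORS_PER_PIPELINE = 4;

    /** Divided mode: one pipeline per table, so node count grows linearly with tables. */
    static int dividedNodeCount(List<String> tables) {
        return tables.size() * OPERATORS_PER_PIPELINE;
    }

    /** Combined mode: table names travel as records through one shared pipeline. */
    static int combinedNodeCount(List<String> tables) {
        return tables.isEmpty() ? 0 : OPERATORS_PER_PIPELINE;
    }

    public static void main(String[] args) {
        List<String> tables = List.of("t1", "t2", "t3");
        System.out.println("divided nodes:  " + dividedNodeCount(tables));
        System.out.println("combined nodes: " + combinedNodeCount(tables));
    }
}
```

With thousands of tables, the divided count scales with the table count while the combined count stays constant, which is the effect the description attributes to combined mode.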

Adds configuration:

  • [--mode <divided|combined>]: Processing mode (default: divided)
    • divided: Create one DataStream per table (original behavior)
    • combined: Process all tables in a single DataStream
  • [--tables <table1>] [--tables <table2>]: Table names; repeat the parameter to specify multiple tables
  • --table and --tables cannot be used together
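
The argument rules above could be validated roughly as follows. This is a minimal sketch, not the PR's actual parser: the class `OrphanCleanArgs` and its helpers are hypothetical, and it assumes the default of divided stated in this description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration; not the parser used by the PR.
class OrphanCleanArgs {
    enum Mode { DIVIDED, COMBINED }

    final Mode mode;
    final List<String> tables;

    OrphanCleanArgs(Mode mode, List<String> tables) {
        this.mode = mode;
        this.tables = tables;
    }

    static OrphanCleanArgs parse(String[] args) {
        Mode mode = Mode.DIVIDED; // default as stated in the PR description
        String table = null;
        List<String> tables = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            switch (args[i]) {
                case "--mode":
                    // valueOf throws IllegalArgumentException for unknown modes
                    mode = Mode.valueOf(value(args, ++i, "--mode").toUpperCase(Locale.ROOT));
                    break;
                case "--table":
                    table = value(args, ++i, "--table");
                    break;
                case "--tables":
                    // the flag may be repeated, one table name per occurrence
                    tables.add(value(args, ++i, "--tables"));
                    break;
                default:
                    break; // other arguments are outside the scope of this sketch
            }
        }
        if (table != null && !tables.isEmpty()) {
            throw new IllegalArgumentException("--table and --tables cannot be used together");
        }
        if (table != null) {
            tables.add(table);
        }
        return new OrphanCleanArgs(mode, tables);
    }

    private static String value(String[] args, int i, String flag) {
        if (i >= args.length) {
            throw new IllegalArgumentException("Missing value for " + flag);
        }
        return args[i];
    }
}
```

For example, `parse(new String[] {"--mode", "combined", "--tables", "t1", "--tables", "t2"})` yields combined mode with tables t1 and t2, while passing both --table and --tables is rejected.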

Tests:

  • testCombinedMode: Combined mode with multiple tables
  • testCombinedModeWithBranch: Combined mode with multiple branches

XiaoHongbo-Hope avatar Nov 06 '25 17:11 XiaoHongbo-Hope

@yuzelin @jerry-024 @JingsongLi Can you please help review this PR.

XiaoHongbo-Hope avatar Nov 12 '25 11:11 XiaoHongbo-Hope

I think it's better to use one Flink DataStream to handle all tables. For example, you could add --tables a,b,c, meaning clean tables a, b, and c. If it is not specified, get the tables from the catalog using the old-style arguments.

yuzelin avatar Nov 13 '25 06:11 yuzelin

Does combined mode have any bad cases? Why not just enable it by default?

JingsongLi avatar Nov 14 '25 07:11 JingsongLi

Does combined mode have any bad cases? Why not just enable it by default?

Sure, we can enable it by default.

XiaoHongbo-Hope avatar Nov 14 '25 10:11 XiaoHongbo-Hope

Does combined mode have any bad cases? Why not just enable it by default?

Updated and tested in our application.

XiaoHongbo-Hope avatar Nov 17 '25 11:11 XiaoHongbo-Hope