[flink] support combined mode for orphan files clean
This PR supports combined-mode processing for orphan files clean:
- Processes multiple tables within a single DataStream during job graph construction, instead of creating one DataStream per table. This significantly reduces JobGraph construction time and complexity, and avoids timeouts, stack overflows, and resource allocation failures
- Only applies when `--mode combined` is specified
Adds configuration:
- `[--mode <divided|combined>]`: Processing mode (default: `divided`)
  - `divided`: Create one DataStream per table (original behavior)
  - `combined`: Process all tables in a single DataStream
- `[--tables <table1>] [--tables <table2>]`: multiple parameters for table names
  - `--table` and `--tables` cannot be used together
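
To make the argument rules above concrete, here is a minimal sketch in plain Java. It is not the PR's code: the class `OrphanCleanArgs`, its `parse` method, and the assumption that every option arrives as a key/value pair are hypothetical and used only for illustration. It shows the `divided` default, repeated `--tables` values being collected, and the rejection of `--table` together with `--tables`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only; not a class from this PR.
final class OrphanCleanArgs {

    enum Mode { DIVIDED, COMBINED }

    final Mode mode;
    final List<String> tables;

    private OrphanCleanArgs(Mode mode, List<String> tables) {
        this.mode = mode;
        this.tables = Collections.unmodifiableList(tables);
    }

    // Assumes arguments come as "--key value" pairs; unknown keys are skipped.
    static OrphanCleanArgs parse(String... args) {
        Mode mode = Mode.DIVIDED; // default stated in the PR description
        String table = null;
        List<String> tables = new ArrayList<>();

        for (int i = 0; i < args.length; i += 2) {
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException("Missing value for " + args[i]);
            }
            String key = args[i];
            String value = args[i + 1];
            switch (key) {
                case "--mode":
                    // valueOf throws IllegalArgumentException for unknown modes
                    mode = Mode.valueOf(value.toUpperCase(Locale.ROOT));
                    break;
                case "--table":
                    table = value;
                    break;
                case "--tables":
                    tables.add(value); // the option may be repeated
                    break;
                default:
                    break; // other options are out of scope for this sketch
            }
        }

        if (table != null && !tables.isEmpty()) {
            throw new IllegalArgumentException(
                    "--table and --tables cannot be used together");
        }
        if (table != null) {
            tables.add(table);
        }
        return new OrphanCleanArgs(mode, tables);
    }

    public static void main(String[] argv) {
        OrphanCleanArgs parsed =
                parse("--mode", "combined", "--tables", "t1", "--tables", "t2");
        System.out.println(parsed.mode + " " + parsed.tables);
    }
}
```

In this sketch a single `--table` value is folded into the same list as `--tables`, so downstream code only needs to handle one list regardless of which option was used.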
Tests:
- `testCombinedMode`: Combined mode with multiple tables
- `testCombinedModeWithBranch`: Combined mode with multiple branches
@yuzelin @jerry-024 @JingsongLi Can you please help review this PR?
I think it's better to use one Flink DataStream to handle all tables. For example, you can add `--tables a,b,c`, which means clean tables a, b and c. If it is not specified, get the tables from the catalog using the old-style arguments.
Is there any bad case for combined mode? Why not just enable it by default?
Sure, we can enable it by default.
Updated and tested in our application.