starrocks icon indicating copy to clipboard operation
starrocks copied to clipboard

[Feature] support recursive cte

Open Seaven opened this issue 1 month ago • 14 comments

Why I'm doing:

What I'm doing:

1. Grammar Support (ANTLR Grammar Definition)

  • fe/fe-grammar:
    • Add RECURSIVE keyword to WITH clause grammar rule
    • StarRocks.g4: WITH RECURSIVE? commonTableExpression
    • Add RECURSIVE to non-reserved keywords list

2. AST Data Structure Extensions

  • CTERelation:
    • Add isRecursive field to mark recursive CTE
    • Add isAnchor field to mark anchor part (initial query)
    • Provide corresponding getter/setter methods
  • QueryRelation: Add hasRecursiveCTE flag
  • SelectRelation: Support recursive CTE flag propagation
  • AstTraverser: Add visitCTE visitor method

3. Recursive CTE Validation and Execution

  • RecursiveCTEAstCheck (new file):

    • Detect presence of recursive CTE in statement
    • Validate recursive CTE legality (no multi-level recursion support)
    • Mark recursive references during AST traversal
  • RecursiveCTEExecutor (new file):

    • Core execution engine for iterative query processing
    • Manage working tables and result sets
    • Implement UNION ALL iterative merge logic
    • Handle termination condition detection (stop when result set unchanged)

4. Query Processing Pipeline Integration

  • StmtExecutor: Invoke recursive CTE detection and execution before SQL execution
  • QueryAnalyzer: Enhance CTE analysis logic with recursive semantic validation
  • AstBuilder: Parse WITH RECURSIVE syntax and construct CTERelation objects
  • AST2StringVisitor: Support recursive CTE string representation

5. Session Configuration

  • SessionVariable: Add enable_recursive_cte session variable (optional enable/disable)

6.Recursive CTE Execution Model

  1. Detection: RecursiveCTEAstCheck identifies recursive CTE presence
  2. Validation: Check recursive CTE legality (single-level recursion, correct UNION ALL structure)
  3. Execution:
    • First iteration: Execute anchor query (WHERE manager_id IS NULL)
    • Iterative loop:
      • Execute recursive part using previous iteration results
      • Merge results via UNION ALL
      • Stop when no new rows added
  4. Return: Merged result set from all iterations

Example

MySQL td> with recursive t1 as (select cast(1 as bigint) as l union select l + 1 as l from t1 where l < 10 ) 
select * from t1
+---+
| l |
+---+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
+---+
5 rows in set
Time: 0.477s

MySQL td> -- SQL: 
 WITH RECURSIVE org_hierarchy AS (
    SELECT 
        employee_id,
        name,
        manager_id,
        title, cast(1 as bigint) AS `level`,
        name AS path
    FROM employees
    WHERE manager_id IS NULL
    UNION ALL
    SELECT 
        e.employee_id,
        e.name,
        e.manager_id,
        e.title,
        oh.`level` + 1,
        CONCAT(oh.path, ' -> ', e.name) 
    FROM employees e
    INNER JOIN org_hierarchy oh ON e.manager_id = oh.employee_id
)
SELECT 
    employee_id,
    name,
    title,`level`,
    path
FROM org_hierarchy;
+-------------+--------+---------------------+-------+-----------------------------------------+
| employee_id | name   | title               | level | path                                    |
+-------------+--------+---------------------+-------+-----------------------------------------+
| 1           | Alicia | CEO                 | 1     | Alicia                                  |
| 2           | Bob    | CTO                 | 2     | Alicia -> Bob                           |
| 3           | Carol  | CFO                 | 2     | Alicia -> Carol                         |
| 4           | David  | VP of Engineering   | 3     | Alicia -> Bob -> David                  |
| 5           | Eve    | VP of Research      | 3     | Alicia -> Bob -> Eve                    |
| 6           | Frank  | VP of Finance       | 3     | Alicia -> Carol -> Frank                |
| 7           | Grace  | Engineering Manager | 4     | Alicia -> Bob -> David -> Grace         |
| 8           | Heidi  | Tech Lead           | 4     | Alicia -> Bob -> David -> Heidi         |
| 9           | Ivan   | Research Manager    | 4     | Alicia -> Bob -> Eve -> Ivan            |
| 10          | Judy   | Senior Engineer     | 5     | Alicia -> Bob -> David -> Grace -> Judy |
+-------------+--------+---------------------+-------+-----------------------------------------+
10 rows in set
Time: 0.476s
MySQL td>

MySQL td> -- SQL: 
explain  WITH RECURSIVE org_hierarchy AS (-- 锚点查询:从CEO开始(manager_id IS NULL)
    SELECT 
        employee_id,
        name,
        manager_id,
        title, cast(1 as bigint) AS `level`,
        name AS path
    FROM employees
    WHERE manager_id IS NULL
    
    UNION ALL-- 递归查询:逐级向下查找
    
    SELECT 
        e.employee_id,
        e.name,
        e.manager_id,
        e.title,
        oh.`level` + 1,
        CONCAT(oh.path, ' -> ', e.name) 
    FROM employees e
    INNER JOIN org_hierarchy oh ON e.manager_id = oh.employee_id
)
-- 4. 最终查询
SELECT 
    employee_id,
    name,
    title,`level`,
    path
FROM org_hierarchy;
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Explain String                                                                                                                                                                                                                                                                                                                  |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Recursive CTE Name: org_hierarchy                                                                                                                                                                                                                                                                                               |
| Temporary Table: org_hierarchy_1764834889831                                                                                                                                                                                                                                                                                                      |
| Start Statement: SELECT anchar_1764834889831.*, 0 AS `_cte_level`                                                                                                                                                                                                                                                               |
| FROM (SELECT `td`.`employees`.`employee_id`, `td`.`employees`.`name`, `td`.`employees`.`manager_id`, `td`.`employees`.`title`, CAST(1 AS BIGINT) AS `level`, `td`.`employees`.`name` AS `path`                                                                                                                                  |
| FROM `td`.`employees`                                                                                                                                                                                                                                                                                                           |
| WHERE `td`.`employees`.`manager_id` IS NULL) `anchar_1764834889831`                                                                                                                                                                                                                                                             |
| Recursive Statement: SELECT `td`.`e`.`employee_id`, `td`.`e`.`name`, `td`.`e`.`manager_id`, `td`.`e`.`title`, `oh`.`level` + 1 AS `oh.level + 1`, concat(`oh`.`path`, ' -> ', `td`.`e`.`name`) AS `concat(oh.path, ' -> ', e.name)`                                                                                             |
| FROM `td`.`employees` AS `e` INNER JOIN (SELECT `td`.`org_hierarchy_1764834889831`.`employee_id`, `td`.`org_hierarchy_1764834889831`.`name`, `td`.`org_hierarchy_1764834889831`.`manager_id`, `td`.`org_hierarchy_1764834889831`.`title`, `td`.`org_hierarchy_1764834889831`.`level`, `td`.`org_hierarchy_1764834889831`.`path` |
| FROM `td`.`org_hierarchy_1764834889831`) `oh` ON `td`.`e`.`manager_id` = `oh`.`employee_id`                                                                                                                                                                                                                                     |
| --------------------------------------------------                                                                                                                                                                                                                                                                              |
| Outer Statement After Rewriting:                                                                                                                                                                                                                                                                                                |
| SELECT `org_hierarchy`.`employee_id`, `org_hierarchy`.`name`, `org_hierarchy`.`title`, `org_hierarchy`.`level`, `org_hierarchy`.`path`                                                                                                                                                                                          |
| FROM (SELECT `td`.`org_hierarchy_1764834889831`.`employee_id`, `td`.`org_hierarchy_1764834889831`.`name`, `td`.`org_hierarchy_1764834889831`.`manager_id`, `td`.`org_hierarchy_1764834889831`.`title`, `td`.`org_hierarchy_1764834889831`.`level`, `td`.`org_hierarchy_1764834889831`.`path`                                    |
| FROM `td`.`org_hierarchy_1764834889831`) `org_hierarchy`                                                                                                                                                                                                                                                                        |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
14 rows in set
Time: 0.015s

Fixes #issue

What type of PR is this:

  • [ ] BugFix
  • [x] Feature
  • [ ] Enhancement
  • [ ] Refactor
  • [ ] UT
  • [ ] Doc
  • [ ] Tool

Does this PR entail a change in behavior?

  • [ ] Yes, this PR will result in a change in behavior.
  • [x] No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • [ ] Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • [ ] Parameter changes: default values, similar parameters but with different default values
  • [ ] Policy changes: use new policy to replace old one, functionality automatically enabled
  • [ ] Feature removed
  • [ ] Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • [ ] I have added test cases for my bug fix or my new feature
  • [ ] This pr needs user documentation (for new or modified features or behaviors)
    • [ ] I have added documentation for my new feature or new function
  • [ ] This is a backport pr

Bugfix cherry-pick branch check:

  • [ ] I have checked the version labels which the pr will be auto-backported to the target branch
    • [ ] 4.0
    • [ ] 3.5
    • [ ] 3.4
    • [ ] 3.3

[!NOTE] Introduces full recursive CTE support (WITH RECURSIVE) including parsing, validation, iterative execution via temporary tables, session controls, and tests.

  • SQL/Parser/Formatter:
    • Add WITH RECURSIVE grammar and RECURSIVE keyword in ANTLR (StarRocks.g4, StarRocksLex.g4).
    • Parse and render recursive CTEs (parser/AstBuilder, formatter/AST2StringVisitor).
  • AST & Traversal:
    • Extend CTERelation with isRecursive/isAnchor; QueryRelation with hasRecursiveCTE; SelectRelation#setPredicate.
    • Update traverser to handle recursive CTE anchors (AstTraverser).
  • Analyzer:
    • Validate and analyze recursive CTEs in QueryAnalyzer (anchor/recursive parts, name uniqueness, types, ban ORDER BY/LIMIT in recursive CTE).
  • Execution Pipeline:
    • Integrate recursive CTE path in StmtExecutor (detect, explain, prepare, finalize).
    • Add RecursiveCTEAstCheck and RecursiveCTEExecutor to stage iterative execution using temporary tables and loop depth.
  • Session Variables:
    • Add enable_recursive_cte, recursive_cte_max_depth, recursive_cte_finalize_temporal_table with getters/setters (SessionVariable).
  • Optimizer Guard:
    • Reject planning of recursive CTEs in RelationTransformer.
  • Connector Parser:
    • Adjust Trino AstBuilder CTE construction to include new flags.
  • Tests:
    • Add analyzer and plan UTs for recursive CTEs and SQL tests (AnalyzeCTETest, RecursiveCTETest, test/sql/test_cte/T/test_recursive_cte).

Written by Cursor Bugbot for commit a03df4274b01e03c8929673eee88d244b67b6b4e. This will update automatically on new commits. Configure here.

Seaven avatar Nov 25 '25 08:11 Seaven

@cursor review

alvin-celerdata avatar Nov 25 '25 17:11 alvin-celerdata

@cursor review

alvin-celerdata avatar Nov 26 '25 17:11 alvin-celerdata

@cursor review

alvin-celerdata avatar Dec 02 '25 17:12 alvin-celerdata

@cursor review

alvin-celerdata avatar Dec 03 '25 04:12 alvin-celerdata

@cursor review

alvin-celerdata avatar Dec 03 '25 17:12 alvin-celerdata

@cursor review

alvin-celerdata avatar Dec 04 '25 17:12 alvin-celerdata

🧪 CI Insights

Here's what we observed from your CI run for a03df427.

🟢 All jobs passed!

But CI Insights is watching 👀

mergify[bot] avatar Dec 04 '25 17:12 mergify[bot]

@cursor review

alvin-celerdata avatar Dec 05 '25 17:12 alvin-celerdata

@cursor review

alvin-celerdata avatar Dec 08 '25 17:12 alvin-celerdata

@cursor review

alvin-celerdata avatar Dec 10 '25 17:12 alvin-celerdata

@cursor review

alvin-celerdata avatar Dec 16 '25 05:12 alvin-celerdata

[Java-Extensions Incremental Coverage Report]

:white_check_mark: pass : 0 / 0 (0%)

github-actions[bot] avatar Dec 16 '25 08:12 github-actions[bot]

[FE Incremental Coverage Report]

:x: fail : 224 / 339 (66.08%)

file detail

path covered_line new_line coverage not_covered_line_detail
:large_blue_circle: com/starrocks/sql/ast/SelectRelation.java 0 2 00.00% [161, 162]
:large_blue_circle: com/starrocks/qe/StmtExecutor.java 3 17 17.65% [721, 722, 723, 724, 725, 727, 728, 729, 731, 732, 733, 734, 736, 1022]
:large_blue_circle: com/starrocks/qe/recursivecte/RecursiveCTEExecutor.java 103 192 53.65% [117, 119, 120, 121, 122, 124, 125, 127, 131, 132, 133, 135, 136, 137, 138, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 154, 155, 156, 157, 158, 159, 160, 162, 164, 165, 166, 167, 168, 171, 172, 176, 177, 179, 181, 182, 183, 184, 185, 186, 187, 188, 194, 195, 197, 198, 199, 200, 201, 204, 205, 206, 207, 208, 209, 212, 214, 215, 216, 221, 222, 223, 225, 226, 227, 228, 229, 230, 232, 233, 235, 237, 241, 266, 267, 287, 309, 334, 337, 344]
:large_blue_circle: com/starrocks/qe/SessionVariable.java 8 10 80.00% [2129, 2133]
:large_blue_circle: com/starrocks/sql/analyzer/QueryAnalyzer.java 71 78 91.03% [374, 399, 405, 409, 410, 417, 430]
:large_blue_circle: com/starrocks/qe/recursivecte/RecursiveCTEAstCheck.java 20 21 95.24% [58]
:large_blue_circle: com/starrocks/sql/optimizer/transformer/RelationTransformer.java 2 2 100.00% []
:large_blue_circle: com/starrocks/sql/ast/CTERelation.java 7 7 100.00% []
:large_blue_circle: com/starrocks/sql/ast/AstTraverser.java 2 2 100.00% []
:large_blue_circle: com/starrocks/sql/formatter/AST2StringVisitor.java 2 2 100.00% []
:large_blue_circle: com/starrocks/sql/ast/QueryRelation.java 6 6 100.00% []

github-actions[bot] avatar Dec 16 '25 08:12 github-actions[bot]

[BE Incremental Coverage Report]

:white_check_mark: pass : 0 / 0 (0%)

github-actions[bot] avatar Dec 16 '25 08:12 github-actions[bot]