
[BugFix] Fix incorrect column mapping for Hive View when underlying table schema changes

Open nancodex opened this issue 2 weeks ago • 6 comments

Problem: When querying a Hive View (especially one defined as SELECT *), a schema change in the underlying Hive table (e.g., new columns added) can leave the column positions recorded in the View definition out of step with the positions in the underlying table. StarRocks previously mapped View columns to the query output by index, so filters (such as partition-pruning predicates) were applied to the wrong columns.

Solution: In QueryAnalyzer, introduce a name-based mapping mechanism for Hive Views.

  1. Check whether the relation is a Hive View.
  2. Verify that every column in the View's base schema exists, by name, in the underlying query output.
  3. If so, map the fields by column name instead of by index.

Fixes: #66559

Why I'm doing:

When querying a Hive View (especially one defined as SELECT *), a schema change in the underlying Hive table (e.g., new columns added) can leave the column positions recorded in the View definition out of step with the positions in the underlying table. StarRocks previously mapped View columns to the query output by index, so filters (such as partition-pruning predicates) were applied to the wrong columns, resulting in empty results or errors.
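To make the failure mode concrete, here is a minimal, self-contained sketch of positional binding going wrong. It is an illustration only: the class `IndexMappingPitfall`, the method `resolveByIndex`, and the column names `id`, `name`, and `dt` are hypothetical and do not come from the StarRocks code base.

```java
import java.util.List;

// Hypothetical illustration; none of these names are StarRocks classes.
class IndexMappingPitfall {
    // Index-based mapping: the i-th view column is bound to the i-th query output column.
    static String resolveByIndex(List<String> viewColumns, List<String> queryOutput, String viewColumn) {
        int idx = viewColumns.indexOf(viewColumn);
        return queryOutput.get(idx);
    }

    public static void main(String[] args) {
        // Schema recorded for the view when it was created as SELECT * (assumed example).
        List<String> viewColumns = List.of("id", "dt");
        // Output of the same SELECT * after a column "name" was added to the table.
        List<String> queryOutput = List.of("id", "name", "dt");

        // A filter on the view's "dt" column is bound to "name" instead.
        System.out.println(resolveByIndex(viewColumns, queryOutput, "dt")); // prints "name"
    }
}
```

In this toy case a partition predicate on `dt` would be evaluated against `name`, which is consistent with the empty results or errors described above.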

What I'm doing:

In QueryAnalyzer, I introduced a name-based mapping mechanism for Hive Views.

  1. Check whether the relation is a Hive View.
  2. Verify that every column in the View's base schema exists, by name, in the underlying query output.
  3. If so, map the fields by column name instead of by index. This keeps the column mapping correct even if the underlying table structure changes. If any column is missing, the original index-based mapping is kept.
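The three steps above can be sketched roughly as follows. This is not the actual `QueryAnalyzer` code: the class `ViewColumnMapper`, the method `mapColumns`, and its parameters are hypothetical, and the sketch works on plain column-name lists rather than StarRocks `Field` objects. The case-insensitive lookup and the index-based fallback follow the PR's summary note; keeping the first match for duplicate names is an assumption made here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of the described approach; not the real QueryAnalyzer implementation.
class ViewColumnMapper {
    /** Returns, for each view column, the index of the query output column it binds to. */
    static List<Integer> mapColumns(boolean isHiveView, List<String> viewColumns, List<String> queryOutput) {
        // Step 1: only Hive views use name-based mapping.
        if (isHiveView) {
            // Index the query output by lower-cased name (first occurrence wins; an assumption).
            Map<String, Integer> byName = new HashMap<>();
            for (int i = 0; i < queryOutput.size(); i++) {
                byName.putIfAbsent(queryOutput.get(i).toLowerCase(Locale.ROOT), i);
            }
            // Step 2: every view column must be found by name.
            List<Integer> mapped = new ArrayList<>();
            boolean allFound = true;
            for (String column : viewColumns) {
                Integer idx = byName.get(column.toLowerCase(Locale.ROOT));
                if (idx == null) {
                    allFound = false;
                    break;
                }
                mapped.add(idx);
            }
            // Step 3: use the name-based mapping only when it is complete.
            if (allFound) {
                return mapped;
            }
        }
        // Fallback: original positional mapping (view column i -> output column i).
        List<Integer> positional = new ArrayList<>();
        for (int i = 0; i < viewColumns.size(); i++) {
            positional.add(i);
        }
        return positional;
    }
}
```

With a view schema of `["id", "DT"]` and a query output of `["id", "name", "dt"]`, `mapColumns(true, ...)` returns `[0, 2]`, while the same call with `isHiveView` set to `false` returns the positional `[0, 1]`.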

Fixes #66559

What type of PR is this:

  • [x] BugFix
  • [ ] Feature
  • [ ] Enhancement
  • [ ] Refactor
  • [ ] UT
  • [ ] Doc
  • [ ] Tool

Does this PR entail a change in behavior?

  • [x] Yes, this PR will result in a change in behavior.
  • [ ] No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • [x] Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • [ ] Parameter changes: default values, similar parameters but with different default values
  • [ ] Policy changes: use new policy to replace old one, functionality automatically enabled
  • [ ] Feature removed
  • [ ] Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • [ ] I have added test cases for my bug fix or my new feature
  • [ ] This pr needs user documentation (for new or modified features or behaviors)
    • [ ] I have added documentation for my new feature or new function
  • [ ] This is a backport pr

Bugfix cherry-pick branch check:

  • [x] I have checked the version labels which the pr will be auto-backported to the target branch
    • [x] 4.0
    • [x] 3.5
    • [x] 3.4
    • [x] 3.3

[!NOTE] Switch to name-based field mapping for Hive views (when all columns match) to align view schema with query output; fall back to index mapping otherwise.

  • Analyzer (fe/fe-core/src/main/java/com/starrocks/sql/analyzer/QueryAnalyzer.java)
    • For visitView on Hive views, build a case-insensitive map of query output fields by name and, when all base-schema columns exist by name, map Field using names instead of indices.
    • Preserve original index-based mapping as fallback when name matching isn’t complete.
    • No changes to non-Hive views or other analysis paths.

Written by Cursor Bugbot for commit 03f6013193166bc5f69dbc752cfd1cd2e46cfaec.

nancodex, Dec 10 '25 14:12

🧪 CI Insights

Here's what we observed from your CI run for 03f60131.

🟢 All jobs passed!

But CI Insights is watching 👀

mergify[bot], Dec 10 '25 14:12

[Java-Extensions Incremental Coverage Report]

:white_check_mark: pass : 0 / 0 (0%)

github-actions[bot], Dec 10 '25 17:12

[BE Incremental Coverage Report]

:white_check_mark: pass : 0 / 0 (0%)

github-actions[bot], Dec 10 '25 17:12

[FE Incremental Coverage Report]

:white_check_mark: pass : 11 / 11 (100.00%)

file detail

|  | path | covered_line | new_line | coverage | not_covered_line_detail |
|---|---|---|---|---|---|
| :large_blue_circle: | com/starrocks/sql/analyzer/QueryAnalyzer.java | 11 | 11 | 100.00% | [] |

github-actions[bot], Dec 10 '25 17:12

@cursor review

alvin-celerdata, Dec 10 '25 17:12