[Bug] Incorrect Column Mapping when Querying Hive View (defined as SELECT *) after Underlying Table Schema Change
Steps to reproduce the behavior (Required)
-
Create a Hive Table (ORC format, partitioned by
dayno)CREATE TABLE test_db.test_hive_table ( id BIGINT COMMENT 'ID', name STRING COMMENT 'Name', age INT COMMENT 'Age', city STRING COMMENT 'City', salary DECIMAL(10,2) COMMENT 'Salary', imei_to_id BIGINT COMMENT 'IMEI' ) COMMENT 'test table for starrocks bug verification' PARTITIONED BY (dayno INT COMMENT 'Partition Date') STORED AS ORC; -
Create a Hive View (defined with
SELECT *)-- Create view without specifying column list, relying on underlying table schema CREATE VIEW test_db.test_hive_view COMMENT 'View defined with SELECT *' AS SELECT * FROM test_db.test_hive_table; -
Insert Test Data
INSERT INTO test_db.test_hive_table PARTITION(dayno=20251209) VALUES (1, 'ZhangSan', 25, 'Beijing', 10000.50, 1001), (2, 'LiSi', 30, 'Shanghai', 15000.75, 1002), (3, 'WangWu', 28, 'Shenzhen', 12000.00, 1003), (4, 'ZhaoLiu', 35, 'Guangzhou', 18000.25, 1004), (5, 'SunQi', 40, 'Hangzhou', 20000.50, 1005); -
Alter Table Schema (Add a new column)
ALTER TABLE test_db.test_hive_table ADD COLUMNS ( new_field STRING COMMENT 'New added field' ); -
Query the View in StarRocks
SELECT * FROM hive.test_db.test_hive_view WHERE dayno = 20251209;
Expected behavior (Required)
StarRocks should correctly resolve the schema change in the underlying Hive table when querying the View (defined as SELECT *).
The query should apply the dayno filter correctly to the partition column and return the 5 inserted rows.
(Note: Spark and Trino handle this scenario correctly and return the expected data.)
Real behavior (Required)
The query returns 0 rows (or incorrect results).
Analyzing the EXPLAIN plan in StarRocks reveals that the partition column filter is incorrectly applied to a different column (the newly added column) as a non-partition predicate.
Explain output:
PLAN FRAGMENT 0
OUTPUT EXPRS:1: id | 2: name | 3: age | 4: city | 5: salary | 6: imei_to_id | 7: new_field
PARTITION: UNPARTITIONED
RESULT SINK
1:EXCHANGE
PLAN FRAGMENT 1
OUTPUT EXPRS:
PARTITION: RANDOM
STREAM DATA SINK
EXCHANGE ID: 01
UNPARTITIONED
0:HdfsScanNode
TABLE: ods_test_hive_table
NON-PARTITION PREDICATES: CAST(7: new_field AS DECIMAL128(38,9)) = 20251209
partitions=1/1
cardinality=15
avgRowSize=7.0
Key Observations:
- Incorrect Column Mapping: The partition filter
dayno = 20251209is wrongly applied to the newly added columnnew_field.- Explain log:
NON-PARTITION PREDICATES: CAST(7: new_field AS DECIMAL128(38,9)) = 20251209
- Explain log:
- Incorrect Result: The query returns 0 rows because the partition filter is lost and
new_fielddoes not match the date value.
StarRocks version (Required)
- [4.0.1]
@murphyatwork Does this issue exist? If yes, I'll fix it.