starrocks icon indicating copy to clipboard operation
starrocks copied to clipboard

[Bug] Incorrect Column Mapping when Querying Hive View (defined as SELECT *) after Underlying Table Schema Change

Open nancodex opened this issue 2 weeks ago • 1 comments

Steps to reproduce the behavior (Required)

  1. Create a Hive Table (ORC format, partitioned by dayno)

    CREATE TABLE test_db.test_hive_table (
        id BIGINT COMMENT 'ID',
        name STRING COMMENT 'Name',
        age INT COMMENT 'Age',
        city STRING COMMENT 'City',
        salary DECIMAL(10,2) COMMENT 'Salary',
        imei_to_id BIGINT COMMENT 'IMEI'
    )
    COMMENT 'test table for starrocks bug verification'
    PARTITIONED BY (dayno INT COMMENT 'Partition Date')
    STORED AS ORC;
    
  2. Create a Hive View (defined with SELECT *)

    -- Create view without specifying column list, relying on underlying table schema
    CREATE VIEW test_db.test_hive_view 
    COMMENT 'View defined with SELECT *'
    AS 
    SELECT * FROM test_db.test_hive_table;
    
  3. Insert Test Data

    INSERT INTO test_db.test_hive_table PARTITION(dayno=20251209)
    VALUES 
    (1, 'ZhangSan', 25, 'Beijing', 10000.50, 1001),
    (2, 'LiSi', 30, 'Shanghai', 15000.75, 1002),
    (3, 'WangWu', 28, 'Shenzhen', 12000.00, 1003),
    (4, 'ZhaoLiu', 35, 'Guangzhou', 18000.25, 1004),
    (5, 'SunQi', 40, 'Hangzhou', 20000.50, 1005);
    
  4. Alter Table Schema (Add a new column)

    ALTER TABLE test_db.test_hive_table ADD COLUMNS (
        new_field STRING COMMENT 'New added field'
    );
    
  5. Query the View in StarRocks

    SELECT * FROM hive.test_db.test_hive_view WHERE dayno = 20251209;
    

Expected behavior (Required)

StarRocks should correctly resolve the schema change in the underlying Hive table when querying the View (defined as SELECT *). The query should apply the dayno filter correctly to the partition column and return the 5 inserted rows.

(Note: Spark and Trino handle this scenario correctly and return the expected data.)

Real behavior (Required)

The query returns 0 rows (or incorrect results).

Analyzing the EXPLAIN plan in StarRocks reveals that the partition column filter is incorrectly applied to a different column (the newly added column) as a non-partition predicate.

Explain output:

PLAN FRAGMENT 0
 OUTPUT EXPRS:1: id | 2: name | 3: age | 4: city | 5: salary | 6: imei_to_id | 7: new_field
 PARTITION: UNPARTITIONED

   RESULT SINK

   1:EXCHANGE

PLAN FRAGMENT 1
 OUTPUT EXPRS:
   PARTITION: RANDOM

   STREAM DATA SINK
     EXCHANGE ID: 01
     UNPARTITIONED

   0:HdfsScanNode
      TABLE: ods_test_hive_table
      NON-PARTITION PREDICATES: CAST(7: new_field AS DECIMAL128(38,9)) = 20251209
      partitions=1/1
      cardinality=15
      avgRowSize=7.0

Key Observations:

  1. Incorrect Column Mapping: The partition filter dayno = 20251209 is wrongly applied to the newly added column new_field.
    • Explain log: NON-PARTITION PREDICATES: CAST(7: new_field AS DECIMAL128(38,9)) = 20251209
  2. Incorrect Result: The query returns 0 rows because the partition filter is lost and new_field does not match the date value.

StarRocks version (Required)

  • [4.0.1]

nancodex avatar Dec 10 '25 08:12 nancodex

@murphyatwork Does this issue exist? If yes, I'll fix it.

nancodex avatar Dec 10 '25 08:12 nancodex