optimize(fuse): record scalar column in meta file (or parquet meta)?
## Summary
For a row in a large wide table, many (even most) columns may be null or set to their default values. Such a table might be loaded with a SQL command like `COPY INTO wide_table(c1, c100) FROM ...`, while `wide_table` itself may contain 1000 columns.
In memory, the unused columns are represented as `Value::Scalar` in the `DataBlock`, which speeds up computation significantly. However, when we translate a `DataBlock` into an Arrow `RecordBatch`, each scalar is flattened into a full column (see the sketch after this list). This results in:
- Slower loading.
- When the data is read back, the column is represented as `Value::Column` rather than `Value::Scalar`.
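To make the cost concrete, here is a minimal self-contained sketch of the two representations and of what flattening does to a constant column. The `Value` enum below is a deliberate simplification of the real one (element types reduced to `i64`), and `flatten` stands in for the `DataBlock`-to-Arrow conversion; neither is Databend's actual API.

```rust
// Simplified illustration; not Databend's actual types.
#[derive(Clone, Debug)]
enum Value {
    /// One scalar logically repeated for every row: O(1) memory.
    Scalar(i64),
    /// One physical value per row: O(num_rows) memory.
    Column(Vec<i64>),
}

impl Value {
    /// Converting to an Arrow-style flat array forces materialization:
    /// a `Scalar` is expanded into `num_rows` copies, losing the compact form.
    fn flatten(&self, num_rows: usize) -> Vec<i64> {
        match self {
            Value::Scalar(v) => vec![*v; num_rows],
            Value::Column(col) => col.clone(),
        }
    }
}

fn main() {
    // e.g. the default value of a column that was not listed in COPY INTO
    let unused_col = Value::Scalar(0);
    // Writing this block out via Arrow repeats the scalar a million times.
    let flattened = unused_col.flatten(1_000_000);
    assert_eq!(flattened.len(), 1_000_000);
}
```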
## Impact
- Flattening scalar columns during the conversion to Arrow introduces per-row overhead, causing slower load times.
- Expanding unused columns from `Value::Scalar` to `Value::Column` during read-back negatively impacts performance and memory usage (see the sketch after this list for the meta-file approach proposed in the title).
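A hedged sketch of the idea in the title: rather than flattening a constant column into the parquet file, the block's meta could record the scalar directly. All names here (`BlockMeta`, `ScalarValue`, `ColumnId`) are hypothetical and do not reflect Databend's actual meta structures.

```rust
use std::collections::HashMap;

type ColumnId = u32;

// Hypothetical scalar representation; a real one would cover all data types.
#[derive(Clone, Debug)]
enum ScalarValue {
    Null,
    Int64(i64),
}

// Hypothetical per-block metadata: constant columns are persisted as a single
// scalar instead of being flattened into the parquet file.
#[derive(Debug, Default)]
struct BlockMeta {
    num_rows: usize,
    stored_columns: Vec<ColumnId>, // physically written to parquet
    constant_columns: HashMap<ColumnId, ScalarValue>, // scalar only, zero I/O
}

fn main() {
    let mut meta = BlockMeta { num_rows: 1_000_000, ..Default::default() };
    meta.stored_columns.extend([1, 100]); // c1, c100 were loaded
    meta.constant_columns.insert(2, ScalarValue::Null); // c2 is all-null
    // On read, c2 can be rebuilt as a Value::Scalar straight from the meta,
    // with no parquet I/O and no per-row materialization.
    println!("{meta:?}");
}
```

Whether this belongs in the fuse meta file or in the parquet file's own metadata, as the title asks, is the open design question.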
cc @dantengsky @zhyass
> For a row in a large wide table, many (even most) columns may be null or set to their default values. Such a table might be loaded with a SQL command like `COPY INTO wide_table(c1, c100) FROM ...`,
This case resembles `alter table t add column c int` or `alter table t add column c int default 1`; maybe we don't need to "materialize" those columns at all?
yes
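To illustrate the direction agreed on above, here is a minimal sketch of a read path that never materializes missing columns: anything absent from storage is resolved from the table schema's default value and stays scalar. `read_column`, and the reduction of all element types to `i64`, are hypothetical simplifications, not Databend's read path.

```rust
use std::collections::HashMap;

type ColumnId = u32;

#[derive(Clone, Debug, PartialEq)]
enum Value {
    Scalar(i64),      // one value for every row; nothing stored on disk
    Column(Vec<i64>), // fully materialized, read from the data file
}

// Hypothetical reader: a column missing from the stored file falls back to
// the schema default (as set by `alter table ... default 1`) and is surfaced
// as a scalar, so it is never expanded row by row.
fn read_column(
    stored: &HashMap<ColumnId, Vec<i64>>,
    schema_defaults: &HashMap<ColumnId, i64>,
    col: ColumnId,
) -> Value {
    match stored.get(&col) {
        Some(data) => Value::Column(data.clone()),
        None => Value::Scalar(*schema_defaults.get(&col).unwrap_or(&0)),
    }
}

fn main() {
    let stored = HashMap::from([(1u32, vec![10, 20, 30])]); // only c1 on disk
    let defaults = HashMap::from([(2u32, 1)]); // c2 added with DEFAULT 1
    assert_eq!(read_column(&stored, &defaults, 2), Value::Scalar(1));
}
```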