iceberg-rust icon indicating copy to clipboard operation
iceberg-rust copied to clipboard

discussion: Simplify `PrimitiveLiteral` type.

Open liurenjie1024 opened this issue 1 year ago • 8 comments

We have introduce Datum in recent prs, which associate data type with a primitive literal. This means that PrimitiveLiteral is supposed to be a physical value, and we no longer need so many variants (such as date, time). More discussion could be found in https://github.com/apache/iceberg-rust/pull/492.

I'm here to propose to simplify it as following:

pub enum PrimitiveLiteral {
    /// 0x00 for false, non-zero byte for true
    Boolean(bool),
    /// Stored as 4-byte little-endian
    Int(i32),
    /// Stored as 8-byte little-endian
    Long(i64),
    /// Stored as 4-byte little-endian
    Float(OrderedFloat<f32>),
    /// Stored as 8-byte little-endian
    Double(OrderedFloat<f64>),
    /// UTF-8 bytes (without length)
    String(String),
    /// Binary value (without length)
    Binary(Vec<u8>),
    /// Stores unscaled value as big int. According to iceberg spec, the precision must less than 38(`MAX_DECIMAL_PRECISION`) , so i128 is suit here.
    Decimal(i128),
}

liurenjie1024 avatar Jul 29 '24 14:07 liurenjie1024

cc @Xuanwo @ZENOTME @Fokko @sdd

liurenjie1024 avatar Jul 29 '24 14:07 liurenjie1024

Thanks for this. I support this change.

Xuanwo avatar Jul 29 '24 14:07 Xuanwo

Also +1 for this and I'm glad to send the PR to do this later if we decide.

ZENOTME avatar Jul 29 '24 14:07 ZENOTME

This would make a lot of sense. I was also never keen on the names. Maybe we could rename Datum to LogicalValue and have PhysicalValue for primitive, similar to Parquet?

sdd avatar Jul 29 '24 15:07 sdd

Actually if we did that it would diverge from the Iceberg spec

sdd avatar Jul 29 '24 15:07 sdd

I create an init PR #502 for this. I hope it can be a starting point to discuss what's going on if we simplify PrimitiveLiteral. So feel free for any suggestions.

ZENOTME avatar Jul 29 '24 16:07 ZENOTME

This would make a lot of sense. I was also never keen on the names. Maybe we could rename Datum to LogicalValue and have PhysicalValue for primitive, similar to Parquet?

I prefer to reserve Primitive in name to indicate the primitive value, e.g LogicalPrimitive PhysicalPrimitive

ZENOTME avatar Jul 29 '24 16:07 ZENOTME

This would make a lot of sense. I was also never keen on the names. Maybe we could rename Datum to LogicalValue and have PhysicalValue for primitive, similar to Parquet?

I prefer to reserve Primitive in name to indicate the primitive value, e.g LogicalPrimitive PhysicalPrimitive

+1 for this. Also we may don't need to actually rename it, just adding an alias would be enough.

liurenjie1024 avatar Jul 30 '24 14:07 liurenjie1024

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions[bot] avatar Oct 02 '25 00:10 github-actions[bot]

This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

github-actions[bot] avatar Oct 16 '25 00:10 github-actions[bot]