
[Feature] Implement Support for Iceberg Variant Format

Open xxubai opened this issue 10 months ago • 5 comments

Feature request

Is your feature request related to a problem? Please describe.
The current system does not support the Iceberg Variant format; in particular, it cannot read data stored in this format. While not using the Variant format may not cause data loss, supporting this new type introduced in Iceberg is crucial for full compatibility and functionality.

Describe the solution you'd like
We would like to implement support for the Iceberg Variant format within the existing system, starting with the ability to read data in this format. This could be achieved by adding dedicated parsing logic to the data reading module to ensure proper interpretation and loading of data while maintaining compatibility with other formats. In the future, we may consider extending this support to include writing capabilities or more advanced data operations.

Describe alternatives you've considered
The current workaround involves converting data from the Iceberg Variant format into a supported format. Although this approach might not lead to data distortion, it does not take full advantage of the benefits provided by the Variant format, such as efficient handling of semi-structured data and enhanced performance.

Additional context
The Iceberg Variant format is designed for efficient reading and storage of semi-structured data. Supporting this format is not just about avoiding potential issues; it's about embracing the new type introduced in Iceberg to improve our system’s flexibility, scalability, and overall efficiency.

Related Issue
This feature request is aligned with the discussion in the Apache Iceberg community: apache/iceberg#10392.

Proposal

https://docs.google.com/document/d/14jCfihtWXs9ZTXTnmJGg_MDl4T5Th6xT25-6hbYDMpg/edit?usp=sharing

Tasks

  • [x] https://github.com/StarRocks/starrocks/pull/60189
  • [x] https://github.com/StarRocks/starrocks/pull/61099
  • [x] https://github.com/StarRocks/starrocks/pull/64126
  • [ ] https://github.com/StarRocks/starrocks/issues/62061
  • [x] https://github.com/StarRocks/starrocks/pull/63639
  • [x] https://github.com/StarRocks/starrocks/pull/65284
  • [ ] https://github.com/StarRocks/starrocks/pull/66539

xxubai avatar Feb 17 '25 06:02 xxubai

@XBaith Great to see that you are willing to contribute to the community. First of all, supporting a new type is not easy, especially when there is no corresponding type in StarRocks itself. I suggest that we agree on the design before we start coding.

For now, StarRocks doesn't have the Variant type. So the design doc should cover at least the following parts.

  1. Introduce a new Variant type: how to declare it when creating tables, how its values are displayed, and how to generate values from other types.
  2. Design the memory layout in the execution engine.
  3. Design the read path in the Iceberg reader.
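
To make the read-path item concrete, here is a minimal sketch of decoding a scalar Variant value. It assumes the layout described in the Variant binary encoding specification (low 2 bits of the first byte are the basic type, the upper 6 bits are a type-specific header, and integers are little-endian). The function name `decode_scalar` is hypothetical, only a subset of primitive types is handled, and this is not how StarRocks implements it.

```python
import struct

# Assumed from the Variant binary encoding spec: basic type ids.
BASIC_PRIMITIVE = 0
BASIC_SHORT_STRING = 1

# Subset of primitive type ids mapped to little-endian struct formats:
# int8, int16, int32, int64, double.
_FIXED_WIDTH = {3: "<b", 4: "<h", 5: "<i", 6: "<q", 7: "<d"}


def decode_scalar(value: bytes):
    """Decode one scalar Variant value (hypothetical helper, partial coverage)."""
    header = value[0]
    basic_type = header & 0b11   # low 2 bits
    type_info = header >> 2      # upper 6 bits

    if basic_type == BASIC_SHORT_STRING:
        # For short strings the upper 6 bits hold the byte length (0..63).
        return value[1:1 + type_info].decode("utf-8")

    if basic_type != BASIC_PRIMITIVE:
        # Objects and arrays also need the metadata dictionary; out of scope here.
        raise NotImplementedError("objects and arrays are not handled in this sketch")

    if type_info == 0:
        return None
    if type_info == 1:
        return True
    if type_info == 2:
        return False
    if type_info in _FIXED_WIDTH:
        return struct.unpack_from(_FIXED_WIDTH[type_info], value, 1)[0]

    raise NotImplementedError(f"primitive type id {type_info} is not handled in this sketch")
```

A real reader would also have to handle objects, arrays, decimals, and timestamps, and would decode into the engine's columnar memory layout rather than into Python objects.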

alvin-celerdata avatar Feb 17 '25 11:02 alvin-celerdata

We could maybe have an option (perhaps as a temporary stopgap) to map Variant to the JSON type (maybe via a cast?). That would be simpler and would cover many common use cases, although Variant has more types than JSON. I do agree that having a native Variant type in SR is the right goal.
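
As a rough illustration of what such a stopgap mapping would have to decide, the sketch below converts already-decoded Variant values into JSON-compatible ones. The function name `to_json_compatible` and the chosen string encodings (ISO 8601 for dates and timestamps, base64 for binary, strings for decimals) are assumptions for illustration only, not StarRocks behavior.

```python
import base64
import json
from datetime import date, datetime
from decimal import Decimal


def to_json_compatible(value):
    """Map a decoded Variant value to something json.dumps accepts.

    Hypothetical helper: types that JSON lacks are rendered as strings,
    so the original type information is lost in the conversion.
    """
    if isinstance(value, dict):
        return {key: to_json_compatible(item) for key, item in value.items()}
    if isinstance(value, list):
        return [to_json_compatible(item) for item in value]
    if isinstance(value, Decimal):
        return str(value)            # keeps scale, e.g. "1.50"
    if isinstance(value, (date, datetime)):
        return value.isoformat()     # ISO 8601 string
    if isinstance(value, (bytes, bytearray)):
        return base64.b64encode(bytes(value)).decode("ascii")
    return value                     # None, bool, int, float, str pass through


row = {"price": Decimal("1.50"), "day": date(2025, 2, 24), "raw": b"\x00\x01"}
print(json.dumps(to_json_compatible(row), sort_keys=True))
```

The lossy branches are the ones to note: once a decimal, date, or binary value becomes a JSON string, a later cast cannot tell it apart from an ordinary string, which is one reason a native Variant type remains the longer-term goal.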

Samrose-Ahmed avatar Feb 24 '25 21:02 Samrose-Ahmed

> We could maybe have an option (perhaps as a temporary stopgap) to map Variant to the JSON type (maybe via a cast?). That would be simpler and would cover many common use cases, although Variant has more types than JSON. I do agree that having a native Variant type in SR is the right goal.

Yeah, our current focus is on supporting reading and writing Iceberg's Variant type in StarRocks. Implementing a native Variant type will be part of the next phase.

xxubai avatar Mar 04 '25 07:03 xxubai

Will we be able to cast a variant type to a struct in the same way we can with JSON types now?

For example:

CAST(variant as STRUCT<`x` int, `y` int>)

kyle-goodale-klaviyo avatar Dec 02 '25 16:12 kyle-goodale-klaviyo

> Will we be able to cast a variant type to a struct in the same way we can with JSON types now?
>
> For example:
>
> CAST(variant as STRUCT<`x` int, `y` int>)

Sorry for the late reply, @kyle-goodale-klaviyo. Variant columns can be cast to struct columns. You can check it in https://github.com/StarRocks/starrocks/pull/66539

xxubai avatar Dec 10 '25 14:12 xxubai