Spec: add variant type
Help: #10392
Spec: add variant type
Proposal: https://docs.google.com/document/d/1QjhpG_SVNPZh3anFcpicMQx90ebwjL7rmzFYfUP89Iw/edit
This is to layout the spec for variant type. The specs are placed in Parquet project (see variant spec and shredding spec.
cc @rdblue, @RussellSpitzer and @flyrain
I do want to make sure we don't do a hostile fork here of the spec from Spark so let's make sure we get support from them to move the spec here before we merge. At the same time we should start going through wordings and continue to discuss the specs. I still think that would be easier to do in a public Google Doc though than in Github IMHO.
I do want to make sure we don't do a hostile fork here of the spec from Spark so let's make sure we get support from them to move the spec here before we merge. At the same time we should start going through wordings and continue to discuss the specs. I still think that would be easier to do in a public Google Doc though than in Github IMHO.
Definitely. It's not for merge yet. I'm mostly trying to get the comments in place. Make sense to move that to google doc and link here.
This needs some notes in Partition Transforms , I think explicitly we should disallow identity
For Appendix B - We should define something or state explicitly we don't define it for variant.
Appendix C - We'll need some details on the JSON serialization since that's going to have to define some string representations I think
Under Sort Orders we should probably note you cannot sort on a Variant?
Appendix D: Single Value Serialzation needs an entry, we can probably right "Not SUpported" for now, Json needs an entry
And an entry https://github.com/apache/iceberg/blob/main/format/spec.md#parquet
This needs some notes in
Partition Transforms, I think explicitly we should disallow identityFor Appendix B - We should define something or state explicitly we don't define it for variant.
Appendix C - We'll need some details on the JSON serialization since that's going to have to define some string representations I think
Under Sort Orders we should probably note you cannot sort on a Variant?
Appendix D: Single Value Serialzation needs an entry, we can probably right "Not SUpported" for now, Json needs an entry
Thanks @RussellSpitzer I missed those sections and just updated.
I mark Partition Transforms, sorting and hashing not supported/allowed for now. For Appendix C, I think it should be just variant, similar to primitive type, since it's Iceberg schema as I understand the section.
@aihuaxu, I think there are a couple of things missing:
- The Avro appendix should be updated to state that a Variant is stored as a Record with two fields, a required binary
metadataand a required binaryvalue. Shredding is not supported in Avro. - The ORC appendix should be updated to state that a Variant is stored as a struct with two fields, a required binary
metadataand a required binaryvalue. Type attribute should beiceberg.struct-type=variant. Shredding is not supported in ORC.
Oops. I didn't mean to close this.
@aihuaxu, I think there are a couple of things missing:
- The Avro appendix should be updated to state that a Variant is stored as a Record with two fields, a required binary
metadataand a required binaryvalue. Shredding is not supported in Avro.- The ORC appendix should be updated to state that a Variant is stored as a struct with two fields, a required binary
metadataand a required binaryvalue. Type attribute should beiceberg.struct-type=variant. Shredding is not supported in ORC.
Thanks @rdblue I thought we will make changes when we start to work on Avro/ORC. I added that.
I don't have much context for Json conversion. Not sure if we need to add more info.
I believe I have addressed the comments and can we move forward to merge the PR? Let me know if I miss anything. cc @RussellSpitzer and @rdblue
Merged, thanks everyone for your thoughtful feedback. Thanks @aihuaxu for the pr. Thanks to @rdblue , @emkornfield , @findepi , @XBaith , and @flyrain for your contributions reviewing.
Merged, thanks everyone for your thoughtful feedback. Thanks @aihuaxu for the pr. Thanks to @rdblue , @emkornfield , @findepi , @XBaith , and @flyrain for your contributions reviewing.
Thanks a lot for the discussion, and thanks for merging @RussellSpitzer .