OpenAPI: Add ContentFile types to spec for the PreplanTable and PlanTable API
Let's split the REST API changes from the implementation, since there is some overlap with the rest scan API. We are now serializing DataFiles using the ContentFileParser and passing them to the service. ContentFileParser requires passing additional information like the Schema and PartitionSpec of the table to successfully round trip the updates and apply them to a table. With this approach, the updates will follow this structure:
{
"spec-id": 0,
"content": "DATA",
"file-path": "/path/to/data-with-stats.parquet",
"file-format": "PARQUET",
"partition": {
"1000": 1
},
"file-size-in-bytes": 350,
"record-count": 10,
"column-sizes": {
"keys": [3,4],
"values": [100,200]
},
"value-counts": {
"keys": [3,4],
"values": [90,180]
},
"null-value-counts": {
"keys": [3,4],
"values": [10,20]
},
"nan-value-counts": {
"keys": [3,4],
"values": [0,0]
},
"lower-bounds": {
"keys": [3,4],
"values": ["01000000","02000000"]
},
"upper-bounds": {
"keys": [3,4],
"values": ["05000000","0A000000"]
},
"key-metadata": "00000000000000000000000000000000",
"split-offsets": [128,256],
"sort-order-id": 1
}
Furthermore, this change includes a dependency on the models for the SIngleValueParser which is used to serialize the values of the partition spec. So the models follow how the toJson method write the data.
with the proposed spec we now get:
- partition field as a list of primitives
These model changes are stemming from the conversation here: https://github.com/apache/iceberg/pull/9292 cc @jackye1995 @rdblue @danielcweeks @rahil-c
Testing
OpenAPI: ran make install, make lint, make generate
Will take a look at this PR tomorrow morning @geruh @jackye1995 !
Mostly looks good to me. I flagged a couple of minor things.
Also, I don't think that we resolved this thread: https://github.com/apache/iceberg/pull/9717#discussion_r1501928527
This looks good to me now. Since we have 3 approvals, I'll go ahead and merge it.