iceberg icon indicating copy to clipboard operation
iceberg copied to clipboard

OpenAPI: Add ContentFile types to spec for the PreplanTable and PlanTable API

Open geruh opened this issue 1 year ago • 2 comments

Let's split the REST API changes from the implementation, since there is some overlap with the rest scan API. We are now serializing DataFiles using the ContentFileParser and passing them to the service. ContentFileParser requires passing additional information like the Schema and PartitionSpec of the table to successfully round trip the updates and apply them to a table. With this approach, the updates will follow this structure:

{
  "spec-id": 0,
  "content": "DATA",
  "file-path": "/path/to/data-with-stats.parquet",
  "file-format": "PARQUET",
  "partition": {
    "1000": 1
  },
  "file-size-in-bytes": 350,
  "record-count": 10,
  "column-sizes": {
    "keys": [3,4],
    "values": [100,200]
  },
  "value-counts": {
    "keys": [3,4],
    "values": [90,180]
  },
  "null-value-counts": {
    "keys": [3,4],
    "values": [10,20]
  },
  "nan-value-counts": {
    "keys": [3,4],
    "values": [0,0]
  },
  "lower-bounds": {
    "keys": [3,4],
    "values": ["01000000","02000000"]
  },
  "upper-bounds": {
    "keys": [3,4],
    "values": ["05000000","0A000000"]
  },
  "key-metadata": "00000000000000000000000000000000",
  "split-offsets": [128,256],
  "sort-order-id": 1
}

Furthermore, this change includes a dependency on the models for the SIngleValueParser which is used to serialize the values of the partition spec. So the models follow how the toJson method write the data.

with the proposed spec we now get:

  • partition field as a list of primitives

These model changes are stemming from the conversation here: https://github.com/apache/iceberg/pull/9292 cc @jackye1995 @rdblue @danielcweeks @rahil-c

Testing

OpenAPI: ran make install, make lint, make generate

geruh avatar Feb 12 '24 21:02 geruh

Will take a look at this PR tomorrow morning @geruh @jackye1995 !

amogh-jahagirdar avatar Feb 13 '24 05:02 amogh-jahagirdar

Mostly looks good to me. I flagged a couple of minor things.

Also, I don't think that we resolved this thread: https://github.com/apache/iceberg/pull/9717#discussion_r1501928527

rdblue avatar Feb 27 '24 21:02 rdblue

This looks good to me now. Since we have 3 approvals, I'll go ahead and merge it.

rdblue avatar Feb 28 '24 16:02 rdblue