deephaven-core

iceberg, unable to read empty table

Open devinrsmith opened this issue 1 year ago • 1 comment

Reading a table that has been created via the catalog but does not have any snapshots yet throws a NullPointerException instead of returning an empty table with the appropriate schema:

java.lang.NullPointerException: Cannot invoke "org.apache.iceberg.Snapshot.schemaId()" because "snapshot" is null
	at io.deephaven.iceberg.util.IcebergCatalogAdapter.readTableInternal(IcebergCatalogAdapter.java:524)
	at io.deephaven.iceberg.util.IcebergCatalogAdapter.readTable(IcebergCatalogAdapter.java:405)
	at io.deephaven.iceberg.util.IcebergCatalogAdapter.readTable(IcebergCatalogAdapter.java:419)
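
For illustration, the expected behavior can be sketched as a null guard: when there is no current snapshot, fall back to the table-level schema rather than calling `schemaId()` on a null snapshot. This is only a minimal sketch under stated assumptions. `Snapshot`, `TableMeta`, and `resolveColumns` below are hypothetical stand-ins written for this example; they are not the Iceberg or Deephaven classes, and this is not the actual code in `IcebergCatalogAdapter`.

```java
import java.util.List;
import java.util.Map;

public class EmptyTableSketch {
    // Hypothetical stand-ins for illustration; NOT the Iceberg or Deephaven types.
    record Snapshot(long snapshotId, int schemaId) {}

    record TableMeta(Snapshot currentSnapshot, int currentSchemaId,
                     Map<Integer, List<String>> schemasById) {}

    /** Picks the column names to expose, tolerating a table with no snapshot. */
    static List<String> resolveColumns(TableMeta table) {
        Snapshot snapshot = table.currentSnapshot();
        // A freshly created table has no snapshot, so fall back to the
        // table-level schema instead of dereferencing null.
        int schemaId = (snapshot == null) ? table.currentSchemaId() : snapshot.schemaId();
        return table.schemasById().get(schemaId);
    }

    public static void main(String[] args) {
        // Models a table that was created but never written to.
        TableMeta empty = new TableMeta(null, 0,
                Map.of(0, List.of("level", "event_time", "message")));
        System.out.println(resolveColumns(empty));
    }
}
```

With a guard of this shape, a table without snapshots would resolve to its declared columns and zero rows instead of failing.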

devinrsmith, Jul 30 '24 22:07

The code to create the empty table was based on the Iceberg Java API quickstart, https://iceberg.apache.org/docs/1.6.0/java-api-quickstart/#using-a-hadoop-catalog.

import org.apache.hadoop.conf.Configuration
import org.apache.iceberg.PartitionSpec
import org.apache.iceberg.Schema
import org.apache.iceberg.Table
import org.apache.iceberg.catalog.TableIdentifier
import org.apache.iceberg.hadoop.HadoopCatalog
import org.apache.iceberg.types.Types

// Adapted from https://iceberg.apache.org/docs/1.6.0/java-api-quickstart/#using-a-hadoop-catalog

Configuration conf = new Configuration()
String warehousePath = "file:///tmp/my_warehouse"
HadoopCatalog catalog = new HadoopCatalog(conf, warehousePath)

Schema schema = new Schema(
    Types.NestedField.required(1, "level", Types.StringType.get()),
    Types.NestedField.required(2, "event_time", Types.TimestampType.withZone()),
    Types.NestedField.required(3, "message", Types.StringType.get())
    // DH doesn't support LIST yet.
    // Types.NestedField.optional(4, "call_stack", Types.ListType.ofRequired(5, Types.StringType.get()))
)

PartitionSpec spec = PartitionSpec.builderFor(schema)
    .hour("event_time")
    .identity("level")
    .build()

TableIdentifier name = TableIdentifier.of("logging", "logs")
Table table = catalog.createTable(name, schema, spec)

This produces the following files:

$ find /tmp/my_warehouse -type f
/tmp/my_warehouse/logging/logs/metadata/v1.metadata.json
/tmp/my_warehouse/logging/logs/metadata/.v1.metadata.json.crc
/tmp/my_warehouse/logging/logs/metadata/version-hint.text
/tmp/my_warehouse/logging/logs/metadata/.version-hint.text.crc

devinrsmith, Jul 31 '24 14:07

Moved to https://deephaven.atlassian.net/browse/DH-18259

malhotrashivam, Dec 19 '24 19:12