iceberg-python Writing to a table with an optional map field fails if data is missing map field

Question

I am not sure whether this is a bug or expected behavior, but when writing to a table with an optional map field, the write fails if the input data omits that field entirely. Although the map itself is optional, its key is required, and the schema check reports the key as missing. This does not happen with other types.

Here's a script to reproduce:

from pyiceberg.catalog import load_catalog
import pyarrow as pa

warehouse_path = "/tmp/warehouse"
catalog = load_catalog(
    "default",
    **{
        "type": "sql",
        "uri": f"sqlite:///{warehouse_path}/pyiceberg_catalog.db",
        "warehouse": f"file://{warehouse_path}",
    },
)

catalog.create_namespace_if_not_exists("test")

schema = pa.schema({
    "id": pa.int64(),
    "text": pa.string(),
    "map": pa.map_(pa.string(), pa.string())
})

table = catalog.create_table_if_not_exists("test.table", schema)

table.append(pa.Table.from_pylist([{"id": 1}]))

This will throw:

│ ✅ │ 1: id: optional long                 │ 1: id: optional long │
│ ✅ │ 2: text: optional string             │ Missing              │
│ ✅ │ 3: map: optional map<string, string> │ Missing              │
│ ❌ │ 4: key: required string              │ Missing              │
│ ✅ │ 5: value: optional string            │ Missing              │

The workaround I found is to cast the input data to the table schema before writing, but that is not always practical.

Nov 03 '25 20:11 dbnl-renaud

Good catch, and thanks for the repro. The Map key is always required: https://github.com/apache/iceberg-python/blob/5773b7f1bf2081a90a490f9d670eef804eb88ab4/pyiceberg/types.py#L582

https://github.com/apache/iceberg-python/blob/5773b7f1bf2081a90a490f9d670eef804eb88ab4/pyiceberg/schema.py#L1803

_is_field_compatible will always enforce the Map key field as required, so the key is reported as missing even when the optional map itself is absent from the input.
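
To illustrate the behavior described above, here is a minimal, library-free sketch. It is not pyiceberg's implementation: `Field`, `naive_missing`, and `parent_aware_missing` are hypothetical names invented for this example. The first function checks every field on its own, so a required child of an absent optional parent is flagged. The second shows one possible alternative that stops descending when the parent is absent; it is only a sketch of the idea, not a claim about how the library does or should behave.

```python
from dataclasses import dataclass, field


@dataclass
class Field:
    """Hypothetical stand-in for a schema field (not a pyiceberg class)."""
    name: str
    required: bool
    children: list["Field"] = field(default_factory=list)


def naive_missing(fields, provided, prefix=""):
    """Flag every required field absent from `provided`, ignoring parents."""
    errors = []
    for f in fields:
        path = prefix + f.name
        if path not in provided and f.required:
            errors.append(path)
        # Always descend, even when the parent itself is absent.
        errors += naive_missing(f.children, provided, path + ".")
    return errors


def parent_aware_missing(fields, provided, prefix=""):
    """Same check, but skip the children of an absent field."""
    errors = []
    for f in fields:
        path = prefix + f.name
        if path not in provided:
            if f.required:
                errors.append(path)
            continue  # absent parent: its children cannot be present
        errors += parent_aware_missing(f.children, provided, path + ".")
    return errors


# Mirrors the table in the report: all top-level fields optional,
# but the map key is required.
table_fields = [
    Field("id", required=False),
    Field("text", required=False),
    Field("map", required=False, children=[
        Field("key", required=True),
        Field("value", required=False),
    ]),
]
provided = {"id"}  # the input data only carries "id"

print(naive_missing(table_fields, provided))         # ['map.key']
print(parent_aware_missing(table_fields, provided))  # []
```

The naive check reproduces the pattern in the error output: `text` and `map` pass because they are optional, while `key` fails because it is required and checked independently of its parent.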

Nov 03 '25 20:11 kevinjqliu

Hm, I need to dig into this a little deeper. Like you mentioned, building the pa.Table with the provided schema works:

from pyiceberg.catalog import load_catalog
import pyarrow as pa

warehouse_path = "/tmp/warehouse"
catalog = load_catalog(
    "default",
    **{
        "type": "sql",
        "uri": f"sqlite:///{warehouse_path}/pyiceberg_catalog.db",
        "warehouse": f"file://{warehouse_path}",
    },
)

catalog.create_namespace_if_not_exists("test")

schema = pa.schema({
    "id": pa.int64(),
    "text": pa.string(),
    "map": pa.map_(pa.string(), pa.string())
})

table = catalog.create_table_if_not_exists("test.table", schema)
print("table schema:", table.schema())
print()

data = pa.Table.from_pylist([{"id": 1}], schema=schema)
print("data schema:", data.schema)
print()
table.append(data)

Nov 03 '25 21:11 kevinjqliu