Add support for pyarrow DurationType
Feature Request / Improvement
Currently a pa.Schema with a pa.DurationType can't be converted to an iceberg schema.
I think it should be treated the same way as a pa.Time64Type and be mapped to a time type in iceberg.
import pyarrow as pa
import pytest
from pyiceberg.catalog import Catalog
from pyiceberg.io.pyarrow import UnsupportedPyArrowTypeException


def test_iceberg_config():
    pa_schema = pa.schema(
        [
            pa.field("timestamp", pa.timestamp("us", "UTC")),
            pa.field("time", pa.time64("us")),
            pa.field("duration", pa.duration("us")),
        ],
    )
    with pytest.raises(
        UnsupportedPyArrowTypeException,
        match=r"Column 'duration' has an unsupported type: duration\[us\]",
    ):
        Catalog._convert_schema_if_needed(pa_schema)
@0x26res Thanks for raising this issue. From what I understand, a duration is different from a time. Could you elaborate how this would map onto time?
I guess in python a datetime.timedelta (aka duration) is like a datetime.time, except a timedelta value can be negative and be greater than a day.
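To make that difference concrete, here is a minimal standard-library sketch. The helper name `fits_in_time_of_day` is made up for illustration and is not part of any library.

```python
from datetime import time, timedelta


def fits_in_time_of_day(delta: timedelta) -> bool:
    """Return True if the duration could be read as a time of day.

    Hypothetical helper: a time of day covers [00:00:00, 24:00:00).
    """
    return timedelta(0) <= delta < timedelta(days=1)


def time_rejects_hour(hour: int) -> bool:
    """Return True if datetime.time refuses the given hour."""
    try:
        time(hour=hour)
    except ValueError:
        return True
    return False


# A timedelta may be negative or longer than a day; a time may not.
negative = timedelta(hours=-1)
long_span = timedelta(hours=30)
```

Only durations for which `fits_in_time_of_day` returns True have an unambiguous time-of-day reading.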
In pyarrow, there isn't this constraint. You can create a time64 that represents more than 24 hours or that is negative. In that respect, duration and time64 in pyarrow are both int64 values which, combined with a unit ("us", "ns", ...), can be interpreted as a logical type.
The Iceberg spec's definition of the time type is a bit loose:
Time of day, microsecond precision, without date, timezone
I guess we can either:
- have the library convert pa.duration to an iceberg time by default
- force the user to convert their pa.duration('us') to pa.time64('us') beforehand, if they're happy to interpret their duration as time
- add support for an explicit duration type in iceberg
This was just formally proposed to the dev mailing list via https://docs.google.com/document/d/12ghQxWxyAhSQeZyy0IWiwJ02gTqFOgfYm8x851HZFLk/edit?tab=t.0#heading=h.rt0cvesdzsj7
I think it is wise to wait for this to be officially implemented before attempting to fit durations into the time type.
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'