iceberg-python

Writing an arrow table with date64 unsupported

Open vtk9 opened this issue 1 year ago • 5 comments

Apache Iceberg version

0.6.0 (latest release)

Please describe the bug 🐞

TypeError: Unsupported type: date64[ms]
from decimal import Decimal
from pyiceberg.catalog.sql import SqlCatalog
import pyarrow as pa

pylist = [{'decimal_col': 1234}]
arrow_schema = pa.schema(
    [
        pa.field('decimal_col', pa.date64()),
    ],
)
arrow_table = pa.Table.from_pylist(pylist, schema=arrow_schema)

catalog = SqlCatalog(
    'test_catalog',
    **{
        'type': 'sql',
        'uri': 'sqlite:///pyiceberg.db',
    },
)

namespace = 'test_ns'
table_name = 'test_table'

catalog.create_namespace(namespace=namespace)
new_table = catalog.create_table(
    identifier=f'{namespace}.{table_name}',
    schema=arrow_schema,
    location='.',
)

new_table.append(arrow_table)

vtk9, Jun 18 '24 19:06

date32 is supported here https://github.com/apache/iceberg-python/blob/a29491af52dc4aff46a325bbaac4a11c2f2bfabc/pyiceberg/io/pyarrow.py#L915-L916

likely need to add a new if-statement

kevinjqliu, Jun 19 '24 15:06

@kevinjqliu Thanks! There might be other types that are not supported. For example, uint16 is not supported, while all of the other integer types are.

I also created https://github.com/apache/iceberg-python/issues/837 for another bug I found today when using pyiceberg to write.

vtk9, Jun 19 '24 17:06

@kevinjqliu As part of this fix, would it be possible to also print out in the exception which column is causing the problem, i.e. 'decimal_col'?

Should I create a new issue to track this feature request?

Alternatively, could it raise a more specific exception such as UnsupportedPyArrowType and include the pyarrow.Field (column_name, column_type) in the exception?

vtk9, Jun 26 '24 00:06

As part of this fix, would it be possible to also print out in the exception which column is causing the problem, i.e. 'decimal_col'? Should I create a new issue to track this feature request?

Yeah, that's a great idea. I'm in favor of opening a new issue to track the quality-of-life improvement for the error message.

kevinjqliu, Jun 26 '24 05:06

The problem is that Parquet will encode a date as an int32. Adding the if-statement would probably push the issue down into the Parquet writer. I'm surprised to see this, since a date stored as an int32 has quite a bit of range:

[image]
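
The range point can be checked with a little arithmetic. This sketch assumes the usual encoding in which a 32-bit date counts whole days since 1970-01-01 and a date64 value counts milliseconds since the same epoch; the variable names are illustrative.

```python
import datetime

INT32_MAX = 2**31 - 1          # largest day offset an int32 can hold
MS_PER_DAY = 24 * 60 * 60 * 1000
EPOCH = datetime.date(1970, 1, 1)

# Approximate span, in years, reachable on either side of the epoch
# (on the order of millions of years).
years_each_way = INT32_MAX / 365.2425

# Day offset of the largest date Python's datetime can represent.
days_to_max_date = (datetime.date.max - EPOCH).days

# A date64 value holding a whole day converts to a day offset by division.
ms_value = (datetime.date(2024, 6, 18) - EPOCH).days * MS_PER_DAY
days_value = ms_value // MS_PER_DAY

print(f"int32 days cover about {years_each_way:,.0f} years each way")
print(f"datetime.date.max is day {days_to_max_date:,} of {INT32_MAX:,}")
```

Under these assumptions, any whole-day date64 value that corresponds to a realistic calendar date fits comfortably in an int32 day count.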

As part of this fix, would it be possible to also print out in the exception which column is causing the problem, i.e. 'decimal_col'?

That's a great idea! 🙌

Fokko, Jun 26 '24 07:06

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions[bot], Dec 24 '24 00:12

This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

github-actions[bot], Jan 07 '25 00:01