iceberg-python
Writing an Arrow table with a date64 column is unsupported
Apache Iceberg version
0.6.0 (latest release)
Please describe the bug 🐞
TypeError: Unsupported type: date64[ms]
import pyarrow as pa
from pyiceberg.catalog.sql import SqlCatalog

# A single date64 column (date64 stores milliseconds since the Unix epoch).
pylist = [{'decimal_col': 1234}]
arrow_schema = pa.schema(
    [
        pa.field('decimal_col', pa.date64()),
    ],
)
arrow_table = pa.Table.from_pylist(pylist, schema=arrow_schema)

catalog = SqlCatalog(
    'test_catalog',
    **{
        'type': 'sql',
        'uri': 'sqlite:///pyiceberg.db',
    },
)
namespace = 'test_ns'
table_name = 'test_table'
catalog.create_namespace(namespace=namespace)
new_table = catalog.create_table(
    identifier=f'{namespace}.{table_name}',
    schema=arrow_schema,
    location='.',
)
new_table.append(arrow_table)
date32 is supported here:
https://github.com/apache/iceberg-python/blob/a29491af52dc4aff46a325bbaac4a11c2f2bfabc/pyiceberg/io/pyarrow.py#L915-L916
We likely need to add a new if-statement for date64.
@kevinjqliu Thanks! There might be other types that are not supported, too. uint16 is also not supported, while all of the other integer types are.
I also created https://github.com/apache/iceberg-python/issues/837 for another bug I found today when using pyiceberg to write.
@kevinjqliu As part of this fix, would it be possible to also print out in the exception which column is causing the problem (i.e. 'decimal_col')?
Should I create a new issue to track this feature request?
Alternatively, raise a more specific exception, such as UnsupportedPyArrowType, that includes the pyarrow.Field (column name and column type)?
Yeah, that's a great idea. I'm in favor of opening a new issue to track the quality-of-life improvement for the error message.
The problem is that Parquet encodes a date as an int32. Adding the if-statement would probably just push the issue down into the Parquet writer. I'm surprised to see this, since an int32 date already has quite a bit of range.
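To put that range in numbers: Parquet's DATE logical type stores a signed 32-bit count of days since 1970-01-01, the same representation Arrow uses for date32. The arithmetic below is a rough sketch that assumes the mean Gregorian year of 365.2425 days.

```python
import datetime

# Parquet DATE (and Arrow date32): signed int32 count of days since 1970-01-01.
INT32_MAX = 2**31 - 1
DAYS_PER_YEAR = 365.2425  # mean Gregorian year (an approximation)

years_each_way = INT32_MAX / DAYS_PER_YEAR
print(f"about +/-{years_each_way:,.0f} years around 1970")

# For comparison, Python's datetime.date only spans years 1 through 9999.
span_days = (datetime.date.max - datetime.date.min).days
print(span_days, span_days < INT32_MAX)
```

So range is not what date64 adds; its extra width goes to millisecond resolution, which a day-based DATE column cannot store anyway.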
As part of this fix, would it be possible to also print out in the exception which column is causing the problem (i.e. 'decimal_col')?
That's a great idea! 🙌
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'.