Parquet column array<struct<>> with null value is read in as empty list
Apache Iceberg version
main (development)
Please describe the bug 🐞
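An Iceberg table with a column of type array<struct<>> that holds a null value is read back with an empty list in place of the null.
Reproducible script: https://github.com/puchengy/iceberg-python/commit/3fd6d3d3e4b237bda98e40c36bb07e7e4035c2f2
The script fails with: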
> assert pyberg_val == direct_val
E assert [] == None
Great catch @puchengy, let me see what's needed to fix this
I've found the issue. We don't respect the null count when fetching the array through the accessor:
We just return the array and then create a new array with offset 1, which injects a [] where the null should be.
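To illustrate why ignoring the null information turns a null into [], here is a minimal pure-Python sketch of an Arrow-style list column (offsets plus a validity flag per row). It is a simplified model, not the actual PyIceberg or PyArrow code; the function names `decode_ignoring_validity` and `decode_respecting_validity` are hypothetical.

```python
from typing import Any, List, Optional, Sequence


def decode_ignoring_validity(
    offsets: Sequence[int], values: Sequence[Any]
) -> List[List[Any]]:
    """Rebuild list rows from offsets alone.

    A null row has start == end in the offsets, so it comes back as [].
    """
    return [
        list(values[offsets[i]:offsets[i + 1]])
        for i in range(len(offsets) - 1)
    ]


def decode_respecting_validity(
    offsets: Sequence[int], validity: Sequence[bool], values: Sequence[Any]
) -> List[Optional[List[Any]]]:
    """Rebuild list rows, returning None wherever the validity flag is False."""
    rows: List[Optional[List[Any]]] = []
    for i in range(len(offsets) - 1):
        if not validity[i]:
            rows.append(None)  # null row: keep it distinct from an empty list
        else:
            rows.append(list(values[offsets[i]:offsets[i + 1]]))
    return rows


# Three rows of array<struct<a: int>>: [{"a": 1}], null, []
values = [{"a": 1}]
offsets = [0, 1, 1, 1]
validity = [True, False, True]

print(decode_ignoring_validity(offsets, values))
print(decode_respecting_validity(offsets, validity, values))
```

In this model the null row and the empty row have identical offsets, so only the validity flags can tell them apart, which matches the `[] == None` assertion failure in the report.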
One edge case is still unfixed. We need to wait for an upstream fix: https://github.com/apache/arrow/issues/38809
ref: https://github.com/apache/iceberg-python/pull/252#discussion_r1467065763
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
Reply to re-activate the issue : )
I believe this issue has already been addressed, and we have a test for it: https://github.com/apache/iceberg-python/blob/278f7643cd62f9e14496177632cb48d9b52e553d/dev/provision.py#L328
I also tested the example above: https://github.com/puchengy/iceberg-python/commit/3fd6d3d3e4b237bda98e40c36bb07e7e4035c2f2