Parquet column array<struct<>> with null value is read in as empty list

Open puchengy opened this issue 2 years ago • 5 comments

Apache Iceberg version

main (development)

Please describe the bug 🐞

In an Iceberg table, a null value in a column of type array<struct<>> is read in as an empty list. It should be read in as None instead.

Reproduction script: https://github.com/puchengy/iceberg-python/commit/3fd6d3d3e4b237bda98e40c36bb07e7e4035c2f2

The test fails with:

>       assert pyberg_val == direct_val
E       assert [] == None

puchengy avatar Jan 04 '24 20:01 puchengy

Great catch @puchengy, let me see what's needed to fix this

Fokko avatar Jan 05 '24 12:01 Fokko

I've found the issue. We don't respect the null count when fetching the array through the accessor (the screenshot attached to the original comment is not reproduced here).

We just return the array and then create a new array with offset 1, which injects a [] in place of the null.
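
To make the mechanism concrete, here is a minimal pure-Python sketch of an Arrow-style list column, stored as flattened child values, row offsets, and a per-row validity flag. It is a simplified model written for illustration only, not the actual PyIceberg or PyArrow code; the function name `decode_list_column` and the `respect_validity` switch are hypothetical. A null row usually occupies a zero-length slot in the offsets, so rebuilding the lists from offsets alone turns it into an empty list.

```python
from typing import Any, List, Optional, Sequence


def decode_list_column(
    values: Sequence[Any],
    offsets: Sequence[int],
    validity: Sequence[bool],
    respect_validity: bool,
) -> List[Optional[List[Any]]]:
    """Decode a simplified Arrow-style list column into Python objects.

    values:   flattened child values of all rows
    offsets:  row i spans values[offsets[i]:offsets[i + 1]]
    validity: one flag per row; False marks the row as null
    """
    rows: List[Optional[List[Any]]] = []
    for i in range(len(offsets) - 1):
        if respect_validity and not validity[i]:
            # The row is null: keep it as None instead of slicing the values.
            rows.append(None)
        else:
            # Slicing by offsets alone cannot tell a null row from an empty one.
            rows.append(list(values[offsets[i]:offsets[i + 1]]))
    return rows


# Three rows: [{"a": 1}], null, [{"a": 2}, {"a": 3}]
values = [{"a": 1}, {"a": 2}, {"a": 3}]
offsets = [0, 1, 1, 3]
validity = [True, False, True]

buggy = decode_list_column(values, offsets, validity, respect_validity=False)
fixed = decode_list_column(values, offsets, validity, respect_validity=True)

print(buggy[1])  # the null row decoded without the validity flags
print(fixed[1])  # the null row decoded with the validity flags
```

With `respect_validity=False` the second row comes back as `[]`, which matches the failing assertion in the report; with `respect_validity=True` it comes back as `None`.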

Fokko avatar Jan 05 '24 22:01 Fokko

One edge case is still unfixed. We need to wait for an upstream fix: https://github.com/apache/arrow/issues/38809

ref: https://github.com/apache/iceberg-python/pull/252#discussion_r1467065763

HonahX avatar Jan 26 '24 07:01 HonahX

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions[bot] avatar Jul 25 '24 00:07 github-actions[bot]

Reply to re-activate the issue : )

HonahX avatar Jul 25 '24 06:07 HonahX

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions[bot] avatar Jan 22 '25 00:01 github-actions[bot]

I believe this issue has already been addressed, and we have a test for it: https://github.com/apache/iceberg-python/blob/278f7643cd62f9e14496177632cb48d9b52e553d/dev/provision.py#L328

I also tested the example above: https://github.com/puchengy/iceberg-python/commit/3fd6d3d3e4b237bda98e40c36bb07e7e4035c2f2

kevinjqliu avatar Mar 26 '25 16:03 kevinjqliu