iceberg-python
Pyarrow data type, default to small type and fix large type override
Rationale for this change
#1669 changed reads to infer the type instead of defaulting pyarrow data types to the large type. Defaulting to the large type was originally introduced in #986.
I found a bug in #1669 where type promotion from string->binary defaults to large_binary (https://github.com/apache/iceberg-python/pull/1669#discussion_r2017223767). That led me to find that we still use the large type in _ConvertToArrowSchema. Furthermore, I found that we did not respect PYARROW_USE_LARGE_TYPES_ON_READ=True when reading.
This PR is a continuation of #1669.
- Change docs for pyarrow.use-large-types-on-read to default value False
- Change _ConvertToArrowSchema to use small data type instead of large
- When PYARROW_USE_LARGE_TYPES_ON_READ is enabled (set to True), ArrowScan and ArrowProjectionVisitor should cast to large type
- Add back test for setting PYARROW_USE_LARGE_TYPES_ON_READ to True
This PR should help us infer the data type when reading while keeping the PYARROW_USE_LARGE_TYPES_ON_READ override behavior until deprecation.
Are these changes tested?
Yes
Are there any user-facing changes?
No