GH-48442: [Python] Remove workaround that excluded struct types from `chunked_arrays`
Rationale for this change
The `chunked_arrays` hypothesis strategy had a workaround that excluded struct types, on the assumption that field metadata is not preserved (added in https://github.com/apache/arrow/commit/dd0988b49cb6726cf915bb9f53d7320e3a97b00b).
Testing confirms that field metadata is now correctly preserved in chunked arrays with struct types, so the workaround is no longer necessary. The underlying issue was fixed by https://github.com/apache/arrow/commit/d06c664a1966da682a2382e46fe148be96cca1aa, which explicitly calls CChunkedArray::Make() instead of constructing the CChunkedArray manually.
What changes are included in this PR?
Remove the workaround from the `chunked_arrays` strategy, i.e. the assumption that field metadata is not preserved for struct types.
Are these changes tested?
Manually tested that field metadata is preserved, using the following script (generated by ChatGPT):
import sys
import pyarrow as pa
# Create a struct type with custom field metadata
struct_type = pa.struct([
    pa.field('a', pa.int32(), metadata={'custom_key': 'custom_value_a', 'description': 'field a'}),
    pa.field('b', pa.string(), metadata={'custom_key': 'custom_value_b', 'description': 'field b'})
])
print("=== Original struct type ===")
print(f"Type: {struct_type}")
print(f"Field 'a' metadata: {struct_type[0].metadata}")
print(f"Field 'b' metadata: {struct_type[1].metadata}")
print()
# Create arrays with this struct type
arr1 = pa.array([
    {'a': 1, 'b': 'foo'},
    {'a': 2, 'b': 'bar'}
], type=struct_type)
arr2 = pa.array([
    {'a': 3, 'b': 'baz'},
    {'a': 4, 'b': 'qux'}
], type=struct_type)
print("=== Individual arrays ===")
print(f"arr1.type: {arr1.type}")
print(f"arr1.type[0].metadata: {arr1.type[0].metadata}")
print(f"arr2.type: {arr2.type}")
print(f"arr2.type[0].metadata: {arr2.type[0].metadata}")
print()
# Create chunked array WITH explicit type parameter (preserves metadata)
chunked_with_type = pa.chunked_array([arr1, arr2], type=struct_type)
print("=== Chunked array (with explicit type) ===")
print(f"Type: {chunked_with_type.type}")
print(f"Field 'a' metadata: {chunked_with_type.type[0].metadata}")
print(f"Field 'b' metadata: {chunked_with_type.type[1].metadata}")
print()
# Verify metadata is preserved
if (chunked_with_type.type[0].metadata == struct_type[0].metadata and
        chunked_with_type.type[1].metadata == struct_type[1].metadata):
    print("✓ SUCCESS: Field metadata IS preserved!")
    print(f" Field 'a': {dict(chunked_with_type.type[0].metadata)}")
    print(f" Field 'b': {dict(chunked_with_type.type[1].metadata)}")
    exit_code = 0
else:
    print("✗ FAILED: Field metadata was lost")
    exit_code = 1
print()
print("=== Test without explicit type (for comparison) ===")
# What happens without explicit type? (inferred from first chunk)
chunked_without_type = pa.chunked_array([arr1, arr2])
print(f"Type: {chunked_without_type.type}")
print(f"Field 'a' metadata: {chunked_without_type.type[0].metadata}")
print(f"Field 'b' metadata: {chunked_without_type.type[1].metadata}")
if chunked_without_type.type[0].metadata == struct_type[0].metadata:
    print(" → Metadata preserved even without explicit type (from first chunk)")
else:
    print(" → Note: Without explicit type, metadata was NOT preserved")
# Report the result of the explicit-type check through the exit status
sys.exit(exit_code)
Are there any user-facing changes?
No, test-only.
- GitHub Issue: #48442
:warning: GitHub issue #48442 has been automatically assigned in GitHub to PR creator.
@github-actions crossbow submit test-conda-python-3.11-hypothesis
Revision: 1c29350e1dc43dbcfaa35efa83c1fc1f4448733b
Submitted crossbow builds: ursacomputing/crossbow @ actions-dd158bff76
| Task | Status |
|---|---|
| test-conda-python-3.11-hypothesis | |
It's rather a band-aid fix, but https://github.com/apache/arrow/pull/48460 should fix it! 👍
@github-actions crossbow submit test-conda-python-3.11-hypothesis
Revision: 1c29350e1dc43dbcfaa35efa83c1fc1f4448733b
Submitted crossbow builds: ursacomputing/crossbow @ actions-bc8ad81dd8
| Task | Status |
|---|---|
| test-conda-python-3.11-hypothesis | |
@github-actions crossbow submit test-conda-python-3.11-hypothesis
Revision: ec6acb2979e6cd831af147d119848db43953e2c7
Submitted crossbow builds: ursacomputing/crossbow @ actions-768e6f52a4
| Task | Status |
|---|---|
| test-conda-python-3.11-hypothesis | |
Pushed again to retrigger the tests. The hypothesis build itself passes (https://github.com/apache/arrow/pull/48443#issuecomment-3648388322).
Seems like:
tests/test_extension_type.py ................. [ 40%]
Fatal Python error: Segmentation fault
Current thread 0x0000000203059040 (most recent call first):
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pyarrow/tests/test_fs.py", line 1224 in test_s3_options
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/python.py", line 166 in pytest_pyfunc_call
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_callers.py", line 121 in _multicall
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_hooks.py", line 512 in __call__
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/python.py", line 1720 in runtest
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/runner.py", line 179 in pytest_runtest_call
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_callers.py", line 121 in _multicall
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_hooks.py", line 512 in __call__
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/runner.py", line 245 in <lambda>
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/runner.py", line 353 in from_call
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/runner.py", line 244 in call_and_report
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/runner.py", line 137 in runtestprotocol
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/runner.py", line 118 in pytest_runtest_protocol
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_callers.py", line 121 in _multicall
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_hooks.py", line 512 in __call__
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/main.py", line 396 in pytest_runtestloop
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_callers.py", line 121 in _multicall
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_hooks.py", line 512 in __call__
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/main.py", line 372 in _main
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/main.py", line 318 in wrap_session
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/main.py", line 365 in pytest_cmdline_main
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_callers.py", line 121 in _multicall
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_hooks.py", line 512 in __call__
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/config/__init__.py", line 199 in main
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/config/__init__.py", line 223 in console_main
File "/Users/runner/hostedtoolcache/Python/3.11.9/arm64/bin/pytest", line 7 in <module>
tests/test_fs.py ....sssx.xsss....sssx.xssss
The macOS failure is happening across the board, not just on this PR. I retriggered the job but the issue persists. Let me leave it as is for now; it should not be related to my change in any case.
Thanks @adamreeve !