
GH-48442: [Python] Remove workaround that excluded struct types from `chunked_arrays`

Open HyukjinKwon opened this issue 2 weeks ago • 10 comments

Rationale for this change

The chunked_arrays hypothesis strategy had a workaround that excluded struct types, based on the assumption that field metadata is not preserved (added in https://github.com/apache/arrow/commit/dd0988b49cb6726cf915bb9f53d7320e3a97b00b).

Testing confirms that field metadata is now correctly preserved in chunked arrays with struct types, so the workaround is no longer necessary. The underlying problem was fixed by https://github.com/apache/arrow/commit/d06c664a1966da682a2382e46fe148be96cca1aa

That commit explicitly calls CChunkedArray::Make() instead of constructing CChunkedArray manually.

What changes are included in this PR?

Remove the assumption that field metadata is not preserved.

Are these changes tested?

Manually tested that field metadata is preserved, using the following script (generated by ChatGPT):

import sys
import pyarrow as pa

# Create a struct type with custom field metadata
struct_type = pa.struct([
    pa.field('a', pa.int32(), metadata={'custom_key': 'custom_value_a', 'description': 'field a'}),
    pa.field('b', pa.string(), metadata={'custom_key': 'custom_value_b', 'description': 'field b'})
])

print("=== Original struct type ===")
print(f"Type: {struct_type}")
print(f"Field 'a' metadata: {struct_type[0].metadata}")
print(f"Field 'b' metadata: {struct_type[1].metadata}")
print()

# Create arrays with this struct type
arr1 = pa.array([
    {'a': 1, 'b': 'foo'},
    {'a': 2, 'b': 'bar'}
], type=struct_type)

arr2 = pa.array([
    {'a': 3, 'b': 'baz'},
    {'a': 4, 'b': 'qux'}
], type=struct_type)

print("=== Individual arrays ===")
print(f"arr1.type: {arr1.type}")
print(f"arr1.type[0].metadata: {arr1.type[0].metadata}")
print(f"arr2.type: {arr2.type}")
print(f"arr2.type[0].metadata: {arr2.type[0].metadata}")
print()

# Create chunked array WITH explicit type parameter (preserves metadata)
chunked_with_type = pa.chunked_array([arr1, arr2], type=struct_type)

print("=== Chunked array (with explicit type) ===")
print(f"Type: {chunked_with_type.type}")
print(f"Field 'a' metadata: {chunked_with_type.type[0].metadata}")
print(f"Field 'b' metadata: {chunked_with_type.type[1].metadata}")
print()

# Verify metadata is preserved
if (chunked_with_type.type[0].metadata == struct_type[0].metadata and
    chunked_with_type.type[1].metadata == struct_type[1].metadata):
    print("✓ SUCCESS: Field metadata IS preserved!")
    print(f"  Field 'a': {dict(chunked_with_type.type[0].metadata)}")
    print(f"  Field 'b': {dict(chunked_with_type.type[1].metadata)}")
    exit_code = 0
else:
    print("✗ FAILED: Field metadata was lost")
    exit_code = 1

print()
print("=== Test without explicit type (for comparison) ===")
# What happens without explicit type? (inferred from first chunk)
chunked_without_type = pa.chunked_array([arr1, arr2])
print(f"Type: {chunked_without_type.type}")
print(f"Field 'a' metadata: {chunked_without_type.type[0].metadata}")
print(f"Field 'b' metadata: {chunked_without_type.type[1].metadata}")

if chunked_without_type.type[0].metadata == struct_type[0].metadata:
    print("  → Metadata preserved even without explicit type (from first chunk)")
else:
    print("  → Note: Without explicit type, metadata was NOT preserved")

# Report the result of the explicit-type check through the exit status
sys.exit(exit_code)

Are there any user-facing changes?

No, test-only.

  • GitHub Issue: #48442

HyukjinKwon avatar Dec 11 '25 02:12 HyukjinKwon

:warning: GitHub issue #48442 has been automatically assigned in GitHub to PR creator.

github-actions[bot] avatar Dec 11 '25 02:12 github-actions[bot]

@github-actions crossbow submit test-conda-python-3.11-hypothesis

raulcd avatar Dec 11 '25 09:12 raulcd

Revision: 1c29350e1dc43dbcfaa35efa83c1fc1f4448733b

Submitted crossbow builds: ursacomputing/crossbow @ actions-dd158bff76

Task Status
test-conda-python-3.11-hypothesis GitHub Actions

github-actions[bot] avatar Dec 11 '25 09:12 github-actions[bot]

It's rather a band-aid fix, but https://github.com/apache/arrow/pull/48460 should fix it! 👍

HyukjinKwon avatar Dec 12 '25 02:12 HyukjinKwon

@github-actions crossbow submit test-conda-python-3.11-hypothesis

HyukjinKwon avatar Dec 12 '25 21:12 HyukjinKwon

Revision: 1c29350e1dc43dbcfaa35efa83c1fc1f4448733b

Submitted crossbow builds: ursacomputing/crossbow @ actions-bc8ad81dd8

Task Status
test-conda-python-3.11-hypothesis GitHub Actions

github-actions[bot] avatar Dec 12 '25 21:12 github-actions[bot]

@github-actions crossbow submit test-conda-python-3.11-hypothesis

HyukjinKwon avatar Dec 12 '25 22:12 HyukjinKwon

Revision: ec6acb2979e6cd831af147d119848db43953e2c7

Submitted crossbow builds: ursacomputing/crossbow @ actions-768e6f52a4

Task Status
test-conda-python-3.11-hypothesis GitHub Actions

github-actions[bot] avatar Dec 12 '25 22:12 github-actions[bot]

Pushed again to retrigger the test. The hypothesis build itself passes (https://github.com/apache/arrow/pull/48443#issuecomment-3648388322).

HyukjinKwon avatar Dec 12 '25 22:12 HyukjinKwon

Seems like:

tests/test_extension_type.py .................                           [ 40%]
Fatal Python error: Segmentation fault

Current thread 0x0000000203059040 (most recent call first):
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pyarrow/tests/test_fs.py", line 1224 in test_s3_options
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/python.py", line 166 in pytest_pyfunc_call
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/python.py", line 1720 in runtest
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/runner.py", line 179 in pytest_runtest_call
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/runner.py", line 245 in <lambda>
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/runner.py", line 353 in from_call
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/runner.py", line 244 in call_and_report
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/runner.py", line 137 in runtestprotocol
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/runner.py", line 118 in pytest_runtest_protocol
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/main.py", line 396 in pytest_runtestloop
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/main.py", line 372 in _main
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/main.py", line 318 in wrap_session
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/main.py", line 365 in pytest_cmdline_main
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/config/__init__.py", line 199 in main
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/config/__init__.py", line 223 in console_main
  File "/Users/runner/hostedtoolcache/Python/3.11.9/arm64/bin/pytest", line 7 in <module>
tests/test_fs.py ....sssx.xsss....sssx.xssss

The macOS failure is happening globally, not just on this PR. I retriggered, but the issue persists. Let me leave it as is for now; it should not be related to my change in any case.

HyukjinKwon avatar Dec 12 '25 23:12 HyukjinKwon

Thanks @adamreeve !

HyukjinKwon avatar Dec 18 '25 06:12 HyukjinKwon