pyarrow `RuntimeError: AppendRowGroups requires equal schemas`

Open ianna opened this issue 1 year ago • 3 comments

Version of Awkward Array

2.6.4 (master branch)

pyarrow.__version__ '11.0.0'

pandas.__version__ '2.0.3'

Description and code to reproduce

I don't see this error in CI, so I assume it comes from a conflict in my environment. When I run:

% python -m pytest tests/test_2898_to_parquet_dataset.py

I get a RuntimeError: AppendRowGroups requires equal schemas.

Indeed, my installed version of pyarrow is 11.0.0, while the CI installs 16.1.0. Should we set a minimum required version?

ianna, May 15 '24 10:05

It might be related to https://github.com/apache/arrow/issues/31678?

ianna, May 15 '24 11:05

> Indeed, my installed version of pyarrow is 11.0.0, while the CI installs 16.1.0. Should we set a minimum required version?

There's a CI job that runs the tests with the minimum supported pyarrow version, which is 7.0.0:

https://github.com/scikit-hep/awkward/blob/3317b0c8f2656e47222d03a57bc2b14b1c56227d/.github/workflows/test.yml#L56-L59

https://github.com/scikit-hep/awkward/blob/3317b0c8f2656e47222d03a57bc2b14b1c56227d/requirements-test-minimal.txt#L3

So we do test the minimum version. If an error occurs with pyarrow 11 but not with 7 or 16, that would be interesting. In that case, yes, we'd probably want to raise the minimum to 16, or to wherever a contiguous range of passing versions begins.
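The rule above (raise the minimum to the lowest version from which every newer tested version passes) can be sketched in a few lines of Python. `lowest_contiguous_passing` is a hypothetical helper, not part of Awkward's tooling. The input uses only outcomes reported in this thread (7.0.0 and 16.1.0 pass in CI, 11.0.0 fails in ianna's environment, 10.0.1 passes on macOS), and the 11.0.0 failure may still turn out to be environment-specific.

```python
def parse_version(v):
    """Turn '16.1.0' into (16, 1, 0) so versions compare numerically."""
    return tuple(int(part) for part in v.split("."))


def lowest_contiguous_passing(results):
    """Return the lowest version from which every newer tested version passes.

    `results` maps version strings to True (tests pass) or False (tests fail).
    Returns None if the newest tested version itself fails.
    """
    ordered = sorted(results, key=parse_version)
    minimum = None
    # Walk from newest to oldest, stopping at the first failing version.
    for version in reversed(ordered):
        if not results[version]:
            break
        minimum = version
    return minimum


# Outcomes reported in this thread: pyarrow version -> did the test pass?
reported = {"7.0.0": True, "10.0.1": True, "11.0.0": False, "16.1.0": True}
print(lowest_contiguous_passing(reported))  # 16.1.0
```

Versions are compared as integer tuples because plain string sorting would put "10.0.1" before "7.0.0". Pre-release tags such as "16.0.0rc1" would need a real version parser; this sketch assumes purely numeric versions.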

jpivarski, May 15 '24 21:05

No error in tests/test_2898_to_parquet_dataset.py with pyarrow 10.0.1 on macOS...

jpivarski, May 15 '24 21:05