pyarrow `RuntimeError: AppendRowGroups requires equal schemas`
Version of Awkward Array
2.6.4 (master branch)
pyarrow.__version__ '11.0.0'
pandas.__version__ '2.0.3'
Description and code to reproduce
I do not see the error in CI, so I assume there must be a conflict in my environment, because when I run:
% python -m pytest tests/test_2898_to_parquet_dataset.py
I get a RuntimeError: AppendRowGroups requires equal schemas.
Indeed, my installed version of pyarrow is 11.0.0, while the CI installs 16.1.0. Should we set a minimum required version?
It might be related to https://github.com/apache/arrow/issues/31678?
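For anyone comparing their environment against CI, here is a minimal sketch that reports the installed versions and checks one against a candidate minimum. It uses only the standard library. The helper names (`installed_version`, `version_tuple`, `meets_minimum`) are made up for this comment and are not part of Awkward Array or pyarrow. The parsing is deliberately naive (numeric dotted parts only), not a full PEP 440 comparison.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Turn '11.0.0' into (11, 0, 0), stopping at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(version, minimum):
    """Naive check: compare the numeric release parts as tuples."""
    return version_tuple(version) >= version_tuple(minimum)


# Report what is installed locally (prints None for missing packages).
for name in ("awkward", "pyarrow", "pandas"):
    print(name, installed_version(name))
```

With the versions from this report, `meets_minimum("11.0.0", "7.0.0")` is true and `meets_minimum("11.0.0", "16.1.0")` is false, so the local pyarrow sits between the tested minimum and the version CI installs.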
> Indeed, my installed version of pyarrow is 11.0.0, while the CI installs 16.1.0. Should we set a minimum required version?
There's a test that runs the minimal pyarrow version, which is 7.0.0:
https://github.com/scikit-hep/awkward/blob/3317b0c8f2656e47222d03a57bc2b14b1c56227d/.github/workflows/test.yml#L56-L59
https://github.com/scikit-hep/awkward/blob/3317b0c8f2656e47222d03a57bc2b14b1c56227d/requirements-test-minimal.txt#L3
So we are testing the minimal version. If there's an error that occurs for pyarrow 11 but not 7 or 16... that would be interesting. In that case, yes, we'd probably want to increase the minimum up to 16 or wherever we get a contiguous set of versions that pass.
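To make "a contiguous set of versions that pass" concrete, here is a rough sketch of how the new minimum could be picked once per-version test outcomes are known. The function name `lowest_safe_minimum` is hypothetical. The outcomes in the example are placeholders for illustration only, not measured results.

```python
def lowest_safe_minimum(results):
    """Return the lowest version such that it and every newer tested version pass.

    `results` maps a dotted numeric version string to True (tests pass) or
    False (tests fail). Returns None if the newest tested version fails.
    """
    def key(version):
        return tuple(int(part) for part in version.split("."))

    # Walk from the newest version downwards and stop at the first failure.
    ordered = sorted(results.items(), key=lambda item: key(item[0]), reverse=True)
    minimum = None
    for version, passed in ordered:
        if not passed:
            break
        minimum = version
    return minimum


# Placeholder outcomes for illustration only; NOT measured results.
example = {
    "7.0.0": True,
    "10.0.1": True,
    "11.0.0": False,
    "12.0.0": True,
    "16.1.0": True,
}
print(lowest_safe_minimum(example))  # prints 12.0.0
```

The walk starts from the newest version because older passing versions below a failing one (7.0.0 and 10.0.1 in the placeholder data) cannot be part of a single `>=` requirement.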
No error in tests/test_2898_to_parquet_dataset.py with pyarrow 10.0.1 on macOS...