GH-40205: [Python] ListView arrow-to-pandas conversion
Rationale for this change
ListView should support converting to pandas/numpy in pyarrow.
What changes are included in this PR?
- `.to_pandas()` successfully creates a pandas DataFrame
Are these changes tested?
- Yes, unit tests
Are there any user-facing changes?
Yes. ListView data is now supported by the existing `.to_pandas()` API.
- GitHub Issue: #40205
:warning: GitHub issue #40205 has been automatically assigned in GitHub to PR creator.
One of the problems with the `Flatten` APIs, as far as I understand (and I am not sure whether you tackled this in the current version of the PR), is that you would then also have to change the offsets to keep them matching the flattened values.
From a comment in `ConvertListsLike`:
// We can't use Flatten(), because it removes the values behind a null list
// value, and that makes the offsets into original list values and our
// flattened_values array different.
So that might be a reason to keep something closer to the logic we currently have in arrow_to_pandas.cc for ListArray.
I am wondering if we could avoid having to concatenate the slices for ListView, and instead add a specialized version of the logic that is now in `ConvertListsLikeChunks` for ListView (knowing that the slices we make there might be out of order).
Thanks for the feedback @jorisvandenbossche! I went ahead and reused the `FromListView()` APIs.
The AppVeyor error is a known issue on main and already has a GH issue for it.
Thanks @danepitkin!
After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit 32437a5aebd6fba0abbc63dfcf8e24106c617efd.
There were no benchmark performance regressions. 🎉
The full Conbench report has more details. It also includes information about 1 possible false positive for unstable benchmarks that are known to sometimes produce them.