
GH-40205: [Python] ListView arrow-to-pandas conversion

danepitkin opened this pull request 11 months ago • 4 comments

Rationale for this change

ListView should support converting to pandas/numpy in pyarrow.

What changes are included in this PR?

  • .to_pandas() on ListView data now successfully creates a pandas DataFrame

Are these changes tested?

  • Yes, unit tests

Are there any user-facing changes?

Yes, supporting an existing API.

  • GitHub Issue: #40205

danepitkin, Mar 12 '24 02:03

:warning: GitHub issue #40205 has been automatically assigned in GitHub to the PR creator.

github-actions[bot], Mar 12 '24 02:03

One of the problems with the Flatten APIs, as far as I understand (and I am not sure you tackled this in the current version of the PR), is that you would then also have to change the offsets so that they keep matching the flattened values.

From a comment in ConvertListsLike:

    // We can't use Flatten(), because it removes the values behind a null list
    // value, and that makes the offsets into original list values and our
    // flattened_values array different.

So that might be a reason to keep something closer to the logic we currently have in arrow_to_pandas.cc for ListArray.
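To make the offset mismatch described in the ConvertListsLike comment concrete, here is a minimal pure-Python model of a ListArray (a values buffer, an offsets buffer, and a validity mask). It is only an illustration under stated assumptions: the data is hypothetical, and `flatten` and `rows_from` are made-up helpers that mimic the behavior of Flatten() and of slicing with the original offsets, not Arrow C++ or pyarrow APIs.

```python
# Hypothetical ListArray data: row 1 is null but still "owns" values[2:4].
values = [1, 2, 3, 4, 5]
offsets = [0, 2, 4, 5]
valid = [True, False, True]


def flatten(values, offsets, valid):
    """Mimic Flatten(): keep only the values referenced by non-null rows."""
    out = []
    for i, ok in enumerate(valid):
        if ok:
            out.extend(values[offsets[i]:offsets[i + 1]])
    return out


def rows_from(values_buf, offsets, valid):
    """Slice each row out of values_buf using the ORIGINAL offsets."""
    return [
        values_buf[offsets[i]:offsets[i + 1]] if ok else None
        for i, ok in enumerate(valid)
    ]


flat = flatten(values, offsets, valid)
print(flat)                               # [1, 2, 5]
print(rows_from(values, offsets, valid))  # [[1, 2], None, [5]]  (correct)
print(rows_from(flat, offsets, valid))    # [[1, 2], None, []]   (row 2 lost)
```

Because the null row's values were dropped, offset 4 now points past the end of the flattened buffer, which is why the existing code slices the original values instead of calling Flatten().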

jorisvandenbossche, Mar 12 '24 13:03

I am wondering if we could avoid concatenating the slices for ListView, and instead give the logic that now lives in ConvertListsLikeChunks a specialized version for ListView (knowing that the slices we make there might be out of order).
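The idea of slicing ListView rows directly, without concatenating, can be sketched in pure Python. This is a hypothetical model, not the arrow_to_pandas.cc code: `list_view_rows` is an illustrative helper, and Python lists stand in for the per-row numpy arrays that the real conversion produces. Each ListView row is `values[offset : offset + size]`, so offsets may be out of order or overlap and every row can still be sliced independently.

```python
# Hypothetical ListView data: row i is values[offsets[i] : offsets[i] + sizes[i]].
values = ["a", "b", "c", "d", "e"]
offsets = [3, 0, 1, 0, 0]  # not monotonic: row 0 starts after row 1
sizes = [2, 1, 2, 3, 0]    # row 3 overlaps rows 1 and 2
valid = [True, True, True, True, False]


def list_view_rows(values, offsets, sizes, valid):
    """Slice each row directly from the shared values buffer.

    No concatenation or reordering is needed, because each row carries
    its own (offset, size) pair; null rows become None.
    """
    return [
        values[off:off + size] if ok else None
        for off, size, ok in zip(offsets, sizes, valid)
    ]


rows = list_view_rows(values, offsets, sizes, valid)
print(rows)  # [['d', 'e'], ['a'], ['b', 'c'], ['a', 'b', 'c'], None]
```

The trade-off is that a specialized path must not assume monotonic offsets anywhere, whereas converting to ListArray layout first lets the existing ListArray code run unchanged at the cost of a copy.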

jorisvandenbossche, Mar 12 '24 13:03

Thanks for the feedback @jorisvandenbossche ! I went ahead and reused the FromListView() APIs.
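Roughly, converting a ListView to ListArray layout means copying each row's slice into a new contiguous values buffer and building monotonic offsets. The sketch below is a hypothetical pure-Python model of that transformation, not the Arrow C++ FromListView() implementation; `from_list_view` and the data are illustrative assumptions.

```python
def from_list_view(values, offsets, sizes, valid):
    """Copy list-view rows into ListArray layout (contiguous values, monotonic offsets).

    Null rows get a zero-length range, so new_offsets[i] == new_offsets[i + 1].
    """
    new_values = []
    new_offsets = [0]
    for off, size, ok in zip(offsets, sizes, valid):
        if ok:
            new_values.extend(values[off:off + size])
        new_offsets.append(len(new_values))
    return new_values, new_offsets


# Hypothetical ListView data with out-of-order and overlapping ranges.
values = ["a", "b", "c", "d", "e"]
offsets = [3, 0, 1, 0, 0]
sizes = [2, 1, 2, 3, 0]
valid = [True, True, True, True, False]

new_values, new_offsets = from_list_view(values, offsets, sizes, valid)
print(new_values)   # ['d', 'e', 'a', 'b', 'c', 'a', 'b', 'c']
print(new_offsets)  # [0, 2, 3, 5, 8, 8]
```

In this model, null rows own no values and overlapping rows are duplicated, so the result can be handed to list-style conversion logic that expects monotonic offsets.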

danepitkin, Mar 12 '24 21:03

The AppVeyor failure is a known issue on main and already has a GH issue tracking it.

danepitkin, Mar 13 '24 19:03

Thanks @danepitkin!

jorisvandenbossche, Mar 26 '24 15:03

After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit 32437a5aebd6fba0abbc63dfcf8e24106c617efd.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 1 possible false positive for unstable benchmarks that are known to sometimes produce them.