
Epic: Data Explorer support for Arrow/pyarrow in Python

Open • wesm opened this issue 2 years ago • 8 comments

To be filled out

wesm, Feb 01 '24 16:02

@wesm, do you see this as roughly equal in priority to supporting Pandas/Polars? Or is there already some compatibility, since Pandas/Polars can also use Arrow data? I'm trying to figure out whether this is a Public Beta or release candidate target.

I currently have it slated for RC, but we can pull it in earlier if it is higher priority.

jthomasmock, Feb 20 '24 14:02

I was thinking I would tackle this at roughly the same time as Polars, but I think it's lower priority in general.

wesm, Feb 20 '24 18:02

Not sure if this is related, but it doesn't seem like sorting works on polars dataframes in the data viewer.

gshotwell, Jul 31 '24 22:07


Hmm, polars should be working for all features as of https://github.com/posit-dev/positron/issues/2185 ...

Do you mind opening a new issue or discussion if you can't get polars working on the latest Positron build?


```python
import polars as pl

df = pl.DataFrame(
    {
        'Model': ['iPhone X', 'iPhone XS', 'iPhone 12',
                  'iPhone 13', 'Samsung S11', 'Samsung S12',
                  'Mi A1', 'Mi A2'],
        'Sales': [80, 170, 130, 205, 400, 30, 14, 8],
        'Company': ['Apple', 'Apple', 'Apple', 'Apple',
                    'Samsung', 'Samsung', 'Xiao Mi', 'Xiao Mi'],
    }
)
df
```

jthomasmock, Jul 31 '24 23:07

There might be a data-dependent bug. If you can, please post a minimal reproduction with a dataset that won't sort, or at least the log from the Console / kernel log, since there is likely a bug or exception being raised.

wesm, Aug 01 '24 19:08

This still seems to be intermittent, but here's the best I can do for reproducibility:

The dataframe is:

```python
import polars as pl

df = pl.DataFrame({
    "int_column": [1, 2, 3, 4, 5],               # Int64 column
    "string_column": ["a", "b", "c", "d", "e"],  # String column
})
```
| Positron | Python | Polars | Sorts? |
|---|---|---|---|
| 2024.07.0 | 3.12 | 1.2.1 | No |
| 2024.07.0 | 3.11 | 1.3.0 | Yes |

gshotwell, Aug 14 '24 13:08

@gshotwell I wasn't able to reproduce this on recent versions of Positron, so if you see it recur, please let me know!

wesm, Oct 10 '24 18:10

@wesm I haven't seen it either. It's possible that I had a sort on another column and assumed that sorting by a second column would replace the first sort, rather than add a second sort key.

gshotwell, Oct 10 '24 18:10

I moved this to the Future milestone; we can wait and see if there is user demand for it. As a workaround, you can wrap Arrow data in a polars DataFrame and view it that way. This is also one possible way we could implement viewing when both pyarrow and polars are installed. If only pyarrow is installed (and not polars or duckdb), we'll have to do a little legwork to execute the Data Explorer queries efficiently using pyarrow's built-in Acero analytics backend.

wesm, Dec 16 '24 18:12