Epic: Data Explorer support for Arrow/pyarrow in Python
To be filled out
@wesm - do you see this as roughly equal priority weight to supporting Pandas/Polars? Or is there already some compatibility since Pandas/Polars can also use Arrow data? Trying to figure out whether this is a Public Beta or release candidate target.
Currently have it for RC, but we can pull back if it is higher priority.
I was thinking I would tackle this at roughly the same time as Polars, but I think it's lower priority in general.
Not sure if this is related, but it doesn't seem like sorting works on polars dataframes in the data viewer.
Hmm polars should be working for all features so far as of https://github.com/posit-dev/positron/issues/2185...
Do you mind opening a new issue or discussion if you can't get polars working on latest Positron build?
import polars as pl

df = pl.DataFrame(
    {
        'Model': ['iPhone X', 'iPhone XS', 'iPhone 12',
                  'iPhone 13', 'Samsung S11', 'Samsung S12',
                  'Mi A1', 'Mi A2'],
        'Sales': [80, 170, 130, 205, 400, 30, 14, 8],
        'Company': ['Apple', 'Apple', 'Apple', 'Apple',
                    'Samsung', 'Samsung', 'Xiao Mi', 'Xiao Mi'],
    }
)
df
There might be a data-dependent bug. Could you post a minimal reproduction of a dataset that won't sort, or at least the log from the Console / kernel log? There is likely an exception being raised.
This still seems to be intermittent, but here's the best I can do for reproducibility:
The dataframe is:
df = pl.DataFrame({
    "int_column": [1, 2, 3, 4, 5],  # Int64 column
    "string_column": ["a", "b", "c", "d", "e"],  # String column
})
| positron | python | polars | sorts? |
|---|---|---|---|
| 2024.07.0 | 3.12 | 1.2.1 | No |
| 2024.07.0 | 3.11 | 1.3.0 | Yes |
@gshotwell I wasn't able to reproduce this on recent versions of Positron, so if you see it recur please let me know!
@wesm I haven't seen it either. It's possible that what was happening is that I had a sort on another column and thought that sorting by a second column replaced the first sort rather than adding a second sort.
I moved this to the Future milestone. We can wait and see if there is user demand for this. As a workaround, you can wrap Arrow data in a polars DataFrame and view it that way. This is one possible way we could implement viewing when both pyarrow and polars are installed, but if only pyarrow is installed (and not polars or duckdb), we'll have to do a little legwork to efficiently execute the data explorer queries using pyarrow's built-in Acero analytics backend.