Add support for pyarrow Table and/or polars DataFrame

Open sa- opened this issue 2 years ago • 3 comments

Apache Arrow is slowly becoming the new standard for dataframes, and Polars (https://github.com/pola-rs/polars), a dataframe library written on top of Arrow, is really fast.

It would be nice if there were support for polars directly, or for pyarrow tables, so that I could use plotly with them the way one would with pandas.

For example, it would be nice if I could do this:

import polars as pl
import plotly.express as px

df = pl.DataFrame({"a":[1,2,3,4,5], "b":[1,4,9,16,25]})

px.line(df, x="a", y="b")
# or px.line(df.to_arrow(), x="a", y="b")
# if you would only like to provide support for pyarrow Tables and not polars specifically

sa- avatar Mar 19 '22 09:03 sa-

The workaround that I use right now is px.line(x=df["a"], y=df["b"]), but it gets unwieldy if the name of the data frame is long.

sa- avatar Mar 22 '22 07:03 sa-

Surprisingly, polars Series seem to work out of the box, as you write in your workaround. I am curious how this is possible.

DrMaphuse avatar Sep 13 '22 09:09 DrMaphuse

As far as I know, Plotly Express doesn't use any pandas functions that are significantly faster in polars or other data frames like vaex. All that PX does is column extraction, plus melt() if you pass in wide-form data. PX doesn't do any aggregations or math on the dataset. At the end of the process, some part of every data-frame row that you pass in to PX gets serialized to JSON anyway, so you can't really use PX to visualize very large datasets.
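To illustrate what "melt" means for wide-form data, here is a minimal pure-Python sketch. The `melt` helper and the sample columns are hypothetical and only mimic the shape of the result; this is not the code Plotly Express or pandas actually runs.

```python
def melt(wide, id_col):
    """Turn wide-form {column: values} data into long-form (id, variable, value) rows."""
    long_rows = []
    for name, values in wide.items():
        if name == id_col:
            continue  # the id column is repeated on every row, not melted
        for key, value in zip(wide[id_col], values):
            long_rows.append((key, name, value))
    return long_rows


# Hypothetical wide-form data: one column per measured variable.
wide = {"year": [2021, 2022], "apples": [3, 5], "pears": [4, 6]}
long_rows = melt(wide, id_col="year")
for row in long_rows:
    print(row)
```

Each value column becomes a block of rows tagged with its column name, which is the form needed to draw one trace per variable.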

So in general you should convert your data frames to pandas ones before passing them to Plotly Express. The most straightforward way for PX to "accept" such alternative dataframes would be for PX to detect the presence of an "export to pandas" function and call that internally.
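The detection idea could look roughly like the sketch below. It assumes the export function is a method named `to_pandas`; the `coerce_frame` helper and the `FakeFrame` stand-in class are hypothetical names for illustration, not part of plotly, polars or pandas.

```python
def coerce_frame(data):
    """Return data.to_pandas() if the object offers that method, else data unchanged."""
    to_pandas = getattr(data, "to_pandas", None)
    if callable(to_pandas):
        return to_pandas()
    return data


class FakeFrame:
    """Stand-in for a polars/vaex-style frame (hypothetical, for illustration only)."""

    def __init__(self, columns):
        self.columns = columns

    def to_pandas(self):
        # A real library would return a pandas DataFrame here;
        # a tagged dict keeps this sketch free of dependencies.
        return {"kind": "pandas-like", "columns": self.columns}


converted = coerce_frame(FakeFrame({"a": [1, 2, 3]}))
untouched = coerce_frame([1, 2, 3])  # no to_pandas(), so it is passed through
print(converted)
print(untouched)
```

Duck typing on the method name avoids importing every supported backend, at the cost of trusting that any `to_pandas` method really returns a pandas object.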

nicolaskruchten avatar Sep 13 '22 14:09 nicolaskruchten

Just to be explicit: You can use the graph_objects API to plot polars series. What doesn't work is passing a df to e.g. px.line(df, x=..., y=...), and then referencing x and y by strings.

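from datetime import date

import polars as pl
import streamlit as st
from plotly import graph_objects as go

# Recent polars releases take start/end (formerly low/high) and need eager=True to return a Series
dates = pl.date_range(date(2021, 1, 1), date(2021, 1, 5), interval="1d", eager=True).alias("dates")
df = pl.DataFrame({"dates": dates, "values": range(5)})

fig = go.Figure()
# Polars Series are passed directly as the x and y values
fig.add_trace(go.Scatter(x=df["dates"], y=df["values"]))
st.plotly_chart(fig)  # render the figure in a Streamlit app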

This works fine in streamlit, for those who are interested.

thomasaarholt avatar Oct 26 '22 14:10 thomasaarholt

> Surprisingly, polars Series seem to work out of the box, as you write in your workaround. I am curious how this is possible.

I expect it's because we support the numpy __array__ protocol.
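As a rough illustration of that protocol: any object that defines `__array__` can be handed to `numpy.asarray`, which is presumably how a consumer can read a Series without knowing its type. The `TinySeries` class below is a hypothetical stand-in, not polars code, and whether plotly takes exactly this path is an assumption.

```python
import numpy as np


class TinySeries:
    """Hypothetical column type that exposes its data via the __array__ protocol."""

    def __init__(self, values):
        self._values = list(values)

    def __array__(self, dtype=None, copy=None):
        # NumPy calls this when asked to convert the object to an ndarray.
        return np.array(self._values, dtype=dtype)


arr = np.asarray(TinySeries([1, 4, 9]))
print(type(arr).__name__, arr.tolist())
```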

Note that VegaFusion/Altair recently gained polars support (and vaex/duckdb) by implementing the DataFrame Interchange Protocol; this would be a nice/generic way forward here too (rather than having to add custom/per-backend support). There does seem to be an existing PR for this; if that was merged then everything should "just work", which would be awesome.

alexander-beedie avatar May 01 '23 08:05 alexander-beedie

Latest release should support this 🥳 https://github.com/plotly/plotly.py/releases/tag/v5.15.0

But it's still using Pandas under-the-hood 😢

> px methods now accept data-frame-like objects that support a to_pandas() method, such as polars, cudf, vaex etc

Lundez avatar Jun 09 '23 11:06 Lundez

Nice! I guess that was the cheapest / fastest way to get support.

thomasaarholt avatar Jun 09 '23 11:06 thomasaarholt

Hi - we are tidying up stale issues and PRs in Plotly's public repositories so that we can focus on things that are still important to our community. Since this one has been sitting for a while, I'm going to close it; if it is still a concern, please add a comment letting us know what recent version of our software you've checked it with so that I can reopen it and add it to our backlog. If you'd like to submit a PR, we'd be happy to prioritize a review, and if it's a request for tech support, please post in our community forum. Thank you - @gvwilson

gvwilson avatar Jul 11 '24 14:07 gvwilson