
[question] Apache Arrow support in Altair?

Open djouallah opened this issue 5 years ago • 6 comments

I read that Apache Arrow is now supported in Vega-Lite through a separate vega-loader. Is this something that Altair will support?

For anyone who has used Apache Arrow: is the performance much better than CSV? My current project has a dataset of 60,000 lines and it is a bit slow, especially for brushing.

djouallah avatar Mar 29 '19 22:03 djouallah

Arrow support in Vega/Vega-Lite is an open issue: https://github.com/vega/vega/issues/1300. Once it's available there, we will work to make it available in Altair.

jakevdp avatar Mar 29 '19 22:03 jakevdp

Thanks, I was referring to this: https://github.com/vega/vega-loader-arrow

djouallah avatar Mar 29 '19 23:03 djouallah

Now that there is vega-loader-arrow, what should support in Altair look like?

doug avatar Apr 24 '20 18:04 doug

It would first have to be supported by Vega-Lite.

jakevdp avatar Apr 24 '20 19:04 jakevdp

Vega-Lite's schema does not yet allow the arrow format: https://github.com/vega/schema/blob/master/vega-lite/v4.11.0.json#L5526-L5541

jakevdp avatar Apr 24 '20 19:04 jakevdp

It looks like Arrow data sources can be used by Vega-Lite now: https://observablehq.com/@vega/vega-lite-and-apache-arrow-no-plugin. If the data is served from a URL and not passed as a JavaScript object, it looks like the plugin mentioned above will register the format (I think).

Is it a requirement for Altair that this is implemented in the main Vega/Vega-Lite code bases?

ivirshup avatar May 17 '21 05:05 ivirshup

@jakevdp What is the situation with Apache Arrow and Arrow Table? @domoritz Am I correct that Vega-Lite supports Arrow nowadays?

kimmolinna avatar Jan 19 '23 11:01 kimmolinna

I'm not sure what native Arrow support Altair has, but the VegaFusion project integrates Apache Arrow DataFusion with Altair.

dhirschfeld avatar Jan 19 '23 11:01 dhirschfeld

I noticed that, but I would like to use DuckDB as a query engine. DuckDB supports pandas, but I prefer Apache Arrow.

kimmolinna avatar Jan 19 '23 11:01 kimmolinna

An Arrow JS Table object presents itself as an array of objects, and Vega automatically supports it. This has been true for a while now. The Arrow loader is only needed when you refer to an Arrow file (in IPC format) by URL.
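
For the by-URL case, a minimal sketch of what the spec might look like, built here as a plain Python dict. The URL and the helper name `arrow_vega_spec` are hypothetical, and the `"arrow"` format type is assumed to resolve only if the page rendering the chart has registered vega-loader-arrow with Vega; this does not show anything Altair itself does.

```python
import json

# Hypothetical location of an Arrow IPC file; replace with a real URL.
ARROW_URL = "https://example.com/data/flights.arrow"


def arrow_vega_spec(url: str) -> dict:
    """Build a small Vega spec whose dataset is loaded from an Arrow file by URL."""
    return {
        "$schema": "https://vega.github.io/schema/vega/v5.json",
        "width": 300,
        "height": 200,
        "data": [
            {
                "name": "table",
                "url": url,
                # Assumption: the embedding page has registered the "arrow"
                # format with Vega (via vega-loader-arrow); otherwise this
                # format type will not be recognized.
                "format": {"type": "arrow"},
            }
        ],
    }


spec = arrow_vega_spec(ARROW_URL)
print(json.dumps(spec, indent=2))
```

When the data is already in memory as an Arrow JS Table, no `format` entry of this kind is needed, since the table is passed directly as the dataset values.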

domoritz avatar Jan 19 '23 13:01 domoritz

@kimmolinna You can use DuckDB via VegaFusion as well: https://vegafusion.io/duckdb.html. I'm closing this as it seems to me that the use cases requested here are covered by VegaFusion, but please comment if I'm missing something.

joelostblom avatar Jan 05 '24 20:01 joelostblom