Support data transformations
What I really miss in the Arrow JS lib is built-in aggregation: today I have to write row-based accumulators or lookups in JS to get synthetic aggregates (sum, avg, cumsum). As DataFusion and Polars already support this, I assume it's available in base Arrow (Rust) too. Could you add a few examples in this area (sort, groupby, sum)?
P.S. While I understand that DataFusion's capabilities (a full SQL engine) could be cumbersome and overkill, did you consider exposing @ritchie46's lazy Polars API?
I added sum, min, and max to vectors already, and it would definitely be great to support more aggregates. Which ones would you want specifically (and are they supported in the Rust library)?
I looked into DataFusion, but it doesn't compile to wasm right now. I already filed a Jira ticket: https://issues.apache.org/jira/plugins/servlet/mobile#issue/ARROW-11615
I had not seen polars before. Thanks for the pointer.
I saw vec.sum(), but what I meant was table.groupby("date").column(["temp", "rain"]).sum() or table.groupby([1]).column([2,3]).sum().
In SQL, what I need would be:
SELECT date, SUM(temp), SUM(rain) FROM myTable GROUP BY date ORDER BY date DESC
given a table myTable available in Arrow IPC format.
This would make arrow-wasm one of the most powerful dataframe libs in JS instantly.
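For reference, here is a minimal sketch of the row-based accumulator that currently has to be written by hand for the query above. It uses plain TypeScript objects rather than any Arrow API; the `WeatherRow` and `DailyTotals` types and the `sumByDate` function are hypothetical names made up for illustration, and it assumes dates are ISO strings so that string order matches date order.

```typescript
// Hypothetical row shape; the column names come from the SQL query above.
interface WeatherRow {
  date: string; // assumed ISO format, e.g. "2021-02-14"
  temp: number;
  rain: number;
}

// Hypothetical result shape: one entry per distinct date.
interface DailyTotals {
  date: string;
  sumTemp: number;
  sumRain: number;
}

// Equivalent of:
//   SELECT date, SUM(temp), SUM(rain) FROM myTable
//   GROUP BY date ORDER BY date DESC
function sumByDate(rows: WeatherRow[]): DailyTotals[] {
  const groups = new Map<string, DailyTotals>();

  for (const row of rows) {
    // Look up the accumulator for this date, creating it on first sight.
    let totals = groups.get(row.date);
    if (totals === undefined) {
      totals = { date: row.date, sumTemp: 0, sumRain: 0 };
      groups.set(row.date, totals);
    }
    totals.sumTemp += row.temp;
    totals.sumRain += row.rain;
  }

  // Sort descending by date; ISO strings compare correctly as plain strings.
  return Array.from(groups.values()).sort((a, b) =>
    a.date < b.date ? 1 : a.date > b.date ? -1 : 0
  );
}
```

A groupby on the table would replace all of this with a single call, and could work on the columnar data directly instead of materialising one object per row.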
That would be great, and I am thinking about how best to support it (and more). The Rust Arrow library doesn't have groupby, though (or I didn't see it): https://docs.rs/arrow/3.0.0/arrow/
Oh, then it's a contribution of the higher-level libs (DataFusion, Polars), my mistake :) I still really love this initiative, and I'm happy to see the data science field sharing more and more code across languages.
I updated the title to be more generic.