DataFrames.jl

add `Reduce` to the minilanguage

Open bkamins opened this issue 2 years ago • 2 comments

For interoperability with DTable we have discussed with @krynju that it would be nice to add a `Reduce` wrapper to the mini-language.

The working idea is that in DataFrames.jl `src => Reduce(f) => dst` would be rewritten to `src => (x -> reduce(f, x)) => dst` (this is the simplest case; we also need to handle all `src` types and additionally handle `init`). DTable would of course use a different internal mechanism, but the point is that DTable needs such a wrapper for performance.
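A minimal sketch of that rewriting, using only Base Julia. Everything named here (`Reduce`, `NoInit`, `as_function`, `lower`) is hypothetical and defined only for illustration; none of it is an existing DataFrames.jl or DataAPI.jl API, and the handling of non-trivial `src` types is left out.

```julia
# Hypothetical sentinel meaning "the user did not supply `init`".
struct NoInit end

# Hypothetical wrapper type; the name follows the proposal in this issue.
struct Reduce{F,T}
    f::F
    init::T
end

# Convenience constructor: Reduce(f) or Reduce(f; init=value).
Reduce(f; init=NoInit()) = Reduce(f, init)

# Turn the wrapper into the plain function DataFrames.jl would call.
as_function(r::Reduce{<:Any,NoInit}) = x -> reduce(r.f, x)
as_function(r::Reduce) = x -> reduce(r.f, x; init=r.init)

# Rewrite `src => Reduce(f) => dst` into `src => (x -> reduce(f, x)) => dst`.
# Any other specification is returned unchanged.
function lower(spec::Pair)
    src, rest = spec.first, spec.second
    if rest isa Pair && rest.first isa Reduce
        return src => (as_function(rest.first) => rest.second)
    end
    return spec
end

# `=>` is right-associative, so this is `:a => (Reduce(+) => :a_sum)`.
spec = lower(:a => Reduce(+) => :a_sum)
fun = spec.second.first
total = fun([1, 2, 3])
```

The lowered pair has the ordinary `src => function => dst` shape, so under these assumptions it could be passed on to `combine` or `select` as usual, while a backend such as DTable could dispatch on the wrapper type directly instead of lowering it.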

@nalimilan - do you have any comments/opinion about this before I start implementing it?

bkamins, Apr 08 '22 16:04

Go for it! Actually I almost wanted to do that when we first introduced these internal aggregation fast paths, so it should fit quite well in the design. I guess `Reduce` would live in DataAPI?

Ah maybe one comment: `mean` and `std` need to be passed a special adjustment function, which is probably worth standardizing with DTable. Maybe as a keyword argument? It could also fit in the JuliaFolds API?
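To make the adjustment idea concrete, here is a rough sketch of what a keyword-argument form might look like for `mean`. The type `AdjustedReduce` and its `adjust` keyword are assumptions made up for this illustration; they are not an agreed design and not an existing API. The adjustment is assumed to receive the reduced value and the number of elements.

```julia
# Hypothetical type for illustration only.
struct AdjustedReduce{F,A}
    f::F
    adjust::A   # called as adjust(reduced_value, number_of_elements)
end

# By default the reduced value is returned unchanged.
AdjustedReduce(f; adjust=(acc, n) -> acc) = AdjustedReduce(f, adjust)

# Reduce first, then apply the adjustment once at the end.
(r::AdjustedReduce)(x) = r.adjust(reduce(r.f, x), length(x))

# The mean expressed as a sum followed by division by the element count.
mean_via_reduce = AdjustedReduce(+; adjust=/)
m = mean_via_reduce([1.0, 2.0, 3.0, 6.0])

# Without an adjustment this is a plain reduction.
s = AdjustedReduce(+)([1, 2, 3])
```

`std` would need more than a single sum plus one final adjustment, so it is deliberately not covered by this sketch.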

nalimilan, Apr 08 '22 16:04

@krynju - can you please comment on what API you need from the DTable perspective (taking JuliaFolds into account if possible), as I guess your requirements are more binding than ours here?

bkamins, Apr 08 '22 16:04