SplitApplyCombine.jl

Name proposal for lazy operations: past tense

Open andyferris opened this issue 7 years ago • 7 comments
I've been happy to separate the semantics of greedy-vs-lazy operations into separate functions, like map vs mapview and group vs groupview. However, the view suffix is a little tiresome.

I'm wondering if we should follow the example set by Base.Broadcast which uses broadcast(...) = materialize(broadcasted(...)), where broadcasted is more-or-less a lazy version of broadcast and materialize is something that behaves a bit like copy when necessary.
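For readers who have not looked inside Base.Broadcast, the relationship described above can be sketched with the two functions it names. The variable names here are illustrative only.

```julia
using Base.Broadcast: broadcasted, materialize

# `broadcasted` builds a lazy description of the operation; no output array exists yet.
lazy = broadcasted(+, [1, 2, 3], 10)

# `materialize` turns that description into a concrete result.
eager = materialize(lazy)
```

The proposal is to apply the same verb/past-participle pairing to the functions in this package.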

That would be something like:

  • mapview -> mapped.
  • groupview -> grouped.
  • Lazy join functions ending in joined instead of join.
  • A new filtered for lazy filter.
  • product is a noun, not a verb, and seems fine being lazy.
  • flatten vs flattened?
  • We've been discussing splitdims at Base and slice/slices seems like a possible naming. sliced could be a lazy version?
  • etc...
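A rough feel for how the proposed names would read, using only Base iterators. This is a sketch under assumptions: mapped, filtered and flattened are defined locally as thin aliases and are not how this package implements mapview (for one thing, these aliases do not return arrays).

```julia
# Hypothetical names from the proposal, defined here purely for illustration.
mapped(f, itr) = Base.Generator(f, itr)
filtered(pred, itr) = Iterators.filter(pred, itr)
flattened(itr) = Iterators.flatten(itr)

xs = [1, 2, 3, 4]

# Nothing is computed until the lazy object is consumed, here by `collect`.
doubled = collect(mapped(x -> 2x, xs))
evens = collect(filtered(iseven, xs))
flat = collect(flattened([[1, 2], [3]]))
```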

Does anyone have any thoughts or opinions?

andyferris Nov 03 '18 09:11

I'm not sure whether using past tense is explicit enough. Anyway, better discuss that in the base Julia repo?

nalimilan Nov 03 '18 14:11

I like this

bramtayl Dec 06 '18 17:12

Ok, further thoughts: why not just make everything lazy? Eager methods could just be optimization methods of Base.collect
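One way to read this suggestion, as a minimal sketch: the lazy object is the primary API, and the eager function is just collect with a specialized method. The Mapped type and the mapped and eager_map functions are hypothetical names made up for this example, not part of Base or of this package.

```julia
# Hypothetical lazy wrapper; illustrative only.
struct Mapped{F,T}
    f::F
    itr::T
end
mapped(f, itr) = Mapped(f, itr)

# Generic lazy iteration: apply `f` to each element as it is requested.
function Base.iterate(m::Mapped, state...)
    next = iterate(m.itr, state...)
    next === nothing && return nothing
    x, s = next
    return (m.f(x), s)
end
Base.IteratorSize(::Type{<:Mapped}) = Base.SizeUnknown()

# The eager path is an optimization method of `collect`, delegating to Base's `map`.
Base.collect(m::Mapped) = map(m.f, m.itr)

# An eager function is then just "collect the lazy thing".
eager_map(f, itr) = collect(mapped(f, itr))
```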

bramtayl Dec 12 '18 02:12

While I may somewhat agree with you, I try to follow Base semantics here for things like map. It might be too late to start fiddling with that. As I understand it, when this was discussed for earlier versions of Julia, the feeling was that laziness might carry too much run-time overhead. Even if the compiler is better at allowing zero-cost abstractions these days, complexity still goes up considerably.

Also, some operations aren't obviously better being lazy. groupview is only partially lazy (fully lazy would be much worse!). filtered cannot possibly preserve array-ness. It's a tricky space, I feel.
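The point about array-ness can be seen with Base alone. This sketch uses Iterators.filter as a stand-in for the proposed filtered, which is an assumption: the lazy result has no known length until it is consumed, so it cannot be an AbstractArray.

```julia
xs = 1:10

# Eager filter produces an array.
eager = filter(iseven, xs)

# Lazy filter produces an iterator whose size is unknown up front.
lazy = Iterators.filter(iseven, xs)

lazy_is_array = lazy isa AbstractArray
lazy_size_trait = Base.IteratorSize(typeof(lazy))
```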

andyferris Dec 12 '18 12:12

I mean, map internally creates a generator and collects it, so... Being super-lazy opens up options for optimizations. For example, mapping a reduce function over truly lazy groups would enable the groupreduce optimization.

bramtayl Dec 12 '18 22:12

> For example, mapping a reduce function over truly lazy groups would enable the groupreduce optimization.

Unlike most operations where we can just rely on iteration and separation of concerns and still get optimal performance, for this particular case I think it would be necessary to overload the method explicitly. There are worse complications regarding anonymous functions that can't be introspected and the fact that map and broadcast do not work on AbstractDict in the first place. That is, I couldn't figure out a way to make map(g -> reduce(+, g), grouped(by, itr)) or even sum.(grouped(by, itr)) work. I raged a bit on JuliaLang/julia at the time :)
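The AbstractDict limitation can be checked directly. This is a small sketch, assuming a Julia 1.x release where Base still rejects both calls; the throws helper is made up for the example, and a plain Dict stands in for the dictionary-like result of grouping.

```julia
d = Dict(:a => [1, 2], :b => [3])

# Hypothetical helper: returns true if calling `f` throws.
function throws(f)
    try
        f()
        return false
    catch
        return true
    end
end

map_fails = throws(() -> map(sum, d))        # `map` is not defined on dictionaries
broadcast_fails = throws(() -> sum.(d))      # broadcasting over dictionaries is reserved

# Iterating over the pairs explicitly is what does work.
sums = Dict(k => sum(v) for (k, v) in d)
```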

Of course, for something like mapreduce this is trivial, as reduce(op, mapview(f, itr); init = ...) already works out-of-the-box.
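The same shape can be tried without this package installed. In this sketch Base.Generator stands in for mapview(f, itr), which is an assumption made only to keep the example self-contained.

```julia
f(x) = x^2
itr = 1:4

# Lazy mapping: no intermediate array is allocated.
lazy = Base.Generator(f, itr)

# Reducing over the lazy object gives the same answer as a fused mapreduce.
total = reduce(+, lazy; init = 0)
```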

andyferris Dec 12 '18 22:12

I think the solution in that case is pretty easy: collect(Generator(Reduce(f), Grouped(by, iter))) would just have to be optimized to groupreduce(f, by, iter). I have an implementation of Reduce in JuliennedArrays. Grouped just has to be truly lazy.
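A minimal sketch of what such a dispatch could look like. Everything here is hypothetical: Reduce and Grouped are toy structs written for this example (not the JuliennedArrays or SplitApplyCombine types), groupreduce_sketch is a stand-in for the package's groupreduce, and Grouped is left non-iterable so only the optimized path exists.

```julia
# A named, introspectable wrapper around a reduction operator.
struct Reduce{F}
    op::F
end
(r::Reduce)(group) = reduce(r.op, group)

# A "truly lazy" grouping: it only records what to group and how.
struct Grouped{B,T}
    by::B
    itr::T
end

# One-pass group reduction that never materializes the groups.
function groupreduce_sketch(op, by, itr)
    out = Dict{Any,Any}()
    for x in itr
        k = by(x)
        out[k] = haskey(out, k) ? op(out[k], x) : x
    end
    return out
end

# The optimization: collecting "Reduce mapped over Grouped" dispatches to the one-pass version.
Base.collect(g::Base.Generator{<:Grouped,<:Reduce}) =
    groupreduce_sketch(g.f.op, g.iter.by, g.iter.itr)

result = collect(Base.Generator(Reduce(+), Grouped(iseven, 1:6)))
```

This only works because Reduce is a concrete type that dispatch can see; an anonymous function such as g -> reduce(+, g) would not match the method, which is the introspection problem raised earlier in the thread.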

bramtayl Dec 13 '18 00:12