ekelsen
The speed of cumsum has been improved significantly; I'm going to close this. Feel free to re-open if you feel it still isn't fast enough.
Currently just HEAD: https://github.com/tensorflow/tensorflow/commit/73e3215c3a2edadbf9111cca44ab3d5ca146c327
Do the programs under tests work?
The interface of CUB is also much nicer with respect to memory management. I second this request as well.
I would be happy with the same interface as for the segmented reductions - a pair of arrays that could be aliased to the same one.
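For readers unfamiliar with that convention, here is a minimal CPU sketch of what a "pair of arrays that could be aliased" means. It is not CUB code. The function name `segmented_sum` and the sample data are hypothetical, and Python slices stand in for the pointer aliasing one would do on the device (passing `offsets` and `offsets + 1`).

```python
def segmented_sum(values, begin_offsets, end_offsets):
    """Reduce each segment values[begin:end] to its sum.

    Hypothetical helper for illustration only: segment i covers
    the half-open range [begin_offsets[i], end_offsets[i]).
    """
    return [sum(values[b:e]) for b, e in zip(begin_offsets, end_offsets)]


values = [1, 2, 3, 4, 5, 6]

# Contiguous segments: a single offsets array of length num_segments + 1
# serves as both inputs. On the device this would be the same buffer
# viewed at two starting positions; here the slices are copies.
offsets = [0, 2, 2, 6]
aliased_sums = segmented_sum(values, offsets[:-1], offsets[1:])

# Non-contiguous segments: two independent arrays, skipping index 2 and 5.
separate_sums = segmented_sum(values, [0, 3], [2, 5])

print(aliased_sums)   # [3, 0, 18]
print(separate_sums)  # [3, 9]
```

The two-array form covers both cases: aliasing one array handles back-to-back segments (including empty ones), while separate arrays allow gaps between segments.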
And to throw one more thing on the pile, it would be great if there was at least the option to get deterministic results, even for pseudo-associative operations. (Maybe that...
Glad to see this happening! Although not directly affected anymore, my preference would be Approach 2.