carboxyl icon indicating copy to clipboard operation
carboxyl copied to clipboard

Excessive cloning in Stream::fold

Open wolfiestyle opened this issue 7 years ago • 8 comments

Tried to implement something that accumulates all the stream values into a Vec using Stream::fold, but found that it clones the accumulator values multiple times during it's operation. Example code:

extern crate carboxyl;
use carboxyl::Sink;
use std::fmt::Debug;

#[derive(Debug)]
struct Storage<T>
{
    vec: Vec<T>,
}

impl<T> Storage<T>
{
    fn new() -> Self
    {
        Storage{ vec: Vec::new() }
    }

    fn push(mut self, item: T) -> Self
    {
        self.vec.push(item);
        self
    }
}

impl<T: Clone + Debug> Clone for Storage<T>
{
    fn clone(&self) -> Self
    {
        println!("storage cloned! {:?}", self.vec);
        Storage{ vec: self.vec.clone() }
    }
}

fn main()
{
    let sink = Sink::new();
    let signal = sink.stream().fold(Storage::new(), Storage::push);

    sink.send(11);
    sink.send(22);
    sink.send(33);

    println!("result: {:?}", signal.sample());
}

output:

storage cloned! []
storage cloned! []
storage cloned! [11]
storage cloned! [11]
storage cloned! [11]
storage cloned! [11, 22]
storage cloned! [11, 22]
storage cloned! [11, 22]
storage cloned! [11, 22, 33]
storage cloned! [11, 22, 33]
storage cloned! [11, 22, 33]
result: Storage { vec: [11, 22, 33] }

Don't know if I'm using it wrong, but this seems pretty inefficient. A fold operation shouldn't require cloning the accumulator.

wolfiestyle avatar Feb 07 '17 00:02 wolfiestyle

could remove one clone here: https://github.com/darkstalker/carboxyl/commit/d18ed3abdc79c9081c7acfe0a32bb68f9cb3c843

But it seems that the problem comes from sampling (used in the implementation of fold). That causes extra clones that can't be easily avoided.

wolfiestyle avatar Feb 11 '17 23:02 wolfiestyle

@aepsil0n do you see a usecase where the accumulator is a costly object to clone, or should that be avoided anyways?

Moredread avatar Feb 13 '17 11:02 Moredread

First of all, thanks for looking into this @darkstalker. I would definitely accept your commit as a pull request. It would be nice to add a minimal test case along the lines of your example to avoid future regressions.

The implementation of fold is purely semantic, I have not put any effort to optimize it. That goes for a large part of the functionality. In this particular case my prime suspect would be this line, which would be hard to get rid of. But I'd have to debug this to know for sure.

It is certainly necessary to clone once at the end for sampling. The rest might be avoidable in theory. As it is, carboxyl is not provide a zero-cost abstraction for event handling. Reference counting, cloning, dynamic dispatch, etc. are still an issue. I am just not sure, how easy this is to fix without a complete rewrite.

@Moredread you are right to some extent. There is something to be said about data structures. Functional programming in the style promoted by this library works best with cheaply clonable purely functional data structures (see Okazaki, 1996). However, it would be nice not to be wasteful. ;)

milibopp avatar Feb 13 '17 17:02 milibopp

made the pull request, but couldn't figure out a reliable way to test it

wolfiestyle avatar Feb 14 '17 20:02 wolfiestyle

fine by me for now… I'll leave this issue open nonetheless, also because there is more to it.

milibopp avatar Feb 15 '17 12:02 milibopp

Did some research about this, and I think I found a solution: passing the value around as Cow<T> and decide at runtime when to pass the value as ref or owned. This way we can clone only when it's really needed. But it causes significant changes to the API. More details here:

https://play.rust-lang.org/?gist=7eee873a19bdbf32122843f9e0d4f133&version=stable&backtrace=0

wolfiestyle avatar Mar 03 '17 23:03 wolfiestyle

Interesting… this might take some work to update the code accordingly.

milibopp avatar Mar 06 '17 20:03 milibopp

Made a repository with all my work here: https://github.com/darkstalker/frappe It has most of the carboxyl functionality except threading (don't need it). I'll probably publish this as an alternative, since it has different goals.

wolfiestyle avatar Mar 08 '17 00:03 wolfiestyle