cassava

Use pipes for the Streaming module

Open bitemyapp opened this issue 10 years ago • 9 comments

I know it's another dependency but:

https://twitter.com/bitemyapp/status/531617919181258752

Could hide the pipes machinery behind the Foldable instance.

What do you think?

bitemyapp avatar Nov 18 '14 18:11 bitemyapp

@tibbe does this sound even remotely like a good idea? I'll drop it and close the ticket if you can explain why it's not a good idea. Currently it seems like the Streaming module is using a list with a nil product, which I think can be improved upon for the purposes of streaming.

bitemyapp avatar Nov 24 '14 20:11 bitemyapp

I'm on vacation. Will take a look when I get back.

tibbe avatar Nov 24 '14 23:11 tibbe

I think pipes is too heavy a dependency, and I don't want to commit to one of the many competing streaming solutions out there until the community settles on something. I am curious why the streaming interface uses more memory than pipes, though. Do you have some test code around?

tibbe avatar Nov 30 '14 08:11 tibbe

@tibbe I'm not surprised at all, since Pipes goes out of its way to enforce that only one piece of data is processed at a time, whereas you've got what amounts to a lazy list AFAICT.

Example code here: https://github.com/bitemyapp/csvtest

Not forcing more dependencies makes sense, but the memory usage should be lower and the data type weirds me out.
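
To make the "lazy list" concern concrete, here is a minimal sketch of the kind of type being discussed: a lazy list of per-record results whose terminator carries a product. The `Records` type below is a simplified stand-in written for illustration (it uses `String` for the leftover input), and `produce` and `sumRecords` are hypothetical helpers, not part of any library. It only shows that a strict fold over such a structure can release each cell after visiting it, as long as nothing else keeps a reference to the head of the list.

```haskell
module Main (main) where

import Data.Foldable (foldl')

-- Hypothetical stand-in for a streaming result type: a lazy list of
-- per-record results, terminated by a Nil that carries a product
-- (an optional error message and the unconsumed input).
data Records a
  = Cons (Either String a) (Records a)
  | Nil (Maybe String) String

-- Folding visits only the successfully parsed records and skips failures.
instance Foldable Records where
  foldr f z = go
    where
      go (Cons (Right x) rest) = f x (go rest)
      go (Cons (Left _) rest)  = go rest
      go (Nil _ _)             = z

-- Hypothetical producer: lazily yields the records 1..n, then terminates.
produce :: Int -> Records Int
produce n = go 1
  where
    go i
      | i > n     = Nil Nothing ""
      | otherwise = Cons (Right i) (go (i + 1))

-- Strict left fold: the accumulator is forced at every step, and each
-- consumed cell becomes garbage once the fold has moved past it.
sumRecords :: Records Int -> Int
sumRecords = foldl' (+) 0

main :: IO ()
main = print (sumRecords (produce 1000000))
```

If the caller binds the result of `produce` to a name and traverses it twice, the whole list stays reachable between traversals, which is one way a lazy-list interface can end up holding more in memory than a consumer that only ever sees one element at a time.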

bitemyapp avatar Nov 30 '14 18:11 bitemyapp

Specifically: https://github.com/bitemyapp/csvtest/commit/dd458670592a6b661737bfdb05e046f1b9b7b93b

bitemyapp avatar Nov 30 '14 22:11 bitemyapp

@tibbe is pipes that heavy of a dependency?

Its dependencies are base (==4.*), mmorph (>=1.0.0 && <1.1), mtl (>=2.1 && <2.3), and transformers (>=0.2.0.0 && <0.5).

I don't really know of anything slimmer than that. Main problem I can think of would be transformers possibly being annoying to install in some cases.

A better idea could be to mimic the work Snoyman has done with WAI: http://www.yesodweb.com/blog/2014/04/disemboweling-wai

I'd argue streaming abstraction agnosticism, rather than dep weight, is the compelling reason not to use pipes.

The real motivator is having a streaming module that works properly and doesn't keep more in memory than is necessary, throughput optimization notwithstanding.

bitemyapp avatar Dec 05 '14 00:12 bitemyapp

The main issue is that I don't want to pick one out of N competing streaming solutions. It won't be long before people want the alternatives. On top of that, there's no reason why pipes would offer any better memory usage than what we could offer using the streaming or incremental APIs. I will look into the discrepancy you reported when I find some time. It might just be due to how we parse several records in a batch (and thus hold on to them in memory).
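
As a rough illustration of the batching hypothesis, the sketch below models an incremental parser that hands back a whole batch of records per input chunk. The `Parser` type, `parseInts`, `feed`, and `addBatch` are hypothetical names made up for this example and are not taken from the library. The point is only that every record in a batch stays live until the consumer has worked through that batch, so peak memory follows the batch size rather than the size of a single record.

```haskell
module Main (main) where

import Data.List (foldl')

-- Hypothetical stand-in for an incremental parser result. Each step
-- returns a whole batch of records plus a continuation for more input.
data Parser a
  = Fail String
  | Many [Either String a] (String -> Parser a)
  | Done [Either String a]

-- Toy parser: every chunk holds newline-separated integers, and an
-- empty chunk signals end of input.
parseInts :: Parser Int
parseInts = Many [] step
  where
    step chunk
      | null chunk = Done []
      | otherwise  = Many (map readLine (lines chunk)) step
    readLine l = case reads l of
      [(n, "")] -> Right n
      _         -> Left ("bad record: " ++ l)

-- Add the successfully parsed records of one batch to the accumulator.
addBatch :: Int -> [Either String Int] -> Int
addBatch = foldl' add
  where
    add acc (Right n) = acc + n
    add acc (Left _)  = acc

-- Feed chunks one at a time. The batch from the previous step is fully
-- consumed (and can be collected) before the next chunk is parsed.
feed :: [String] -> Parser Int -> Int -> Either String Int
feed _ (Fail err) _ = Left err
feed _ (Done rs) acc = Right (addBatch acc rs)
feed chunks (Many rs k) acc =
  let acc' = addBatch acc rs
  in acc' `seq` case chunks of
       []       -> feed [] (k "") acc'
       (c : cs) -> feed cs (k c) acc'

main :: IO ()
main = print (feed ["1\n2\n", "3\n"] parseInts 0)
```

With small chunks the batches stay small. With one large chunk, every record from that chunk is materialized in a single list before the consumer sees the first one.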

tibbe avatar Dec 05 '14 09:12 tibbe

Let's reopen this, as the underlying issue still seems relevant.

hvr avatar Nov 03 '15 06:11 hvr

@hvr thank you. I think a Foldable interface is fine for the purposes of the average Cassava user, I'd just like to understand the difference in memory usage.

bitemyapp avatar Nov 03 '15 06:11 bitemyapp