cassava
Use pipes for the Streaming module
I know it's another dependency but:
https://twitter.com/bitemyapp/status/531617919181258752
Could hide the pipes machinery behind the Foldable instance.
What do you think?
@tibbe does this sound even remotely like a good idea? I'll drop it and close the ticket if you can explain why it isn't. Currently the Streaming module seems to use a list whose nil constructor carries a product, which I think can be improved upon for the purposes of streaming.
I'm on vacation. Will take a look when I get back.
I think pipes is too heavy a dependency, and I don't want to commit to one of the many competing streaming solutions out there until the community settles on something. I am curious why the streaming interface uses more memory than pipes, though. Do you have some test code around?
@tibbe I'm not surprised at all, since Pipes goes out of its way to enforce that only one piece of data is processed at a time, whereas you've got what amounts to a lazy list AFAICT.
Example code here: https://github.com/bitemyapp/csvtest
Not forcing more dependencies makes sense, but the memory usage should be lower and the data type weirds me out.
Specifically: https://github.com/bitemyapp/csvtest/commit/dd458670592a6b661737bfdb05e046f1b9b7b93b
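To make the "lazy list" description concrete, here is a minimal sketch of a list-like streaming result type with a Foldable instance. It uses only base. The type below is a hypothetical stand-in modelled on the description in this thread, not a copy of cassava's definition; `mkRecords` and `sumRecords` are made-up names, and skipping failed records in the fold is a choice made for this sketch.

```haskell
module Main (main) where

import Data.Foldable (foldl')

-- Hypothetical stand-in for a streaming result type: a lazy list whose
-- cells may hold a per-record error, and whose end marker carries an
-- optional fatal error plus the unconsumed input.
data Records a
  = Cons (Either String a) (Records a)
  | Nil (Maybe String) String

-- In this sketch, folding skips records that failed to convert.
instance Foldable Records where
  foldr f z = go
    where
      go (Cons (Right x) rest) = f x (go rest)
      go (Cons (Left _) rest)  = go rest
      go (Nil _ _)             = z

-- Lazily build n records; every 10th one is a conversion error.
mkRecords :: Int -> Records Int
mkRecords n = go 1
  where
    go i
      | i > n           = Nil Nothing ""
      | i `mod` 10 == 0 = Cons (Left ("bad record " ++ show i)) (go (i + 1))
      | otherwise       = Cons (Right i) (go (i + 1))

-- A strict left fold forces one cell at a time.
sumRecords :: Records Int -> Int
sumRecords = foldl' (+) 0

main :: IO ()
main = print (sumRecords (mkRecords 100))  -- prints 4500
```

Whether a fold like this runs in constant memory depends on nothing else holding a reference to the head of the structure; if the caller keeps the whole `Records` value alive, every cell already forced stays resident.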
@tibbe is pipes that heavy of a dependency?
Its dependencies are base (==4.*), mmorph (>=1.0.0 && <1.1), mtl (>=2.1 && <2.3), and transformers (>=0.2.0.0 && <0.5).
I don't really know of anything slimmer than that. Main problem I can think of would be transformers possibly being annoying to install in some cases.
A better idea could be to mimic the work Snoyman has done with WAI: http://www.yesodweb.com/blog/2014/04/disemboweling-wai
I'd argue streaming abstraction agnosticism, rather than dep weight, is the compelling reason not to use pipes.
The real motivator is having a streaming module that works properly and doesn't keep more in memory than is necessary, throughput optimization notwithstanding.
The main issue is that I don't want to pick one out of N competing streaming solutions. It won't be long before people want the alternatives. On top of that, there's no reason why pipes would offer any better memory usage than what we could offer using the streaming or incremental APIs. I will look into the discrepancy you reported when I find some time. It might just be due to how we parse several records in a batch (and thus hold on to them in memory).
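As a rough illustration of the batching hypothesis, here is a toy incremental parser that returns records in batches, together with a driver that folds each batch as soon as it arrives. This is a model written with base only, not cassava's incremental API: `Step`, `parseInts`, and `sumChunks` are hypothetical names, and the parser assumes chunks end on line boundaries.

```haskell
module Main (main) where

import Data.List (foldl')

-- Toy incremental parser, NOT cassava's API: each step hands back a batch
-- of parsed records, and possibly a continuation awaiting more input.
data Step a
  = Many [a] (String -> Step a)  -- a batch, plus a function to feed more input
  | Done [a]                     -- final batch; input is exhausted

-- Parse newline-separated Ints. An empty chunk signals end of input.
-- Assumes, for simplicity, that every chunk ends on a line boundary.
parseInts :: Step Int
parseInts = Many [] go
  where
    go ""    = Done []
    go chunk = Many (map read (lines chunk)) go

-- Drive the parser, folding each batch strictly before requesting the
-- next one, so that at most one batch needs to be live at a time.
sumChunks :: [String] -> Int
sumChunks = loop 0 parseInts
  where
    loop acc (Done xs) _ = foldl' (+) acc xs
    loop acc (Many xs k) chunks =
      let acc' = foldl' (+) acc xs
      in acc' `seq` case chunks of
           []       -> loop acc' (k "") []
           (c : cs) -> loop acc' (k c) cs

main :: IO ()
main = print (sumChunks ["1\n2\n3\n", "4\n5\n"])  -- prints 15
```

In this model, peak residency is bounded by the largest batch rather than by the whole input, so larger batches trade memory for fewer continuation calls. That trade-off would be consistent with the discrepancy reported above, but this sketch does not measure it.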
Let's reopen this, as the underlying issue still seems relevant.
@hvr thank you. I think a Foldable interface is fine for the purposes of the average Cassava user, I'd just like to understand the difference in memory usage.