
Just some thoughts

Open andrewthad opened this issue 9 years ago • 6 comments

This isn't really a suggestion or a feature request. I just wanted to share some of my thoughts.

I like streaming. Unless I need the bidirectional features of pipes (which happens sometimes but not often), streaming works better for me. To me, the API it provides for things like chunking and parsing is more usable. I have two libraries, siphon and lmdb-high-level, where I currently use pipes but would rather use streaming. This would especially be an improvement in siphon, because it uses a Pipe to decode a CSV, which means that if upstream terminates and only half of a line was provided, that last line is silently discarded. I know that technically I should use pipes-parse for this kind of thing, but I sort of feel like once I start using pipes-parse, I'm not using pipes anymore. Anyway, I'd rather use streaming because it would handle that case correctly without layering on another abstraction.
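A minimal sketch of the end-of-input case described above, assuming the `streaming` and `transformers` packages. The names `linesFromChunks` and `splitLines` are hypothetical and are not part of siphon or streaming. Because `S.next` reports when upstream has finished, the leftover partial line can be emitted instead of dropped:

```haskell
import Control.Monad (unless)
import Control.Monad.Trans.Class (lift)
import Data.Functor.Identity (runIdentity)
import Streaming (Of, Stream)
import qualified Streaming.Prelude as S

-- Split a string into its complete lines and the unterminated remainder.
splitLines :: String -> ([String], String)
splitLines s = case break (== '\n') s of
  (line, '\n' : rest) ->
    let (ls, leftover) = splitLines rest in (line : ls, leftover)
  (leftover, _) -> ([], leftover)

-- Re-chunk a stream of arbitrary text chunks into lines. When upstream
-- ends, any buffered partial line is yielded rather than discarded.
linesFromChunks :: Monad m => Stream (Of String) m r -> Stream (Of String) m r
linesFromChunks = go ""
  where
    go acc str = do
      step <- lift (S.next str)
      case step of
        Left r -> do
          unless (null acc) (S.yield acc) -- flush the trailing partial line
          pure r
        Right (chunk, rest) ->
          let (complete, leftover) = splitLines (acc ++ chunk)
           in S.each complete >> go leftover rest

-- Hypothetical input: the last chunk ends without a newline.
example :: [String]
example = runIdentity (S.toList_ (linesFromChunks (S.each ["a,b\nc", ",d\ne,"])))
```

Here `example` keeps the final unterminated row `"e,"`. The `acc ++ chunk` buffering is kept simple for illustration and is not tuned for large inputs.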

The thing that prevents me from bringing streaming in as a dependency in libraries I write is that I prefer a minimal set of transitive dependencies. As it stands, streaming has a number of transitive dependencies that pipes does not. I think that the only thing time is used for is to implement seconds, and that could conceivably be moved to streaming-utils. The use of ghc-prim is similarly small, but I don't care about it as much, since base depends on ghc-prim. The tricky ones are exceptions, monad-control, and resourcet. I'm guessing that having instances of the classes from these is important or useful for you, so I don't want to suggest that you remove them. But, if you're ever considering it, just know that there's someone who would be glad to see fewer deps.

andrewthad, Oct 06 '16 21:10

Yes, I will think about this. seconds is more a sort of demo, so it's not important; on the other hand, time is a boot package, so it doesn't really add much, since at least in the standard setup everyone already has it. (The same goes for ghc-prim, base and transformers, but not mtl.) You don't mention mmorph, but it is pretty fundamental. In the end I think the trouble comes down to resourcet, which carries monad-control and exceptions with it; these in turn require stm and lifted-base. I decided these were tolerable since they are actually quite small packages that compile quickly. Note that the dependencies are the same as for the minimal conduit package, as it is now arranged, with conduit-extra and conduit-combinators etc. in separate packages.

But in fact resourcet is, I think, only needed within the package for readFileLn and writeFileLn, which is pretty irritating. I have considered a separate package which would re-export most of Streaming.Prelude (I don't know what to call it), which one could use instead of it, and which would perhaps export text-based line IO of the sort you find in, say, turtle. (I am everywhere trying to stay in the sort of universe Gabriel G. generates, as will be obvious; using resourcet instead of pipes-safe is one exception.) So maybe a separate package with a Streaming.Prelude.Text (or Streaming.Standard or Streaming.IO or whatever it should be called) would be more sensible and would be a good way to have more extensive dependencies, particularly text and resourcet. If I wanted to use resourcet there, though, I would need an instance, and this would require an orphan instance package. The streaming-bytestring package also uses resourcet for file operations, and there would need to be homogeneity in the available instances. Another consideration is that I am kind of ambivalent about resourcet anyway, but would very much like a convenient way of writing versions of writeFile and readFile, which need some device like this. So I'm in a bit of a quandary. What do you think of an idea like that?

michaelt, Oct 07 '16 00:10

I like mmorph as well. It is, as you say, fundamental.

Being able to open files (or any kind of resource) inside of a Stream is not something that I ever really need, so it is difficult for me to weigh in the tradeoffs around ResourceT. I don't like orphan instances, so if you were going to have to provide an orphan MonadResource instance for streaming-bytestring, I would rather it just be in streaming itself.

One thing that is a little weird is that the MonadResource instance for Streams doesn't provide a way to run finalizers early (nor is it really possible to do this in a general way). So, if I write:

runResourceT $ S.writeFile "result.txt" $ do
  S.take 10 $ S.readFile "a"
  S.take 10 $ S.readFile "b"
  S.take 10 $ S.readFile "c"
  ...

None of the file handles can be closed until all of them have been used. If you are completely exhausting all of them, then it behaves very nicely though. I don't really have a complaint about this because it doesn't really seem like there's a very general good way to handle this. But I mostly bring it up to point out that streaming and resource allocation/releasing are non-trivial to combine.
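For contrast, here is a minimal sketch of the explicitly scoped alternative, assuming the `streaming` package. The name `copyHeads` is hypothetical. Each input handle is confined to its own `withFile`, so it is closed before the next file is opened, at the cost of giving up a single composed stream:

```haskell
import Control.Monad (forM_)
import qualified Streaming.Prelude as S
import System.IO (IOMode (ReadMode, WriteMode), withFile)

-- Copy the first n lines of each input file to the output file.
-- Every input handle is released as soon as its n lines are written.
copyHeads :: Int -> FilePath -> [FilePath] -> IO ()
copyHeads n out inputs =
  withFile out WriteMode $ \hOut ->
    forM_ inputs $ \path ->
      withFile path ReadMode $ \hIn ->
        S.toHandle hOut (S.take n (S.fromHandle hIn))
```

Something like `copyHeads 10 "result.txt" ["a", "b", "c"]` would cover the case above, but only because the handles never have to outlive the callback that opened them.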

"Another consideration is that I am kind of ambivalent about resourcet anyway"

If you are ambivalent enough, there's always the option of rolling Gabriel's original resource allocation strategy into streaming and requiring more monomorphism:

newtype Resource a = Resource { acquire :: IO (a, IO ()) }

instance MonadIO Resource ...

runResource :: Resource a -> (a -> IO ()) -> IO ()
runResource resource k = bracket (acquire resource)
                                 (\(_, release) -> release)
                                 (\(a, _) -> k a)

readFile :: FilePath -> Stream (Of String) Resource ()
readFile path = error "write me"

catchStream :: Stream f IO a -> (e -> Stream f IO a) -> Stream f IO a
catchStream s k = error "write me"

But, if it's important to support more complicated monad transformer stacks inside the stream, then this is obviously no good.
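To make the Resource idea a little more concrete, here is a sketch of one possible set of instances, using only base. These instances are my own assumption about how the elided parts could be filled in, not Gabriel's code, and they do not mask asynchronous exceptions between acquisitions. The helpers `tracked` and `demo` are hypothetical and only log the order of events:

```haskell
import Control.Exception (bracket, finally, onException)
import Control.Monad (ap, liftM)
import Control.Monad.IO.Class (MonadIO (liftIO))
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

newtype Resource a = Resource { acquire :: IO (a, IO ()) }

instance Functor Resource where
  fmap = liftM

instance Applicative Resource where
  pure a = Resource (pure (a, pure ()))
  (<*>) = ap

instance Monad Resource where
  Resource io >>= f = Resource $ do
    (a, releaseA) <- io
    -- If the second acquisition fails, release the first one.
    (b, releaseB) <- acquire (f a) `onException` releaseA
    -- Release in reverse order of acquisition.
    pure (b, releaseB `finally` releaseA)

instance MonadIO Resource where
  liftIO io = Resource (fmap (\a -> (a, pure ())) io)

runResource :: Resource a -> (a -> IO ()) -> IO ()
runResource resource k =
  bracket (acquire resource) (\(_, release) -> release) (\(a, _) -> k a)

-- A fake resource that records when it is acquired and released.
tracked :: IORef [String] -> String -> Resource String
tracked ref name = Resource $ do
  modifyIORef ref (("acquire " ++ name) :)
  pure (name, modifyIORef ref (("release " ++ name) :))

demo :: IO [String]
demo = do
  ref <- newIORef []
  runResource
    (tracked ref "a" >>= \a -> tracked ref "b" >>= \b -> pure (a ++ b))
    (\s -> modifyIORef ref (("use " ++ s) :))
  reverse <$> readIORef ref
```

Running `demo` logs both acquisitions, then the use, then the releases in reverse order. Nothing is released until the continuation passed to `runResource` has finished, which is the same lack of early finalization discussed above.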

andrewthad, Oct 07 '16 17:10

I just realized that Gabriel had already packaged up a variant of Resource into the managed library, which basically does what resourcet does except that it's much less complicated. Like resourcet, it also provides an mtl style typeclass.

andrewthad, Oct 10 '16 13:10

With managed, you get this:

monomorphicReadFile :: FilePath -> Stream (Of String) Managed ()
monomorphicReadFile fp =
  lift (managed (withFile fp ReadMode)) >>= S.fromHandle

mtlReadFile :: MonadManaged m => FilePath -> Stream (Of String) m ()
mtlReadFile fp =
  lift (using $ managed (withFile fp ReadMode)) >>= S.fromHandle

The thing this misses out on is prompt finalization. It will not release the file handle as soon as it could.
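A small sketch of what that looks like in use, assuming the `managed`, `streaming` and `transformers` packages. The name `firstLinesOfEach` is hypothetical. Every handle opened along the way stays open until `with` returns, even though only a few lines are taken from each file:

```haskell
import Control.Monad.Managed (Managed, managed, with)
import Control.Monad.Trans.Class (lift)
import Streaming (Of, Stream)
import qualified Streaming.Prelude as S
import System.IO (IOMode (ReadMode), withFile)

monomorphicReadFile :: FilePath -> Stream (Of String) Managed ()
monomorphicReadFile fp =
  lift (managed (withFile fp ReadMode)) >>= S.fromHandle

-- Collect the first n lines of each file. The handles are acquired one
-- after another as the stream runs, and all of them are released
-- together when 'with' exits, not when each 'S.take' finishes.
firstLinesOfEach :: Int -> [FilePath] -> IO [String]
firstLinesOfEach n paths =
  with (S.toList_ (mapM_ (S.take n . monomorphicReadFile) paths)) pure
```

For a handful of files in ghci this is harmless; for a long list of paths it means holding that many handles open at once.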

andrewthad, Oct 10 '16 14:10

Right, I considered using Managed and something like monomorphicReadFile before going for ResourceT. It does have the main thing I was looking for, if I remember, which is just to permit easy experimentation in ghci without having to open up a withFile x ReadMode $ \h -> ... . It is certainly a lightweight dependency! Some ways of ordering a heap of monad transformers would be ruled out, but maybe it's not so bad. It doesn't bother me much that the management is a little crude (we are talking about operations for string IO, after all). I will try to think whether there is some mishap that would be inevitable in the use cases I am thinking of.

Another consideration favoring resourcet I forgot to mention is that I had been considering a sort of conduit-groups when I cooked this up, with operations corresponding to pipes-group. This turned out to be more of a menace to implement than I anticipated. I don't think I ever worked out how much of a menace it would be not to have a ResourceT instance. In mentioning orphan instances above, I forgot to mention the parallel with pipes-safe, which has several orphan instances for Proxy. I wanted to avoid some of this decentralization, but maybe it's not that important. So anyway I'm still fretting.

michaelt, Oct 10 '16 23:10

I don't mind the lack of prompt finalization either, and yeah, it does let you avoid the withFile annoyance.

(As a side note, in the docs for writeFile, where you have written

"The handle is crudely managed with ResourceT"

I had not fully understood that statement until now. It might be worth adding a clarifying example to the docs to explain this a little further. I'll bring this up on a separate issue.)

Does conduit-groups exist? It's not on hackage. I would be interested in looking into what advantages ResourceT may have there, but since I don't know what you're referring to, I cannot.

I also thought about the orphan MonadCatch instance in pipes-safe. I've never needed it, but to be honest I've never needed anything in pipes-safe. I do dislike orphan instances because it's difficult to discover their provenance (or to even learn about their existence in the first place).

Something else worth considering is that regardless of whether resourcet or managed is used, mtl-style typeclasses from both of them aren't really necessary. What I mean to say is that streaming could just depend on neither of them, which would cost you the ability to write a good readFile. However, if you're just using it in ghci, you could still have a readFile :: MonadIO m => FilePath -> Stream (Of String) m () that just doesn't close the file handle if you don't exhaust it. But streaming-bytestring could still depend on managed (or resourcet), which would let you have readFile there. And you don't even have to add the orphan instances or anything, because they aren't needed. I've never seen anyone layer another monad transformer on top of Stream or Proxy, so I don't know if these instances end up being of any practical value, orphan or not.
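A sketch of that dependency-free readFile, assuming only `streaming` and base. The name `readFileNoCleanup` is hypothetical. The handle is closed only if the stream is run to the end; if the consumer stops early, or an exception is thrown partway through, the handle is left for the garbage collector:

```haskell
import Control.Monad.IO.Class (MonadIO, liftIO)
import Streaming (Of, Stream)
import qualified Streaming.Prelude as S
import System.IO (IOMode (ReadMode), hClose, openFile)

-- Stream the lines of a file without any resource-management library.
-- 'hClose' is reached only when 'S.fromHandle' has hit end of file.
readFileNoCleanup :: MonadIO m => FilePath -> Stream (Of String) m ()
readFileNoCleanup path = do
  h <- liftIO (openFile path ReadMode)
  S.fromHandle h
  liftIO (hClose h)
```

Something like `S.stdoutLn (S.take 3 (readFileNoCleanup "a"))` works fine for poking around in ghci, but it never reaches the `hClose` if the file has more than three lines.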

EDIT

And you could always kick the can further down the road on committing to a resource-management library and just put not-file-handle-closing versions of readFile and friends in streaming-bytestring as well.

andrewthad, Oct 11 '16 01:10