conduit icon indicating copy to clipboard operation
conduit copied to clipboard

Subtle cleanup issue with zipConduitApp

Open twhitehead opened this issue 7 years ago • 3 comments

My understanding is that a Pipe should not generally be abandoned except on Done result or by running the final cleanup associated with a HaveOutput pipe final output.

Part of the tricky bit of the zipConduitApp combinator is combining two pipes into one in such a way that this is maintained. That is, maintaining the invariant that, if the user does not abandon the combined pipe except as a above, then neither of the two combined pipes will be abandoned except as above also.

As implemented, this is done by keeping track of the last final operation associated with each of the pipes, respectively finalX and finalY, and providing a combined final = finalX >> finalY operation for each HaveOutput generated by the combined pipe.

The problem I see with this is finalX and finalY are return () until the respective pipes generate a HaveOutput. This will cause problems for some pipes. For example, imagine the right pipe does the following

  1. opens a file (PipeM),
  2. yields some output (HaveOutput with a close the file finalizer),
  3. ...

while the left pipe does the following

  1. yields some output (HaveOutput with a do nothing finalizer),
  2. ...

The combined pipe will then

  1. opens the right right pipe's file (PipeM),
  2. yield the left pipe's output (HaveOutput with a do nothing finalizer),
  3. ...

The issue here is if the downstream pipe chooses to abort at (2) then the right pipe will not get its cleanup done. This actually extends past the initial condition being set as well. For example, it is possible to imagine a pipe doing a HaveOutput followed by a PipeM operation that acquires further resources and thus invalidates the final operation associated with the earlier HaveOutput.

The only solution I can see is that the combined pipe cannot pass along a HaveOutput (give the user an option to abort with a combined final) unless the combined pipes both have not been allowed to make further progress themselves (i.e., their last action was HaveOutput or Done).

Cheers! -Tyson

PS: I believe there are also issues with the way the Done state is handled. A combined pipe going into the Done state does not currently drop the last HaveOutput final operation associated with it.

If a user then chooses to abort on a combined Output with one of the combined pipes being in the Done state, the combined final operation will run the final associated with the Done pipe's last HaveOutput. This could leading to such things as trying to close the same file twice.

twhitehead avatar Jan 27 '17 17:01 twhitehead

Can you provide a test case demonstrating the problem?

On Fri, Jan 27, 2017 at 7:22 PM, Tyson Whitehead [email protected] wrote:

My understanding is that a Pipe http:///snoyberg/conduit/blob/be803218b5b2acaad2eb720ca3a27a4d0734fba8/conduit/Data/Conduit/Internal/Pipe.hs#L89 should not generally be abandoned except on Done result or by running the final cleanup associated with a HaveOutput pipe final output.

Part of the tricky bit of the zipConduitApp http:///snoyberg/conduit/blob/be803218b5b2acaad2eb720ca3a27a4d0734fba8/conduit/Data/Conduit/Internal/Conduit.hs#L536 combinator is combining two pipes into one in such a way that this is maintained. That is, maintaining the invariant that, if the user does not abandon the combined pipe except as a above, then neither of the two combined pipes will be abandoned except as above also.

As implemented, this is done by keeping track of the last final operation associated with each of the pipes, respectively finalX and finalY, and providing a combined final = finalX >> finalY operation for each HaveOutput generated by the combined pipe.

The problem I see with this is finalX and finalY are return () until the respective pipes generate a HaveOutput. This will cause problems for some pipes. For example, imagine the right pipe does the following

  1. opens a file (PipeM),
  2. yields some output (HaveOutput with a close the file finalizer),
  3. ...

while the left pipe does the following

  1. yields some output (HaveOutput with a do nothing finalizer),
  2. ...

The combined pipe will then

  1. opens the right right pipe's file (PipeM),
  2. yield the left pipe's output (HaveOutput with a do nothing finalizer),
  3. ...

The issue here is if the downstream pipe chooses to abort at (2) then the right pipe will not get its cleanup done. This actually extends past the initial condition being set as well. For example, it is possible to imagine a pipe doing a HaveOutput followed by a PipeM operation that acquires further resources and thus invalidates the final operation associated with the earlier HaveOutput.

The only solution I can see is that the combined pipe cannot pass along a HaveOutput (give the user an option to abort with a combined final) unless the combined pipes both have not been allowed to make further progress themselves (i.e., their last action was HaveOutput or Done).

Cheers! -Tyson

PS: I believe there are also issues with the way the Done state is handled. A combined pipe going into the Done state does not currently drop the last HaveOutput final operation associated with it.

If a user then chooses to abort on a combined Output with one of the combined pipes being in the Done state, the combined final operation will run the final associated with the Done pipe's last HaveOutput. This could leading to such things as trying to close the same file twice.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/snoyberg/conduit/issues/293, or mute the thread https://github.com/notifications/unsubscribe-auth/AADBB97OFPi1zQXdNiPkptlNH7j2HDjkks5rWifugaJpZM4LwAg_ .

snoyberg avatar Jan 28 '17 16:01 snoyberg

I was just independently studying the zipConduitApp code and noticed the confusing handling of finalX and finalY which might explain the problem:

     go _ finalY (HaveOutput x finalX o) y = HaveOutput
         (go finalX finalY x y)
         (finalX >> finalY)
         o

This executes both finalX >> finalY as the finalizer, but also passes them both on. If the next item is another HaveOutput x, then finalY will be executed again. If it's a HaveOutput y, then finalX will be. It's not completely clear to me why the finalX and finalY arguments are needed here at all. It seems like this should just be:

     go (HaveOutput x finalX o) y = HaveOutput
         (go x y)
         finalX
         o

Finalizers are never lost this way, and each one appears only once. But maybe there's some more complicated case I don't understand this is meant to handle.

dylex avatar Feb 09 '17 16:02 dylex

Sorry for not getting back to you yet. I will write up an example as you requested. Just have to meet an end-of-month deadlines first. :smile:

twhitehead avatar Feb 09 '17 17:02 twhitehead