courier
Feature: Flush Transport?
While playing with courier, I learned that unbinding the endpoint, shutting down the transport, and exiting can cause the final messages to get lost, as the following snippet shows. If a flush method on the transport would ensure that the transport had no waiting outgoing messages, blocking until they are all sent, that would be a great addition, I think. I could also imagine shutdown transport blocking until all outgoing messages are sent.
module Main where

import System.Environment (getArgs)
import Network.Endpoints
import Network.Transport.TCP
import Data.Serialize
import Control.Concurrent (threadDelay)

main :: IO ()
main = do
  let master = "master"
      slave = "slave"
      resolver = resolverFromList [ (master, "localhost:2001")
                                  , (slave, "localhost:2000") ]
  transport <- newTCPTransport resolver
  args <- getArgs
  case args of
    ["master"] -> do
      endpoint <- newEndpoint [transport]
      Right () <- bindEndpoint endpoint master
      sendMessage_ endpoint slave $ encode "Hi!"
      sendMessage_ endpoint slave $ encode "exit"
      -- This receive ensures the slave got the messages and has a
      -- chance to respond. Without it, the slave would receive
      -- neither "Hi!" nor "exit", presumably because we exit
      -- before the messages have left the transport.
      msg <- receiveMessage endpoint
      let Right txt = decode msg
        in print $ "[Master] recv: " ++ txt
      Right () <- unbindEndpoint endpoint master
      shutdown transport
    ["slave"] -> do
      endpoint <- newEndpoint [transport]
      Right () <- bindEndpoint endpoint slave
      runSlave 0 endpoint $ do
        sendMessage_ endpoint master $ encode "And gone..."
        -- As with the note above, we need to delay briefly to give
        -- the transport time to actually push the message; otherwise
        -- we exit before it is ever delivered.
        threadDelay 1000
        _ <- unbindEndpoint endpoint slave
        shutdown transport
    -- Any other argument list: print usage instead of crashing.
    _ -> putStrLn "usage: pass either \"master\" or \"slave\""
  where
    -- Print every received message until "exit" arrives, then run
    -- the termination action.
    runSlave :: Int -> Endpoint -> IO () -> IO ()
    runSlave n endpoint term = do
      msg <- receiveMessage endpoint
      case decode msg of
        Right "exit" -> term
        Right txt -> do
          print $ "[Slave] recv(" ++ show n ++ "): " ++ txt
          runSlave (n + 1) endpoint term
        -- Report undecodable messages and keep listening.
        Left err -> do
          putStrLn $ "[Slave] decode failed: " ++ err
          runSlave n endpoint term
Well, flush isn't a generic feature of all transports. For example, it's possible to have the same scenario you describe with just a TCP/IP socket: close the socket before all data is sent, and messages get lost.
Part of the point of the courier design was to force applications to think through these types of issues: what guarantees, confirmations, timeouts, etc., are necessary for the application to succeed. In any application at scale, most of the guarantees we assume exist may not in fact hold--thus, as application writers we need to account for these things.
For example, TCP/IP does not actually guarantee delivery, just because data has been sent: it only guarantees that the bytes that are delivered are delivered in the order sent, with no gaps. Application writers have to think through how to guarantee that data has been received, if it's important, or how to handle cases where it may not be.
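One way an application might confirm receipt is to wait for an explicit acknowledgement with a timeout. The following is only a minimal sketch using base, with an in-memory peer standing in for a remote endpoint; sendWithAck and demoAck are hypothetical names, not part of courier.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Data.Maybe (isJust)
import System.Timeout (timeout)

-- | Hypothetical helper (not part of courier): run a send action, then
-- wait up to the given number of microseconds for an acknowledgement.
-- Returns True only if the acknowledgement arrived in time.
sendWithAck :: Int -> IO () -> IO ack -> IO Bool
sendWithAck micros send waitForAck = do
  send
  isJust <$> timeout micros waitForAck

-- | Demo with an in-memory "peer" standing in for the remote endpoint.
demoAck :: IO (Bool, Bool)
demoAck = do
  inbox <- newEmptyMVar
  ackBox <- newEmptyMVar
  -- The peer acknowledges the single message it receives.
  _ <- forkIO $ do
    _ <- takeMVar inbox
    putMVar ackBox ()
  acked <- sendWithAck 1000000 (putMVar inbox "good bye") (takeMVar ackBox)
  -- Simulate an acknowledgement that arrives far too late: the wait
  -- is cut off by the 50 ms timeout.
  late <- sendWithAck 50000 (return ()) (threadDelay 5000000)
  return (acked, late)
```

Here the first send is acknowledged and the second is reported as unconfirmed; what to do in the unconfirmed case (retry, log, give up) remains an application decision.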
Having said all that, keeping the messaging interface simple, at the level of basic send / receive, means that one can still write simple algorithms and ultimately rely upon composition to combine separated concerns into an application.
Not saying I couldn't be convinced (or we couldn't think through to a better general design), but for now I don't think such a feature would be appropriate.
I fully agree with your reasoning. It looks to me like we are concerned with two different layers of the issue. Maybe we could call it tryFlush to handle the case of a closed socket?
What I'd like is a way for the async messengers to tell me that they have tried to deliver every message in their queue, regardless of whether delivery succeeded. As it stands now, if I send a final "good bye" message just before exiting the application, that message is never delivered, regardless of transport issues. As a user, I have no way to check whether there was at least an attempt to write all outgoing messages to the socket.
I'm more concerned with a clean exit. How, as a user of the library, can I ask whether there has been an attempt to send every message I enqueued for delivery (not necessarily a successful delivery!)? If the socket gets closed before all messages are sent, that would still be OK. After all, we tried to send them.
As it stands right now, I have to add an artificial delay, which may or may not be enough depending on network performance.
Alternatively, call it a gracefulShutdown of the transport? One that blocks until all messengers have tried to clear their outbound queues? As I understand it, Haskell kills all spawned threads once the main thread exits and does not wait for them.
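To illustrate the point about the main thread: a GHC program terminates when main returns, without waiting for threads created with forkIO, so the main thread has to wait explicitly. A minimal sketch using only base follows; forkWaitable and demoWait are hypothetical names for illustration.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, readMVar)
import Control.Exception (finally)
import Data.IORef (modifyIORef', newIORef, readIORef)

-- | Fork a worker and return an action that blocks until the worker
-- has finished, whether it ended normally or with an exception.
forkWaitable :: IO () -> IO (IO ())
forkWaitable work = do
  done <- newEmptyMVar
  _ <- forkIO (work `finally` putMVar done ())
  return (readMVar done)

-- | The worker "sends" five messages slowly; the caller waits for all
-- of them instead of guessing a delay.
demoWait :: IO Int
demoWait = do
  sent <- newIORef (0 :: Int)
  wait <- forkWaitable $
    mapM_ (\_ -> threadDelay 1000 >> modifyIORef' sent (+ 1)) [1 .. 5 :: Int]
  wait
  readIORef sent
```

Without the call to wait, the count read at the end could be anything from 0 to 5, which is the same uncertainty the threadDelay workaround has.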
PS: I did not intend to close the issue, hence reopened.
Ah! I see where you are going.
To use my own analogy, at least with TCP/IP, any data given to the OS will continue to be transmitted, even if a socket close comes along. It's all about the ordering of events: the close comes after some send calls, so send calls made before the close will be completed (as best as possible).
If we were to add something like this, the downside is that an application could queue up thousands of messages... and then, to fit the ordering semantics, all of these messages would have to be sent before shutting down the transport.
The interesting thing is, I don't think we are debating anything other than the design of better transports for courier, which is exactly what I wanted: applications should be able to supply transports that work for their purposes, while keeping their algorithms nicely isolated behind the endpoint abstraction.
This leaves me thinking that, rather than adding a "flush transport" type of method, transport shutdown should respect the ordering of operations requested by the application (at least on a best-effort basis). That is general purpose, and doesn't artificially mask the real world: it just gets courier out of the way.
You've got me thinking: I'm pretty sure there is an improvement in here, we just have to find the right one to implement.
And BTW, I forgot to say this earlier: thanks for your interest in courier! :)
At least for the socket-based transports that courier provides, this may all boil down to how messengers are closed.
Right now, the code explicitly cancels the sender / receiver asyncs--even though it sets a TVar Bool to false, to also signal that we're shutting down.
On the sending side, the sender simply waits for messages to appear in its mailbox, quitting when the done TVar is set to true--without regard for whether the mailbox is empty.
So, fix the behavior of the sender, fix the behavior of closeMessenger, fix the behavior of the transport.
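A rough sketch of what a fixed sender loop could look like, assuming a mailbox modeled as a TQueue and a done flag that is set to True on shutdown. This uses the stm package; the names nextStep, senderLoop, and demoDrain are hypothetical and this is not courier's actual code.

```haskell
import Control.Concurrent.STM

-- | What the sender should do next.
data Step a = Send a | Stop

-- | Hypothetical sender step: prefer queued messages, and stop only
-- once the done flag is set AND the mailbox is empty.
nextStep :: TVar Bool -> TQueue a -> STM (Step a)
nextStep done mailbox = do
  next <- tryReadTQueue mailbox
  case next of
    Just msg -> return (Send msg)
    Nothing -> do
      finished <- readTVar done
      if finished then return Stop else retry

-- | Run the sender until it is told to stop and has drained the mailbox.
senderLoop :: TVar Bool -> TQueue a -> (a -> IO ()) -> IO ()
senderLoop done mailbox write = loop
  where
    loop = do
      step <- atomically (nextStep done mailbox)
      case step of
        Send msg -> write msg >> loop
        Stop -> return ()

-- | Shutdown is requested while two messages are still queued; both
-- are still handed to the write action, in order.
demoDrain :: IO [String]
demoDrain = do
  done <- newTVarIO False
  mailbox <- newTQueueIO
  written <- newTVarIO []
  atomically $ do
    writeTQueue mailbox "Hi!"
    writeTQueue mailbox "exit"
    writeTVar done True
  senderLoop done mailbox (\m -> atomically (modifyTVar' written (m :)))
  reverse <$> readTVarIO written
```

Because the queue is checked before the flag, setting done never discards messages that were enqueued earlier, which matches the ordering argument made above.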
Huh, lots to think about!
Yes. It's somewhere in there. The concern about the high number of messages could be handled by the library user choosing between shutdown and gracefulShutdown, no?
As in:
- shutdown: I don't care. Just stop the transport. We're done here!
- gracefulShutdown: Please block until any remaining messages, if possible, have been pushed. No guarantee that they are delivered, but we made sure the messengers tried their best to put the messages on the delivery truck.
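The two variants could be sketched roughly as follows. This is only an illustration built on base and the stm package: the Messenger type and all function names here are hypothetical stand-ins, not courier's real types or API.

```haskell
import Control.Concurrent (ThreadId, forkIO, killThread, threadDelay)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, readMVar)
import Control.Concurrent.STM
import Control.Exception (finally)

-- | Hypothetical messenger, NOT courier's actual type.
data Messenger = Messenger
  { mailbox  :: TQueue String
  , closing  :: TVar Bool
  , worker   :: ThreadId
  , finished :: MVar ()
  }

-- | Start a sender thread that passes each queued message to 'write'.
newMessenger :: (String -> IO ()) -> IO Messenger
newMessenger write = do
  q <- newTQueueIO
  c <- newTVarIO False
  f <- newEmptyMVar
  let loop = do
        next <- atomically $ do
          m <- tryReadTQueue q
          case m of
            Just msg -> return (Just msg)
            -- Stop only when closing AND the mailbox is empty.
            Nothing -> readTVar c >>= \stop ->
              if stop then return Nothing else retry
        case next of
          Just msg -> write msg >> loop
          Nothing -> return ()
  t <- forkIO (loop `finally` putMVar f ())
  return (Messenger q c t f)

send :: Messenger -> String -> IO ()
send m = atomically . writeTQueue (mailbox m)

-- | Stop immediately; messages still queued may be dropped.
shutdown :: Messenger -> IO ()
shutdown m = killThread (worker m) >> readMVar (finished m)

-- | Block until the sender has attempted every queued message.
gracefulShutdown :: Messenger -> IO ()
gracefulShutdown m = do
  atomically (writeTVar (closing m) True)
  readMVar (finished m)

-- | Returns how many writes were attempted under each variant.
demoShutdown :: IO (Int, Int)
demoShutdown = do
  let slowWrite count _ = threadDelay 2000 >> atomically (modifyTVar' count (+ 1))
  gracefulCount <- newTVarIO 0
  g <- newMessenger (slowWrite gracefulCount)
  mapM_ (send g) ["Hi!", "exit", "And gone..."]
  gracefulShutdown g
  abruptCount <- newTVarIO 0
  a <- newMessenger (slowWrite abruptCount)
  mapM_ (send a) ["Hi!", "exit", "And gone..."]
  shutdown a
  (,) <$> readTVarIO gracefulCount <*> readTVarIO abruptCount
```

In this sketch gracefulShutdown always attempts all three writes, while shutdown will usually attempt fewer because the sender is killed mid-queue. A real transport would probably also want an upper bound on how long the graceful variant may block, given the concern about thousands of queued messages.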