network Problems with error reporting/handling

Hi all,

I'd like to discuss and hopefully improve error reporting/handling in network library.

The problem is this:

Wrapper functions(connect, bind, send etc.) hides errors that are normally reported by native versions using return codes and errno.

For example, native connect function reports errors using -1 return value and then users can learn about specifics using errno. However, connect bindings of network package hides all errors from users. There's no way to safely catch errors returned by errno when timeout happens, or when there is a problem with the connection etc.

This is because:

Bindings/wrappers in network uses IO () which doesn't say anything helpful about possible failures.
Documentation doesn't mention errors.
We can't extend IOException(or IOErrorType of GHC) type with new constructors for exceptions specific to those functions.

So in a sense, Haskell bindings/wrappers are even worse than native versions in the sense that it's possible to write safe applications using native versions of network functions while that's not possible in Haskell versions.

This is really awful.

What's even worse is that the library actually allows us to set socket properties like SO_RCVTIMEO and SO_SNDTIMEO. So we can set timeouts, but can't handle them.

As a concrete example, let's see this scenario:

I have an application that creates lots of sockets, both TCP and UDP. I want to handle all networking errors, either log them or handle them properly.

While using connect, I want to do something when it fails because of a timeout. However, if it fails because of a connection error, then I want to log it(or show it to the user).

To do this I have to dive into network source code, try to figure out what IOExceptions are thrown when those errors happen, hope that no same IOExceptions are thrown for different types of errors, and then pattern match in my program for that IOExceptions.

This is obviously horrible and unsafe. (and cannot work for some cases, e.g. what happens if network uses same errors for two different errors reported by errno?)

In pseudo-code:


import GHC.IO.Exception
import System.IO.Error

myFun = flip catchIOError errHandler $ do
    ...
    putStrLn "connecting..."
    connect sock addr
    putStrLn "connected."
    ...
  where
    errHandler :: IOError -> IO ()
    errHandler IOError{ioe_type=NoSuchThing} = putStrLn "Problems with connection."
    errHandler IOError{ioe_type=TimeExpired} = putStrLn "Timeout happened."
    errHandler err = putStrLn $ "Unhandled error: " ++ show err

About #62: I think we shouldn't rely on changes in base to be able to report errors to users properly. We can handle this in network without any extra support from base and I think this is right way to go.

So I suggest having types like this:

connect :: Socket -> SockAddr -> IO (Maybe ErrNo)

where ErrNo is errno constants.

I think this is only safe way to go, because that's the type of errno and we can't make the type more specific without making assumptions about error codes returned by network functions.

So, what do you think? Can we improve the library by addressing those for the next version? I'm willing to help with coding part if you like the idea.

Nov 11 '14 16:11 osa1

I'm leaving on vacation for two weeks, starting today. I will get back to you after that. I definitely think there are things we can do to improve the library (while not breaking backwards compatibility. At the very least we should document the current behavior thoroughly.

Nov 13 '14 11:11 tibbe

@tibbe are you back? Shall we discuss this?

Here are some other things that I found very confusing and annoying:

Inconsistent exceptions:

> connect sock2 (SockAddrInet (fromIntegral 5433) host)
*** Exception: connect: does not exist (Connection refused)
> connect sock2 (SockAddrInet (fromIntegral 5433) host)
*** Exception: connect: failed (Software caused connection abort)
> connect sock2 (SockAddrInet (fromIntegral 5433) host)
*** Exception: connect: does not exist (Connection refused)
> connect sock2 (SockAddrInet (fromIntegral 5433) host)
*** Exception: connect: failed (Software caused connection abort)
> connect sock2 (SockAddrInet (fromIntegral 5433) host)
*** Exception: connect: does not exist (Connection refused)
> connect sock2 (SockAddrInet (fromIntegral 5433) host)
*** Exception: connect: failed (Software caused connection abort)
> connect sock2 (SockAddrInet (fromIntegral 5433) host)
*** Exception: connect: does not exist (Connection refused)
> connect sock2 (SockAddrInet (fromIntegral 5433) host)
*** Exception: connect: failed (Software caused connection abort)

Note how is goes back and forth between two different exceptions. No other changes are happening while repeating this command.

Documentation: It says about bind:

The Family passed to bind must be the same as that passed to socket.

But Family is not passed to bind, so I have no idea what is it talking about.
Documentation says about recv:

For TCP sockets, a zero length return value means the peer has closed its half side of the connection.

However, it doesn't explain what happens if the peer sends empty ByteString. I tried and found some more inconsistencies.
- If I send an empty string using TCP, I think it doesn't send at all. (or receiving side just ignores the message)
- If I send an empty string using UDP, receiver fails with this: *** Exception: Network.Socket.recv: end of file (end of file) Why is it throwing an exception instead of returning empty string?

Dec 02 '14 14:12 osa1

Note how is goes back and forth between two different exceptions

That is the underlying sockets API, not the network module.

Dec 03 '14 18:12 ghost

What's even worse is that the library actually allows us to set socket properties like SO_RCVTIMEO and SO_SNDTIMEO. So we can set timeouts, but can't handle them.

Haskell sockets are always set non-blocking (and IO is multiplexed inside the GHC runtime's IO manager), so these flags don't do anything anyways. Control over timeouts is possible using System.Timeout.timeout.

There is a lot of room for improvement in the network library APIs. However, it's part of the core Haskell platform, a lot of code depends on it, and it can't be changed willy-nilly. Something we could easily do is make a better exception type hierarchy for the network package, make sure they propagate all the raw information from errors (incl errno value), and document from every function which exceptions are thrown and why.

Dec 03 '14 18:12 gregorycollins

Based on http://www.reddit.com/r/haskell/comments/2o5558/is_network_library_poorly_implemented_or_am_i/

It seems to me like if somebody PRs the safer functions as new functions, not changing the originals, it'll get accepted.

Dec 04 '14 01:12 bitemyapp

@bitemyapp: a lot of things could be improved without making any API changes. Here's a concrete example:

accept :: Socket                        -- Queue Socket
       -> IO (Socket,                   -- Readable Socket
              SockAddr)                 -- Peer details

accept sock@(MkSocket s family stype protocol status) = do
 currentStatus <- readMVar status
 okay <- isAcceptable sock
 if not okay
   then
     ioError (userError ("accept: can't perform accept on socket (" ++ (show (family,stype,protocol)) ++") in status " ++
         show currentStatus))
   else do
     ...

Instead of userError here, network should implement a proper extensible exception hierarchy. If accept's docstring makes it clear that it will throw a SocketAlreadyClosedException then you can explicitly handle that.

Feb 16 '15 20:02 gregorycollins

Hi,

Is anyone working on this? Is there a design proposed anywhere on how the errors should be reported/handled. I'm new to Haskell and looking for something to work on. If there is a breakdown of tasks to be done for this issue, I can pickup a minor task and go from there. Please let me know. Thanks!

Mar 07 '19 21:03 aburn

Nobody is working on this. Please take this issue if you are intrested.

Mar 08 '19 01:03 kazu-yamamoto

@kazu-yamamoto Here is a quick pass at existing IO exceptions in network.

This absorbs existing conditions into a sum type, but does not seek to refine or call out more granularity than is already present in the existing stringly typed API.
Many existing error constructors accepted location strings, this was papering over the lack of callstacks. We can utilize HasCallStack via throwNetworkException to greatly improve this.
There are a number of error calls lurking that have not been addressed.

throwNetworkException :: HasCallStack => NetworkException' -> IO a
throwNetowrkException = throwIO . NetworkException ? callStack

data NetworkException = NetworkException
  { callStack :: CallStack
  , exception :: NetworkException'
  }

data NetworkException'
  = UnssuportedSocketType SocketType
    -- userError . concat $ ["Network.Socket.", caller, ": ", "socket type ", show stype, " unsupported on this system"]
  | UnsupportedAddressFamily Family
    -- userError $ "Network.Socket.Types.peekSockAddr: address family '" ++ show family ++ "' not supported."
    -- userError "Network.Socket.socketPort: AF_UNIX not supported."
  | SocketIsNoLongerValid
    -- userError $ "socketToHandle: socket is no longer valid"
  | OperationNotSupported String
    -- userError $ "Network.Socket.gai_strerror not supported: " ++ show n
  | NonPositiveByteLength
    -- mkInvalidRecvArgError "Network.Socket.ByteString.recv")
    -- mkInvalidRecvArgError "Network.Socket.recvBuf")
    -- mkInvalidRecvArgError "Network.Socket.recvBufFrom")
  | AddressNotFound
    -- mkIOError NoSuchThing message Nothing Nothing
  | NameNotFound
    -- ioeSetErrorString (mkIOError NoSuchThing message Nothing Nothing) err
  | SocketErrorCode CInt
    -- ioeSetErrorString (mkIOError OtherError name Nothing Nothing) str)
    -- errnoToIOError loc (Errno errno) Nothing Nothing)
  | InitializationFailure String
    -- userError "Network.Socket.Internal.withSocketsDo: Failed to initialise WinSock"

One item for debate is whether the source Socket should be included in NetworkException or in appropriate variants of NetworkException'

data NetworkException = NetworkException
  { callStack :: CallStack
  , exception :: NetworkException'
  , socket :: Maybe Socket
  }

vs

  | SocketIsNoLongerValid Socket

May 17 '20 19:05 eborden

There is also probably room for splitting into multiple exception types. One logical split is configuration/platform vs execution exceptions. In most cases configuration exceptions would never be caught, but instead just benefit from clear granular reporting.

data NetworkConfigurationException'
  = UnssuportedSocketType SocketType
    -- userError . concat $ ["Network.Socket.", caller, ": ", "socket type ", show stype, " unsupported on this system"]
  | UnsupportedAddressFamily Family
    -- userError $ "Network.Socket.Types.peekSockAddr: address family '" ++ show family ++ "' not supported."
    -- userError "Network.Socket.socketPort: AF_UNIX not supported."
  | OperationNotSupported String
    -- userError $ "Network.Socket.gai_strerror not supported: " ++ show n

data NetworkException'
  | SocketIsNoLongerValid
    -- userError $ "socketToHandle: socket is no longer valid"
  | NonPositiveByteLength
    -- mkInvalidRecvArgError "Network.Socket.ByteString.recv")
    -- mkInvalidRecvArgError "Network.Socket.recvBuf")
    -- mkInvalidRecvArgError "Network.Socket.recvBufFrom")
  | AddressNotFound
    -- mkIOError NoSuchThing message Nothing Nothing
  | NameNotFound
    -- ioeSetErrorString (mkIOError NoSuchThing message Nothing Nothing) err
  | SocketErrorCode CInt
    -- ioeSetErrorString (mkIOError OtherError name Nothing Nothing) str)
    -- errnoToIOError loc (Errno errno) Nothing Nothing)
  | InitializationFailure String
    -- userError "Network.Socket.Internal.withSocketsDo: Failed to initialise WinSock"

May 17 '20 19:05 eborden

I don't have any preference.

To express errors which do not match your definition, OtherError CInt or something is necessary, I guess.

May 17 '20 20:05 kazu-yamamoto

network network copied to clipboard

Problems with error reporting/handling

network
network copied to clipboard