network Socket type is incorrect on Windows

Socket is defined as holding a CInt, but on Windows SOCKET is an unsigned 64 bit value, usually UINT_PTR. See [1] and [2] this means we may truncate a valid SOCKET to something invalid.

[1] https://sourceforge.net/p/mingw-w64/mingw-w64/ci/master/tree/mingw-w64-headers/include/psdk_inc/_socket_types.h [2] https://docs.microsoft.com/en-us/windows/win32/winsock/socket-data-type-2

Sep 17 '19 23:09 Mistuke

@Mistuke Could you fix it by yourself?

Sep 17 '19 23:09 kazu-yamamoto

Yes but I'm not sure how yet. The problem is it's an exported type and we have functions where the CInt escapes into an external API.

I fear this may break a lot of packages.

Sep 17 '19 23:09 Mistuke

How large CInt is on Windows?
Do you mind if we release version 3.1.1.0 without fixing this?
Should we change the error checking style for socket() and accept() on WIndows?

Sep 18 '19 00:09 kazu-yamamoto

If we export the following CSocket instead of CInt, what will happen?

#if defined(mingw32_HOST_OS)
type CSocket = SOCKET
#else
type CSocket = CInt
#endif

Sep 18 '19 00:09 kazu-yamamoto

How large CInt is on Windows?

Same as Linux, just 32-bits.

Do you mind if we release version 3.1.1.0 without fixing this?

Yeah I don't consider this a blocker, it's been like this for years so.

Should we change the error checking style for socket() and accept() on WIndows?

Yes as it returns SOCKET_ERROR. so an unsigned -1 not a signed one.

But the problem is withFdSocket, socketToFd and mkSocket all expose the CInt, so what I meant is that changing this to the wider type on Windows would likely cause anything that uses these calls to fail..

Sep 18 '19 22:09 Mistuke

Sorry but I did not know that the size of CInt is 4 even on Unix. Should we use CIntMax instead?

Sep 20 '19 00:09 kazu-yamamoto

Hmm I though that an fd on Unix is a signed 32-bit value? so CInt is correct on Unix no? For WIndows you'd need an unsigned value so CUIntMax would be appropriate.

Sep 21 '19 18:09 Mistuke

Do you have a source which explains that fd on Unix is limited to 32-bit?

Sep 24 '19 00:09 kazu-yamamoto

Not directly, but all the docs for e.g. epoll uses int for fd https://linux.die.net/man/2/epoll_ctl and while not authoritative it is backed up by wikipedia https://en.wikipedia.org/wiki/File_descriptor

Sep 24 '19 17:09 Mistuke

Wow. I misunderstood that sizeof int on 64bit Unix is 8. OK. It's 4. And epoll's manual is a good evidence.

Sep 26 '19 09:09 kazu-yamamoto

The size if the "int" type in C is implementation dependent. Historically (very long ago) it was even 16 bits. Now most Unix system have 32-bit ints, but some Unix variants (perhaps Cray systems) IIRC have had 64 bit ints.

This is why haskell has a CInt data type rather than just using Int32 or similar. On any given Haskell runtime CInt may be 32 or 64 bits.

In GHC, despite the documentation in base https://hackage.haskell.org/package/base-4.12.0.0/docs/Foreign-C-Types.html, the actual type seems to be: https://github.com/ghc/ghc/blob/21f0f56164f50844c2150c62f950983b2376f8b6/libraries/base/Foreign/C/Types.hs#L119-L122 and CInt is defined indirectly via HTYPE_INT, which in turn comes from configure.ac: https://github.com/ghc/ghc/blob/21f0f56164f50844c2150c62f950983b2376f8b6/libraries/base/configure.ac#L118-L175

So, the bottom line is that these types are platform-dependent. They do have Storable instances, so that on any given platform the FFI can serialize and deserialize them portably regardless of size variations.

Now as for Sockets, indeed on Windows an OS Socket is NOT an int, and portable code needs to abstract the type (e.g. in Heimdal Kerberos: Windows: https://github.com/heimdal/heimdal/blob/afaaf3d89d86bb33f42a63767b41f57c24238aed/lib/roken/roken.h.in#L108 Unix: https://github.com/heimdal/heimdal/blob/afaaf3d89d86bb33f42a63767b41f57c24238aed/lib/roken/roken.h.in#L126 ).

This also means that sockets can't directly be converted back and forth to "file descriptors". The Haskell API can convert between Network.Socket Sockets and OS integral SOCKETs with the latter type being 32 bit signed on Unix and 64-bit unsigned on Windows.

Oct 09 '19 05:10 vdukhovni

@vdukhovni Thank you for your explanation. Very clear.

Oct 10 '19 01:10 kazu-yamamoto

@kazu-yamamoto I opened an issue on GHC's gitlab that is motivated by this discussion: https://gitlab.haskell.org/ghc/ghc/issues/17335 Do you think I'm making a valid point, or is this unlikely to cause any confusion?

Oct 10 '19 02:10 vdukhovni

I just updated that issue, it seems that the top-matter (introductory text for the module) already discusses the portability caveat, but it is arguably too easy to miss when one is in a hurry?

Oct 10 '19 02:10 vdukhovni

Yes base has a few of those disclaimers around. In particular the Win32 stuff in base are mostly all wrong.

For the Win32 module I override the docs with ones generated in Windows.

But in a general sense they're wrong for all platforms where types differ not only by OS but by architecture as well.

Oct 10 '19 13:10 Mistuke