network icon indicating copy to clipboard operation
network copied to clipboard

Socket type is incorrect on Windows

Open Mistuke opened this issue 5 years ago • 15 comments

Socket is defined as holding a CInt, but on Windows SOCKET is an unsigned 64 bit value, usually UINT_PTR. See [1] and [2] this means we may truncate a valid SOCKET to something invalid.

[1] https://sourceforge.net/p/mingw-w64/mingw-w64/ci/master/tree/mingw-w64-headers/include/psdk_inc/_socket_types.h [2] https://docs.microsoft.com/en-us/windows/win32/winsock/socket-data-type-2

Mistuke avatar Sep 17 '19 23:09 Mistuke

@Mistuke Could you fix it by yourself?

kazu-yamamoto avatar Sep 17 '19 23:09 kazu-yamamoto

Yes but I'm not sure how yet. The problem is it's an exported type and we have functions where the CInt escapes into an external API.

I fear this may break a lot of packages.

Mistuke avatar Sep 17 '19 23:09 Mistuke

  • How large CInt is on Windows?
  • Do you mind if we release version 3.1.1.0 without fixing this?
  • Should we change the error checking style for socket() and accept() on WIndows?

kazu-yamamoto avatar Sep 18 '19 00:09 kazu-yamamoto

If we export the following CSocket instead of CInt, what will happen?

#if defined(mingw32_HOST_OS)
type CSocket = SOCKET
#else
type CSocket = CInt
#endif

kazu-yamamoto avatar Sep 18 '19 00:09 kazu-yamamoto

How large CInt is on Windows?

Same as Linux, just 32-bits.

Do you mind if we release version 3.1.1.0 without fixing this?

Yeah I don't consider this a blocker, it's been like this for years so.

Should we change the error checking style for socket() and accept() on WIndows?

Yes as it returns SOCKET_ERROR. so an unsigned -1 not a signed one.

But the problem is withFdSocket, socketToFd and mkSocket all expose the CInt, so what I meant is that changing this to the wider type on Windows would likely cause anything that uses these calls to fail..

Mistuke avatar Sep 18 '19 22:09 Mistuke

Sorry but I did not know that the size of CInt is 4 even on Unix. Should we use CIntMax instead?

kazu-yamamoto avatar Sep 20 '19 00:09 kazu-yamamoto

Hmm I though that an fd on Unix is a signed 32-bit value? so CInt is correct on Unix no? For WIndows you'd need an unsigned value so CUIntMax would be appropriate.

Mistuke avatar Sep 21 '19 18:09 Mistuke

Do you have a source which explains that fd on Unix is limited to 32-bit?

kazu-yamamoto avatar Sep 24 '19 00:09 kazu-yamamoto

Not directly, but all the docs for e.g. epoll uses int for fd https://linux.die.net/man/2/epoll_ctl and while not authoritative it is backed up by wikipedia https://en.wikipedia.org/wiki/File_descriptor

Mistuke avatar Sep 24 '19 17:09 Mistuke

Wow. I misunderstood that sizeof int on 64bit Unix is 8. OK. It's 4. And epoll's manual is a good evidence.

kazu-yamamoto avatar Sep 26 '19 09:09 kazu-yamamoto

The size if the "int" type in C is implementation dependent. Historically (very long ago) it was even 16 bits. Now most Unix system have 32-bit ints, but some Unix variants (perhaps Cray systems) IIRC have had 64 bit ints.

This is why haskell has a CInt data type rather than just using Int32 or similar. On any given Haskell runtime CInt may be 32 or 64 bits.

In GHC, despite the documentation in base https://hackage.haskell.org/package/base-4.12.0.0/docs/Foreign-C-Types.html, the actual type seems to be: https://github.com/ghc/ghc/blob/21f0f56164f50844c2150c62f950983b2376f8b6/libraries/base/Foreign/C/Types.hs#L119-L122 and CInt is defined indirectly via HTYPE_INT, which in turn comes from configure.ac: https://github.com/ghc/ghc/blob/21f0f56164f50844c2150c62f950983b2376f8b6/libraries/base/configure.ac#L118-L175

So, the bottom line is that these types are platform-dependent. They do have Storable instances, so that on any given platform the FFI can serialize and deserialize them portably regardless of size variations.

Now as for Sockets, indeed on Windows an OS Socket is NOT an int, and portable code needs to abstract the type (e.g. in Heimdal Kerberos: Windows: https://github.com/heimdal/heimdal/blob/afaaf3d89d86bb33f42a63767b41f57c24238aed/lib/roken/roken.h.in#L108 Unix: https://github.com/heimdal/heimdal/blob/afaaf3d89d86bb33f42a63767b41f57c24238aed/lib/roken/roken.h.in#L126 ).

This also means that sockets can't directly be converted back and forth to "file descriptors". The Haskell API can convert between Network.Socket Sockets and OS integral SOCKETs with the latter type being 32 bit signed on Unix and 64-bit unsigned on Windows.

vdukhovni avatar Oct 09 '19 05:10 vdukhovni

@vdukhovni Thank you for your explanation. Very clear.

kazu-yamamoto avatar Oct 10 '19 01:10 kazu-yamamoto

@kazu-yamamoto I opened an issue on GHC's gitlab that is motivated by this discussion: https://gitlab.haskell.org/ghc/ghc/issues/17335 Do you think I'm making a valid point, or is this unlikely to cause any confusion?

vdukhovni avatar Oct 10 '19 02:10 vdukhovni

I just updated that issue, it seems that the top-matter (introductory text for the module) already discusses the portability caveat, but it is arguably too easy to miss when one is in a hurry?

vdukhovni avatar Oct 10 '19 02:10 vdukhovni

Yes base has a few of those disclaimers around. In particular the Win32 stuff in base are mostly all wrong.

For the Win32 module I override the docs with ones generated in Windows.

But in a general sense they're wrong for all platforms where types differ not only by OS but by architecture as well.

Mistuke avatar Oct 10 '19 13:10 Mistuke