Remotery icon indicating copy to clipboard operation
Remotery copied to clipboard

TCPSocket_PollStatus crash on Linux

Open rotoglup opened this issue 6 years ago • 12 comments

While trying to use Remotery (@ 589ded3f2e09) on my Linux app, I get a crash every time the browser connects on the app.

I managed to trace the issue to TCPSocket_PollStatus which corrupts the stack.

It seems that the crash is provoked by the select call, which is known to cause memory corruptions when the given fd is larger than 1024... see https://sigquit.wordpress.com/2009/12/24/stack-corruption-improper-use-of-fd_set/

I've put an assert(tcp_socket->socket+1<FD_SETSIZE); before the selectcall, which fires right before the crash occurs.

rotoglup avatar Feb 08 '19 15:02 rotoglup

FWIW, I've had some success using poll instead of select, here's my function :

static SocketStatus TCPSocket_PollStatus(TCPSocket* tcp_socket)
{
    SocketStatus status;
    struct pollfd fds[1];

    status.can_read = RMT_FALSE;
    status.can_write = RMT_FALSE;
    status.error_state = RMT_ERROR_NONE;

    assert(tcp_socket != NULL);
    if (tcp_socket->socket == INVALID_SOCKET)
    {
        status.error_state = RMT_ERROR_SOCKET_INVALID_POLL;
        return status;
    }

    // Set read/write/error markers for the socket
    fds[0].fd = tcp_socket->socket;
    fds[0].events = POLLIN | POLLPRI | POLLOUT | POLLERR;

    // Poll socket status without blocking
    if (poll(fds, 1, 0) == SOCKET_ERROR)
    {
        status.error_state = RMT_ERROR_SOCKET_SELECT_FAIL;
        return status;
    }

    status.can_read = (fds[0].revents & (POLLIN | POLLPRI)) != 0 ? RMT_TRUE : RMT_FALSE;
    status.can_write = (fds[0].revents & POLLOUT) != 0 ? RMT_TRUE : RMT_FALSE;
    status.error_state = (fds[0].revents & POLLERR) != 0 ? RMT_ERROR_SOCKET_POLL_ERRORS : RMT_ERROR_NONE;
    return status;
}

rotoglup avatar Feb 08 '19 17:02 rotoglup

Thanks, reading up on all this now.

dwilliamson avatar Feb 08 '19 17:02 dwilliamson

This certainly is a cross-platform minefield. https://daniel.haxx.se/docs/poll-vs-select.html

dwilliamson avatar Feb 08 '19 18:02 dwilliamson

Sure is ! Do toujours think trying to bail out and error using FD_SETSIZE would be possible/a thing to try ?

Le ven. 8 févr. 2019 à 19:29, Don Williamson [email protected] a écrit :

This certainly is a cross-platform minefield. https://daniel.haxx.se/docs/poll-vs-select.html

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/Celtoys/Remotery/issues/154#issuecomment-461899560, or mute the thread https://github.com/notifications/unsubscribe-auth/AA4hksLblwWxjqAQyGDp3kVE7P2cBP-nks5vLcIIgaJpZM4awF4K .

rotoglup avatar Feb 10 '19 19:02 rotoglup

I mean, this is a pretty crazy old API limitation, almost funny. I'm not sure I want to introduce poll as I'd still need a select fallback for the systems that don't have it and then they might suffer the issue!

How's about trying to trick it? Could you try negatively offsetting the address of fd_read etc. So:

int byte_index = socket / 8;
int bit_index = socket & 7;
FD_ZERO(&fd_read);
FD_SET(bit_index, &fd_read);
select(socket + 1, (char*)&fd_read - byte_index, ...
can_read = FD_ISSET(bit_index, &fd_read);

dwilliamson avatar Feb 10 '19 20:02 dwilliamson

Just looked at the implementation of fd_set and realised it's not actually a "bit array" but a SOCKET array (in winsock2.h). It looks like you can't rely on any implementation patterns :/

dwilliamson avatar Feb 10 '19 21:02 dwilliamson

Would the only sensible thing to do be to check on FD_SETSIZE ? At least to prevent crashing. This would need to change TCPSocket_PollStatus and WebSocketHandshake (to add the error code) :

if (FD_SETSIZE <= tcp_socket->socket) {
    status.error_state = RMT_ERROR_SOCKET_SELECT_FAIL;
    return status;
}

rotoglup avatar Feb 11 '19 11:02 rotoglup

I think that would help and needs adding, with obvious error reporting to the user. But it's not the solution.

The irony in all this is that it's fine on Windows as fd_set is an array of SOCKET entries as opposed to a bit array so there's no chance of this kind of overflow.

So... we may need an #ifdef branch where poll is used on all platforms other than Windows.

dwilliamson avatar Feb 11 '19 11:02 dwilliamson

I'm traditionnally a Windows developper, I'm a newcomer in *NIX land, so I'm not versed enough to help on the poll reliability ! Maybe allow users to control the poll activation would be nice too - just in case.

rotoglup avatar Feb 11 '19 20:02 rotoglup

It's in wide use as a replacement and was introduced in 1986 vs 1983 but POSIX-like systems sometimes muck it up (e.g. see https://daniel.haxx.se/blog/2016/10/11/poll-on-mac-10-12-is-broken/). Remotery is used on so many varying platforms now that something this core is quite nerve-wracking to change.

dwilliamson avatar Feb 11 '19 20:02 dwilliamson

Yep, I saw on the same blog (your earlier link) that poll is broken on OSX <= 10.4 ;

If this issue was never brought to you, I understand that all this looks like opening a can of worms :) poll could be strictly an opt-in solution for people who get in trouble because of select errors related to fd values.

On Mon, Feb 11, 2019 at 9:26 PM Don Williamson [email protected] wrote:

It's in wide use as a replacement and was introduced in 1986 vs 1983 but POSIX-like systems sometimes muck it up (e.g. see https://daniel.haxx.se/blog/2016/10/11/poll-on-mac-10-12-is-broken/). Remotery is used on so many varying platforms now that something this core is quite nerve-wracking to change.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/Celtoys/Remotery/issues/154#issuecomment-462481201, or mute the thread https://github.com/notifications/unsubscribe-auth/AA4hkpNrJ2HwQP3KxVq0Df-s3GO_mUSiks5vMdHpgaJpZM4awF4K .

rotoglup avatar Feb 11 '19 20:02 rotoglup

Good idea. Maybe worth doing it the other way round: use select on Windows and poll elsewhere. If that fails to compile for somebody can introduce a user define that forces use of select.

Motivation being poll should work on the majority of use-cases. Only a few exceptions may fail to work.

dwilliamson avatar Feb 11 '19 21:02 dwilliamson