Connection drops, strange behaviour

Open focussing opened this issue 11 months ago • 5 comments

Hi @Theldus,

Thank you for this great project! I used it for a prototype system about 1.5 years ago, and it has been in production for the past 9 months. While finalizing it for the customer I ran into an issue on which I would like your advice. The architecture is basically this:

Image

The system consists of a Linux board which runs a C application to handle all hardware I/O interactions of the machine. User interaction and status display are done via a browser which connects to the webserver on the Linux board. The user establishes a VPN connection by opening the router's admin webpage and is then able to open the webpage of the Linux board. All information is sent using websockets; the websocket server runs inside the C application on the Linux board.

The issue occurs when a user opens a browser window for the router's admin webpage and makes a VPN connection. Also the user opens a browser window with the system's webpage. Both are side by side and visible (active). When the system website is opened the Linux board receives the request and an onConnect event is generated. Using this websocket connection the website and system communicate. The nr_connections variable is incremented by one, which I can see on the Linux board's console. An LED on the board blinks 1x per second. All is fine.

Then the user disconnects the VPN connection; this is not noticed by the Linux board. When the system webpage is refreshed nothing really happens, only nr_connections is suddenly 2 after some seconds, and later 3, and again later 4... or even more. The LED stops blinking, but I can still connect to the Linux board via SSL and I see that the board is still working. Sometimes after a while I see the nr_connections decrease again and finally the LED blinks 1x per second again. Sometimes the board does not recover at all.

The following picture shows what happened

Image

For the user (customer) the system cannot be controlled or monitored.

Can you give me some advice how to solve this?

This is my code, wsInit() is called with useThread = true;.

//-------------------------------------------------------------------------------------------------
// ws.c
//-------------------------------------------------------------------------------------------------

//-------------------------------------------------------------------------------------------------
// include files
//-------------------------------------------------------------------------------------------------

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#include <wsserver/ws.h>
#include "ws.h"

#include "website.h"

//-------------------------------------------------------------------------------------------------
// local variables
//-------------------------------------------------------------------------------------------------

static int NrConnections = 0;

//-------------------------------------------------------------------------------------------------
// called when a client connects to the server
//-------------------------------------------------------------------------------------------------

static void onopen(ws_cli_conn_t *client)
{
    char *cli;
    cli = ws_getaddress(client);
#ifndef DISABLE_VERBOSE
    printf("Connection opened, addr: %s\n", cli);
#endif

    NrConnections++;
    printf("nr connections: %i\n", NrConnections);

    websiteSendOutputs();
    websiteSendOutputsStatus();
    websiteSendInputs();
    websiteSendInterlocks();
    websiteSendAnalogInputs();
    websiteSendUnitStart();
    websiteSendVersion();
}

//-------------------------------------------------------------------------------------------------
// called when a client disconnects from the server
//-------------------------------------------------------------------------------------------------

static void onclose(ws_cli_conn_t *client)
{
    char *cli;
    cli = ws_getaddress(client);
#ifndef DISABLE_VERBOSE
    printf("Connection closed, addr: %s\n", cli);
#endif

    NrConnections--;
    printf("nr connections: %i\n", NrConnections);
}

//-------------------------------------------------------------------------------------------------
// called when a client sends a message to the server
//-------------------------------------------------------------------------------------------------

static void onmessage(ws_cli_conn_t *client, const unsigned char *msg, uint64_t size, int type)
{
#ifndef DISABLE_VERBOSE
    // char *cli;
    // cli = ws_getaddress(client);
    // printf("message received: %s (size: %" PRId64 ", type: %d), from: %s\n", msg, size, type, cli);
#endif

    websiteHandleCommand((char *)msg);
}

//-------------------------------------------------------------------------------------------------
// sends message from the server to all connected clients
//-------------------------------------------------------------------------------------------------

void wsSendMessage(char *message)
{
    // printf("wssend %s\n", message);

    if (wsConnected())
    {
        int size = strlen(message);
        int type = 1;

        ws_sendframe(NULL, (char *)message, size, type);
    }
}

//-------------------------------------------------------------------------------------------------
// after invoking @ref ws_socket, this routine never returns,
// unless if invoked from a different thread.
//-------------------------------------------------------------------------------------------------

void wsInit(int useThread)
{
    struct ws_events evs;
    evs.onopen = &onopen;
    evs.onclose = &onclose;
    evs.onmessage = &onmessage;
    ws_socket(&evs, 8080, useThread, 1000);
}

//-------------------------------------------------------------------------------------------------
// returns the connected state
//-------------------------------------------------------------------------------------------------

bool wsConnected(void)
{
    return NrConnections >= 1;
}

focussing avatar Jan 21 '25 12:01 focussing

Hi @focussing,

First of all, thank you for using wsServer! It truly makes me happy to see my code being useful to others—it’s incredibly motivating and encourages me to keep improving this project.

Let me ensure I’ve understood the situation correctly:

  1. You connect to wsServer (running behind a VPN), and the server exchanges some data with the WebSocket client.
  2. When the user wants to disconnect, they disconnect from the VPN directly, and that’s when the issue begins.

The issue: After the user disconnects from the VPN:

  • The server doesn’t immediately detect the disconnection.
  • Even worse, the variable tracking the number of active connections continues to increase.
  • Eventually (and unpredictably), the connection count drops back to the correct value, and the server normalizes.

Did I understand correctly? If so, this behavior is certainly unexpected.


Two things caught my attention:

  1. Potential race condition:
    From your description, it seems the variable managing connection counts may not be properly protected, which could result in a race condition if multiple clients connect and disconnect simultaneously.

To address this, I recommend ensuring atomicity when modifying the connection counter:

  • Using atomic variables (preferred):
    If your compiler supports the C11+ standard (likely with GCC/Clang on Linux), atomic variables are the simplest and safest option:

    static _Atomic int NrConnections;  
    [...]  
    NrConnections++;  
    

    This guarantees atomic increments and decrements without requiring manual synchronization.
    You can read more about atomic operations in this C11 concurrency guide.

  • Using mutexes:
    If your environment doesn’t support atomic types, you can achieve thread safety using a mutex:

    pthread_mutex_t mutex_counter = PTHREAD_MUTEX_INITIALIZER;  
    [...]  
    pthread_mutex_lock(&mutex_counter);  
      NrConnections++;  
    pthread_mutex_unlock(&mutex_counter);  
    

    While slightly more complex than atomic variables, this approach ensures safe concurrent access to shared resources.

  2. Handling sudden disconnects
    When a user abruptly disconnects from the VPN, the OS may take its time to recognize that the connection is no longer valid. This is pretty much the same as unplugging an ethernet cable, for example.

This is a common issue, and wsServer includes a built-in solution: ping frames.

Ping Frames allow the server to detect unresponsive clients by periodically sending pings and waiting for corresponding pong responses. If the client fails to respond within a configured threshold, the server automatically closes the connection.

You can see a demonstration in this ping example, but the basic idea is:

  • Each time you call ws_ping(), it checks whether the previous ping has a corresponding pong response.
  • If no pong is received after the threshold, the server considers the connection lost and closes it.

The idea is to call ws_ping() asynchronously (in a new thread?) and periodically, so that 'stuck' connections are cleaned up.

[!NOTE] I don't know how your application is configured, but placing a disconnect button on the client side and prompting the user to use it before closing the VPN can minimize these cases.

Of course, there is no way to force this behavior on the part of the customer (and users love to break things), hence the suggested ping approach above.

Theldus avatar Jan 22 '25 00:01 Theldus

Hi @Theldus,

Thank you for your rapid reply!

Yes you understand correctly. However the connection is also broken when on a smartphone the user switches Wifi off, and then the phone connects to 5G for example

  1. The compiler accepts _Atomic, I just recompiled via remote desktop. I cannot test it today because I am out of the office today. In this particular case I was the only user so maybe it would not help in this case.

  2. The ping idea sounds promising! How can I address one specific connection? Normally I only use ws_sendframe which results in sending to all connections at the same time.

NB: could it be that I am using an old version of the ws-server because there is no real explanation of the effect that I see? How can I check the version?

focussing avatar Jan 22 '25 14:01 focussing

Hi @focussing,

However the connection is also broken when on a smartphone the user switches Wifi off, and then the phone connects to 5G for example

Yes, that's expected, because this is an abrupt disconnection.

The compiler accepts _Atomic, I just recompiled via remote desktop. I cannot test it today because I am out of the office today. In this particular case I was the only user so maybe it would not help in this case.

That makes sense. If you can guarantee that there is only a single connection at a time, it should be safe to not use locks or atomics.

The ping idea sounds promising! How can I address one specific connection? Normally I only use ws_sendframe which results in sending to all connections at the same time.

The function prototype for ws_ping(), is as follows:

void ws_ping(ws_cli_conn_t client, int threshold);

where client is the target (0 if broadcast), and threshold is the 'tolerance' for not receiving a pong. (Honestly, I see no reason to not use the broadcast option, since it will keep your connection list always "sane".)

To illustrate better, the output below is a real example I made using my phone to connect to wsServer running on my PC:

examples/echo/echo 0
(PING via thread, ws blocking)
Sending ping... 0
Waiting for incoming connections...
Connection opened, addr: 10.0.0.248
nr connections: 1
Sending ping... 1
Sending ping... 2
Sending ping... 3  <- disable my phone wifi here
Sending ping... 4  <- pong not received
Sending ping... 5  <- pong not received
Sending ping... 6  <- 2 pings without a pong, close the connection
Connection closed, addr: (null)
nr connections: 0
Sending ping... 7
^C

The code for the example is as simple as:

while (1) {
    printf("Sending ping... %d\n", ping++);
    ws_ping(0, 2);
    sleep(10);
}

So every 10 seconds I send a ping frame, with a tolerance of 2 missed pongs. Note that on the third unanswered ping the connection is closed. Without the pings, the connection would be kept open forever.

The complete example:

**example.c**
//-------------------------------------------------------------------------------------------------
// ws.c
//-------------------------------------------------------------------------------------------------

//-------------------------------------------------------------------------------------------------
// include files
//-------------------------------------------------------------------------------------------------

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <ws.h>

#include <pthread.h>

//-------------------------------------------------------------------------------------------------
// local variables
//-------------------------------------------------------------------------------------------------
static int NrConnections = 0;

//-------------------------------------------------------------------------------------------------
// returns the connected state
//-------------------------------------------------------------------------------------------------

bool wsConnected(void) {
    return NrConnections >= 1;
}

//-------------------------------------------------------------------------------------------------
// called when a client connects to the server
//-------------------------------------------------------------------------------------------------

static void onopen(ws_cli_conn_t client)
{
    char *cli;
    cli = ws_getaddress(client);
    printf("Connection opened, addr: %s\n", cli);

    NrConnections++;
    printf("nr connections: %i\n", NrConnections);
}

//-------------------------------------------------------------------------------------------------
// called when a client disconnects from the server
//-------------------------------------------------------------------------------------------------

static void onclose(ws_cli_conn_t client)
{
    char *cli;
    cli = ws_getaddress(client);
    printf("Connection closed, addr: %s\n", cli);

    NrConnections--;
    printf("nr connections: %i\n", NrConnections);
}

//-------------------------------------------------------------------------------------------------
// called when a client sends a message to the server
//-------------------------------------------------------------------------------------------------

static void
onmessage(ws_cli_conn_t client,
    const unsigned char *msg, uint64_t size, int type)
{
    ((void)client);
    ((void)msg);
    ((void)size);
    ((void)type);
}

//-------------------------------------------------------------------------------------------------
// sends message from the server to all connected clients
//-------------------------------------------------------------------------------------------------

void wsSendMessage(char *message)
{
    if (wsConnected()) {
        int size = strlen(message);
        int type = 1;
        ws_sendframe_bcast(8080, message, size, type);
    }
}

//-------------------------------------------------------------------------------------------------
// after invoking @ref ws_socket, this routine never returns,
// unless if invoked from a different thread.
//-------------------------------------------------------------------------------------------------

void wsInit(int useThread) {
    ws_socket(&(struct ws_server){
        .host = "0.0.0.0",
        .port = 8080,
        .thread_loop   = useThread,
        .timeout_ms    = 1000,
        .evs.onopen    = &onopen,
        .evs.onclose   = &onclose,
        .evs.onmessage = &onmessage
    });
}

//////////////////////////////////////////////////////////////////////
static void *ping_blocking(void* ptr)
{
    ((void)ptr);
    int ping = 0;

    while (1) {
        printf("Sending ping... %d\n", ping++);
        ws_ping(0, 2);
        sleep(10);
    }
    return NULL;
}

static void ping_via_thread(void)
{
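    pthread_t thr;
    int ret;

    /* pthread_create() returns 0 on success or an error number; keep it separate from the handle. */
    ret = pthread_create(&thr, NULL, ping_blocking, NULL);
    if (ret) {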
        fprintf(stderr, "Unable to create thread!\n");
        exit(1);
    }
}


int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "Invalid arguments: %s [0|1|2]\n", argv[0]);
        return 1;
    }

    /* If: ./echo 0 */
    if (argv[1][0] == '0') {
        printf("(PING via thread, ws blocking)\n");
        ping_via_thread();
        wsInit(0);
    }
    else if (argv[1][0] == '1') {
        printf("(PING via main thread, ws non-blocking)\n");
        wsInit(1);
        ping_blocking(NULL);
    }
    else {
        printf("(No PING)\n");
        wsInit(0);
    }
}

In the example above, you have 3 execution options:

  • ./example 0: Creates a new thread, which then loops making ping calls.
  • ./example 1: Uses the program's main thread for this.
  • ./example 2: Does not perform pings, useful for testing the behavior of onclose() in cases of abrupt disconnections.

NB: could it be that I am using an old version of the ws-server because there is no real explanation of the effect that I see? How can I check the version?

Yes, you are, but I am not aware of any bug that I have encountered that would cause the server to behave exactly as you described.

About version... unfortunately wsServer does not have well-defined releases... if you know the commit hash it is easy to tell how far back you were in the commits.

Judging by your example, I would say that you are at least 6 commits behind, when there was a change in the function signature in:

In general, I advise you to do some tests with ws_ping(), maybe that will solve your problem =).

Theldus avatar Jan 23 '25 01:01 Theldus

Hi @Theldus,

Wow what a comprehensive reply! Thank you so much!

Just to give you an idea what we're talking about... Image

I did my test to be absolutely sure that only myself was connected to rule out possible other effects. In real life during system support or diagnostics it could be that 3 maybe 4 users are connected at the same time.

If I understand correctly in your example you are using ws_ping in broadcast mode (first arg is 0), would you recommend doing that also, or is it better to address individual connections separately? If so how should I do that?

focussing avatar Jan 23 '25 07:01 focussing

Hi @focussing,

Just to give you an idea what we're talking about...

Wow, this is huge 😮. So I guess your project consists of connecting sensors from this to an embedded Linux device, on which you then run wsServer to do some processing. I'm impressed.

I did my test to be absolutely sure that only myself was connected to rule out possible other effects. In real life during system support or diagnostics it could be that 3 maybe 4 users are connected at the same time.

I see, so your issue is certainly not related to concurrency, although you should keep this in mind.

If I understand correctly in your example you are using ws_ping in broadcast mode (first arg is 0), would you recommend doing that also, or is it better to address individual connections separately? If so how should I do that?

There’s absolutely no harm in performing a ping broadcast. The worst outcome is disconnecting unresponsive clients, which is a good thing. However, you might want to experiment with the threshold parameter to allow for greater tolerance with slower devices.

For example, setting a threshold of "2" permits up to two consecutive missed pings. If you call ws_ping() every 10 seconds, unresponsive clients would be disconnected after 30 seconds. Adjust this value (and the frequency you call the function) to find the balance that works best for you.

Theldus avatar Jan 24 '25 00:01 Theldus