ESP32 completely hangs if a WebSocket server shuts down while an NF-based WebSocket client is attached

Open muenchris opened this issue 3 months ago • 4 comments

Library/API/IoT binding

System.Net.WebSockets

Visual Studio version

Any

.NET nanoFramework extension version

2022.17.19

Target name(s)

ESP32_OLIMEX_WROVER

Firmware version

1.14.x

Device capabilities

System Information
HAL build info: nanoCLR running @ ESP32 built with ESP-IDF v5.4.2
  Target:   ESP32_OLIMEX_WROVER
  Platform: ESP32

Firmware build Info:
  Date:        Nov 11 2025
  Type:        MinSizeRel build, chip rev. >= 3, support for PSRAM
  CLR Version: 1.14.0.129
  Compiler:    GNU ARM GCC v14.2.0

OEM Product codes (vendor, model, SKU): 0, 0, 0

Serial Numbers (module, system):
  00000000000000000000000000000000
  10000000001097BDE2F550

Target capabilities:
  Has nanoBooter: NO
  IFU capable: NO
  Has proprietary bootloader: YES

AppDomains:

Assemblies:

Native Assemblies:
  mscorlib v100.5.0.24, checksum 0x1549C856
  nanoFramework.Runtime.Native v100.0.10.0, checksum 0x0EAE898B
  nanoFramework.Hardware.Esp32 v100.0.10.0, checksum 0x6A20A689
  nanoFramework.Hardware.Esp32.Rmt v100.0.5.1, checksum 0x8ADAC728
  nanoFramework.Device.OneWire v100.0.4.0, checksum 0xB95C43B4
  nanoFramework.Networking.Sntp v100.0.4.4, checksum 0xE2D9BDED
  nanoFramework.ResourceManager v100.0.0.1, checksum 0xDCD7DF4D
  nanoFramework.System.Collections v100.0.2.0, checksum 0x40DC251F
  nanoFramework.System.Text v100.0.0.1, checksum 0x8E6EB73D
  nanoFramework.System.IO.Hashing v100.0.0.1, checksum 0xEBD8ED20
  nanoFramework.System.Security.Cryptography v100.0.0.3, checksum 0x343142CA
  nanoFramework.Runtime.Events v100.0.8.0, checksum 0x0EAB00C9
  EventSink v1.0.0.0, checksum 0xF32F4C3E
  System.IO.FileSystem v1.1.0.4, checksum 0x1777E2FE
  System.Math v100.0.5.5, checksum 0x9F9E2A7E
  System.Net v100.2.0.11, checksum 0xD82C1452
  System.Device.Adc v100.0.0.0, checksum 0xE5B80F0B
  System.Device.Dac v100.0.0.6, checksum 0x02B3E860
  System.Device.Gpio v100.1.0.6, checksum 0x097E7BC5
  System.Device.I2c v100.0.0.2, checksum 0xFA806D33
  System.Device.I2c.Slave v1.0.0.0, checksum 0x4238164B
  System.Device.I2s v100.0.0.1, checksum 0x478490FE
  System.Device.Pwm v100.1.0.4, checksum 0xABF532C3
  System.IO.Ports v100.1.6.1, checksum 0xB798CE30
  System.Device.Spi v100.1.2.0, checksum 0x3F6E2A7E
  System.Runtime.Serialization v100.0.0.0, checksum 0x0A066871
  System.Device.Wifi v100.0.6.4, checksum 0x00A058C6

++++++++++++++++++++++++++++++++
++        Memory Map          ++
++++++++++++++++++++++++++++++++
  Type     Start       Size
++++++++++++++++++++++++++++++++
  RAM    0x3f80087c  0x003b0000
  FLASH  0x00000000  0x00400000

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++                   Flash Sector Map                        ++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
  Region     Start      Blocks   Bytes/Block    Usage
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
      0    0x00010000       1      0x1D0000     nanoCLR
      1    0x001E0000       1      0x1C0000     Deployment
      2    0x003C0000       1      0x040000     Configuration

+++++++++++++++++++++++++++++++++++++++++++++++++++
++              Storage Usage Map                ++
+++++++++++++++++++++++++++++++++++++++++++++++++++
  Start        Size (kB)            Usage
+++++++++++++++++++++++++++++++++++++++++++++++++++
  0x003C0000   0x040000 (256kB)     Configuration
  0x00010000   0x1D0000 (1856kB)    nanoCLR
  0x001E0000   0x1C0000 (1792kB)    Deployment

Description

When an NF WebSocket client is connected to a WebSocket server (it does not matter which one; in my case an IIS 10 web server with WebSockets) and the server is restarted, stopped, or otherwise becomes unavailable, the NF-based ESP chip completely HALTs! There is no recovery other than manually rebooting the ESP32. This is a catastrophic bug for any production deployment.

This behavior is new in firmware 1.14.xxx and worked just fine in 1.12.xxx.

In 1.12.xxx the WebSocket client throws an exception that can be handled. In 1.14.xxx the ESP hangs! All parallel threads hang too; the ESP no longer does anything.

How to reproduce

  1. Create a small websocket server on a PC
  2. Create a small websocket client on the ESP
  3. Connect the client to the server and have it POST data every 1-3 seconds.
  4. Stop the websocket server on the PC
  5. The ESP hangs! No exception is fired, all parallel threads are hanging too
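
For step 1, a minimal desktop .NET server along the following lines should be enough to reproduce the problem. This is only a sketch and not the IIS setup from the report: the class name `ReproServer`, port 8080, and the `http://+:8080/` prefix are assumptions chosen for illustration, and binding to `+` on Windows may need an elevated prompt or a URL ACL. It accepts WebSocket clients, prints whatever text they send, and is stopped by killing the process, which corresponds to step 4.

```csharp
using System;
using System.Net;
using System.Net.WebSockets;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical repro server for a PC; names and port are illustrative only.
class ReproServer
{
    static async Task Main()
    {
        var listener = new HttpListener();
        listener.Prefixes.Add("http://+:8080/"); // assumed port, reachable from the ESP32
        listener.Start();
        Console.WriteLine("Listening on port 8080. Press Ctrl+C to stop the server.");

        while (true)
        {
            HttpListenerContext ctx = await listener.GetContextAsync();
            if (!ctx.Request.IsWebSocketRequest)
            {
                // Reject plain HTTP requests.
                ctx.Response.StatusCode = 400;
                ctx.Response.Close();
                continue;
            }

            // Complete the WebSocket handshake (no sub-protocol) and serve the
            // client in the background so further clients can connect.
            HttpListenerWebSocketContext wsCtx = await ctx.AcceptWebSocketAsync(null);
            _ = ReceiveLoop(wsCtx.WebSocket);
        }
    }

    static async Task ReceiveLoop(WebSocket ws)
    {
        var buffer = new byte[1024];
        try
        {
            while (ws.State == WebSocketState.Open)
            {
                WebSocketReceiveResult result =
                    await ws.ReceiveAsync(new ArraySegment<byte>(buffer), CancellationToken.None);

                if (result.MessageType == WebSocketMessageType.Close)
                {
                    // Client asked to close; answer the close handshake.
                    await ws.CloseAsync(WebSocketCloseStatus.NormalClosure, "bye", CancellationToken.None);
                    break;
                }

                Console.WriteLine("Received: " + Encoding.UTF8.GetString(buffer, 0, result.Count));
            }
        }
        catch (WebSocketException ex)
        {
            Console.WriteLine("Client dropped: " + ex.Message);
        }
    }
}
```

With the ESP32 client pointed at `ws://<pc-address>:8080/`, the console should show the periodic messages until the process is killed. Killing the process drops the connection without a WebSocket close handshake, which may or may not behave the same as an IIS restart.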

Expected behaviour

An exception should be thrown if the server can no longer be reached.

Screenshots

No response

Sample project or code

Setting up the client:

void SetupWS()
{
    // Set up the WebSocket client
    websocketClient = new ClientWebSocket(new ClientWebSocketOptions()
    {
        // Change the heartbeat to a 30 second interval
        KeepAliveInterval = TimeSpan.FromSeconds(30),
        ServerTimeout = TimeSpan.FromSeconds(3)
    });

    // Handlers for received messages and for a closed connection
    websocketClient.MessageReceived += WebsocketClient_MessageReceived;
    websocketClient.ConnectionClosed += WebsocketClient_ConnectionClosed;

    // Set up a custom header
    var headers = new ClientWebSocketHeaders();
    headers["userId"] = "nano";

    websocketClient.Connect($"ws://{SomeValidWSServer}/", headers);

    // Send a message to the server every 3 seconds
    while (true)
    {
        Thread.Sleep(3000);
        WSRespondToHost("ping...");
    }
}

private void WebsocketClient_ConnectionClosed(object sender, EventArgs e)
{
    // THIS IS NEVER CALLED in FW 1.14.xxx but works fine in FW 1.12.xxx
    Thread.Sleep(100);
    SetupWS();
}

private void WebsocketClient_MessageReceived(object sender, MessageReceivedEventArgs e)
{
    try
    {
        if (!e.Frame.IsFragmented && e.Frame.MessageType == WebSocketMessageType.Text)
        {
            Debug.WriteLine($"-----> Recvd Bytes: {e.Frame.MessageLength}");
        }
        else
        {
            Debug.WriteLine("Fragmented messages are not allowed");
        }
    }
    catch (Exception ee)
    {
        Debug.WriteLine($"Error in WS Message: {ee}");
    }
}

public void WSRespondToHost(string bytes)
{
    if (websocketClient == null) return;
    try
    {
        websocketClient.SendString(bytes);
    }
    catch (Exception e)
    {
        // THIS IS NEVER CALLED in FW 1.14.xxx but works fine in FW 1.12.xxx
        Debug.WriteLine($"POST to host excepted:\n{e.Message}");
    }
}

Additional information

No response

muenchris, Nov 12 '25 23:11

After digging deeper, it looks like this function never throws or returns if the server is gone:

        [MethodImpl(MethodImplOptions.InternalCall)]
        public static extern int send(object socket, byte[] buf, int offset, int count, int flags, int timeout_ms);

in

    internal class NativeSocket

muenchris, Nov 13 '25 00:11

This is probably a global problem with the WebSocket implementation, since I also have the same problem with the MQTT protocol and connection breaks.

ababere, Nov 13 '25 09:11

@ababere suggests this could be related to #1692

josesimoes, Nov 13 '25 10:11

> This is probably a global problem with the WebSocket implementation, since I also have the same problem with the MQTT protocol and connection breaks.

I actually think the problem is in the firmware's NativeSocket implementation. This works fine in older firmware such as 1.12.xxx, so something changed in 1.14.xxx that causes this.

muenchris, Nov 13 '25 22:11