Wifi Access Point Mode - HttpListener hangs

Open Alex-111 opened this issue 2 years ago • 36 comments

Target name(s)

ESP32-S3 DevkitC-1

Firmware version

latest - 1.8.1.370 ESP32-S3

Was working before? On which version?

No response

Device capabilities

No response

Description

When setting up a SoftAP the HttpListener sometimes does not accept any new requests.

How to reproduce

I started with the provided sample code "WifiAP" and would like to set up a simple Wi-Fi access point with a very basic web server. I tested by connecting my Android phone to the SSID and requesting http://192.168.4.1 in the browser. The first request seems to work...

But especially when I connect my smartphone to another SSID and then return to my nanoFramework AP, no requests are accepted anymore and the browser just hangs.

It seems that the socket listener does not return anymore.

Here is my sample code: https://github.com/Alex-111/WiFiAPTest/tree/master

Expected behaviour

I would expect HttpListener to always accept web requests, regardless of whether I connect my smartphone to another Wi-Fi network and later reconnect it to the SoftAP.

Screenshots

No response

Additional information

No response

Alex-111 avatar Jul 28 '23 13:07 Alex-111

@Ellerbach wondering if this is somewhat related (or similar) with the fix you've made the other day on the webserver...

josesimoes avatar Aug 03 '23 16:08 josesimoes

I tested the code and it works as expected for me. First request: I opened a browser and went to the 192.168.4.1 page. Then I connected to another SSID, connected back to MySsid, and went to the page again:

image

Ellerbach avatar Aug 04 '23 09:08 Ellerbach

I've repeated this multiple times with different procedures (closing the browser before leaving, leaving it open, refreshing, etc.) and it always worked as expected, so I'm closing this issue. This may be browser- or phone-specific.

Ellerbach avatar Aug 04 '23 09:08 Ellerbach

@Ellerbach Thanks for testing!

Please could you tell me more about your setup: What firmware do you use? What device do you use? Maybe it is specific to a special device/firmware combination?

Alex-111 avatar Aug 04 '23 09:08 Alex-111

I have the same issue as @Alex-111 and I am using an Android phone.

alberk8 avatar Aug 04 '23 09:08 alberk8

@josesimoes As more people have this issue, I think we should investigate a little more before closing it.

Alex-111 avatar Aug 04 '23 09:08 Alex-111

@Alex-111 : @Ellerbach owns this issue, up to him. 😉

josesimoes avatar Aug 04 '23 11:08 josesimoes

Firmware: ESP32_REV0-1.8.1.419 Device: ESP32 (a basic one) Phone: iPhone

So let me reopen the issue, I'll try with other devices then.

Ellerbach avatar Aug 04 '23 11:08 Ellerbach

I've tried this time with ESP32-S3 Firmware: ESP32_S3-1.8.1.375 Phone: iPhone

Still works as expected!

Ellerbach avatar Aug 04 '23 11:08 Ellerbach

Just tried with an Android phone (Samsung) and it also works as expected. I tried with the ESP32-S3. Same scenario: connecting to the SSID, confirming that I want to use the network without internet, going to 192.168.4.1, getting the page. Then connecting to another SSID, doing something, connecting back to MySsid, again confirming that I want to use the network without internet, going to 192.168.4.1, and the page loads perfectly.

So I'm really not sure what's happening with both @alberk8 and @Alex-111 but I cannot reproduce your problem with ESP32, ESP32-S3, iPhone and Android!

Ellerbach avatar Aug 04 '23 11:08 Ellerbach

Are you closing and opening the web page again?

To replicate

  1. Connect to nf AP
  2. Open browser to http://192.168.4.1 (web page loads)
  3. Change to another AP and wait for a few seconds
  4. Change back to nf AP.
  5. Go back to the page in step 2 and refresh. On Android I just swipe down.
  6. The page keeps loading and never finishes.

alberk8 avatar Aug 05 '23 07:08 alberk8

Take special care at step 5: sometimes the page appears as expected because of the browser cache, but you still see the loading indicator, i.e. the browser cannot get data. Also, no request is visible in the debug output anymore. Maybe it is also related to the hardware configuration: I think @alberk8 and I are using a device without PSRAM (ESP32-S3-DevkitC-1 in my case).

Alex-111 avatar Aug 05 '23 10:08 Alex-111

Additional context: if I wait long enough (about 5 minutes), there is an error. A new listener is then created and the page refreshes without issue. The same thing also happens when I run the app on ESP32 or ESP32_S3, with or without PSRAM.

listener.GetContext()
Get Context 1, this is next line after the _listener.GetContext()
    ++++ Exception System.Net.Sockets.SocketException - 0x00000000 (4) ++++
    ++++ Message:
    ++++ System.Net.InputNetworkStreamWrapper::Read_HTTP_Line [IP: 015a] ++++
    ++++ System.Net.HttpListenerRequest::ParseHTTPRequest [IP: 000d] ++++
    ++++ System.Net.HttpListenerContext::get_Request [IP: 000d] ++++
    ++++ WifiAP.WebServerSimple::RunServer [IP: 0031] ++++
Request:
Process Request Ends
    ++++ Exception System.Net.Sockets.SocketException - CLR_E_FAIL (4) ++++
    ++++ Message:
    ++++ System.Net.Sockets.NativeSocket::send [IP: 0000] ++++
    ++++ System.Net.Sockets.Socket::Send [IP: 0018] ++++
    ++++ System.Net.Sockets.NetworkStream::Write [IP: 0051] ++++
    ++++ System.Net.HttpListenerResponse::SendHeaders [IP: 003f] ++++
    ++++ System.Net.HttpListenerResponse::Close [IP: 0010] ++++
    ++++ WifiAP.WebServerSimple::RunServer [IP: 0031] ++++
System.Net.Sockets.SocketException: Exception was thrown: System.Net.Sockets.SocketException

alberk8 avatar Aug 06 '23 03:08 alberk8

Are you closing and opening the web page again?

I did, with several variations:

  • closing the page and reopening it
  • coming back and just refreshing
  • closing the page and the browser, then coming back and reopening the page

All worked as expected! The ESP32 device I'm using does not have PSRAM (it's the very basic one); the ESP32-S3 is a DevKit-M. It works fine with Edge as the browser on both iPhone and Android! So I'm sorry, but I really can't reproduce this :-( That would make things much easier!

Ellerbach avatar Aug 06 '23 09:08 Ellerbach

Yes, it is very strange that it works without issues on your side, but I have exactly the same situation as @alberk8. So let's think again: what is the difference?

My setup: image

The packages I use:

<Reference Include="Iot.Device.DhcpServer, Version=1.2.0.0, Culture=neutral, PublicKeyToken=c07d481e9758c731">
  <HintPath>packages\nanoFramework.Iot.Device.DhcpServer.1.2.300\lib\Iot.Device.DhcpServer.dll</HintPath>
  <Private>True</Private>
</Reference>
<Reference Include="mscorlib, Version=1.14.3.0, Culture=neutral, PublicKeyToken=c07d481e9758c731">
  <HintPath>packages\nanoFramework.CoreLibrary.1.14.2\lib\mscorlib.dll</HintPath>
  <Private>True</Private>
</Reference>
<Reference Include="nanoFramework.ResourceManager, Version=1.2.13.0, Culture=neutral, PublicKeyToken=c07d481e9758c731">
  <HintPath>packages\nanoFramework.ResourceManager.1.2.13\lib\nanoFramework.ResourceManager.dll</HintPath>
  <Private>True</Private>
</Reference>
<Reference Include="nanoFramework.Runtime.Events, Version=1.11.6.0, Culture=neutral, PublicKeyToken=c07d481e9758c731">
  <HintPath>packages\nanoFramework.Runtime.Events.1.11.6\lib\nanoFramework.Runtime.Events.dll</HintPath>
  <Private>True</Private>
</Reference>
<Reference Include="nanoFramework.Runtime.Native, Version=1.6.6.0, Culture=neutral, PublicKeyToken=c07d481e9758c731">
  <HintPath>packages\nanoFramework.Runtime.Native.1.6.6\lib\nanoFramework.Runtime.Native.dll</HintPath>
  <Private>True</Private>
</Reference>
<Reference Include="nanoFramework.System.Collections, Version=1.5.18.0, Culture=neutral, PublicKeyToken=c07d481e9758c731">
  <HintPath>packages\nanoFramework.System.Collections.1.5.18\lib\nanoFramework.System.Collections.dll</HintPath>
  <Private>True</Private>
</Reference>
<Reference Include="nanoFramework.System.Text, Version=1.2.37.0, Culture=neutral, PublicKeyToken=c07d481e9758c731">
  <HintPath>packages\nanoFramework.System.Text.1.2.37\lib\nanoFramework.System.Text.dll</HintPath>
  <Private>True</Private>
</Reference>
<Reference Include="System.Device.Gpio, Version=1.1.28.0, Culture=neutral, PublicKeyToken=c07d481e9758c731">
  <HintPath>packages\nanoFramework.System.Device.Gpio.1.1.28\lib\System.Device.Gpio.dll</HintPath>
  <Private>True</Private>
</Reference>
<Reference Include="System.Device.Wifi, Version=1.5.54.0, Culture=neutral, PublicKeyToken=c07d481e9758c731">
  <HintPath>packages\nanoFramework.System.Device.Wifi.1.5.54\lib\System.Device.Wifi.dll</HintPath>
  <Private>True</Private>
</Reference>
<Reference Include="System.IO.Streams, Version=1.1.38.0, Culture=neutral, PublicKeyToken=c07d481e9758c731">
  <HintPath>packages\nanoFramework.System.IO.Streams.1.1.38\lib\System.IO.Streams.dll</HintPath>
  <Private>True</Private>
</Reference>
<Reference Include="System.Net, Version=1.10.52.0, Culture=neutral, PublicKeyToken=c07d481e9758c731">
  <HintPath>packages\nanoFramework.System.Net.1.10.52\lib\System.Net.dll</HintPath>
  <Private>True</Private>
</Reference>
<Reference Include="System.Net.Http">
  <HintPath>packages\nanoFramework.System.Net.Http.Server.1.5.97\lib\System.Net.Http.dll</HintPath>
</Reference>
<Reference Include="System.Net.Sockets.TcpClient">
  <HintPath>packages\nanoframework.System.Net.Sockets.TcpClient.1.1.52\lib\System.Net.Sockets.TcpClient.dll</HintPath>
</Reference>
<Reference Include="System.Threading, Version=1.1.19.33722, Culture=neutral, PublicKeyToken=c07d481e9758c731">
  <HintPath>packages\nanoFramework.System.Threading.1.1.19\lib\System.Threading.dll</HintPath>
  <Private>True</Private>
</Reference>
<Reference Include="Windows.Storage">
  <HintPath>packages\nanoFramework.Windows.Storage.1.5.33\lib\Windows.Storage.dll</HintPath>
</Reference>
<Reference Include="Windows.Storage.Streams">
  <HintPath>packages\nanoFramework.Windows.Storage.Streams.1.14.24\lib\Windows.Storage.Streams.dll</HintPath>
</Reference>

Same situation in debugger or without debugger attached...

@Ellerbach Any idea what else we could check?

Alex-111 avatar Aug 06 '23 10:08 Alex-111

@Ellerbach @alberk8 I've done some new tests and want to share my observations:

  • With an old iPhone it works almost every time. It is very difficult to replicate the issue there.
  • I also changed the code to allow two concurrent connections and then connected with the Android phone and the iPhone in parallel. On the Android phone it is the same as before (it hangs), but the iPhone now hangs much more often. It also seems to hang when the Android phone disconnects and the iPhone then gets the "faulty session".
  • The most interesting thing: with both the Android phone and the iPhone connected to the SSID, if the iPhone hangs you can immediately end the hang by making a request from the Android phone. It seems the second connection triggers the "AutoResetEvent" in the HttpListener and then both requests are processed...

Unfortunately I still do not know what exactly causes the hang. But maybe some of you can investigate the native code. To me it really looks like the socket Accept does not return.

Any idea what happens in this socket code if there are two requests in parallel? Is it ensured that no request is lost?

image

Alex-111 avatar Aug 07 '23 06:08 Alex-111

Any idea what happens in this socket code, if there are two requests in parallel? Is it ensured that no request is lost?

The sample is written in a very simple way and is not meant to scale. Use the "real" WebServer NuGet to handle multiple parallel requests. That comes at the cost of size. The sample shows how to set up the device in the typical case where a single phone connects a single time :-) and where you can retry by just rebooting the device.

Btw, glad you figured out a way. A PR to improve the robustness of the sample is always welcome!

Ellerbach avatar Aug 07 '23 07:08 Ellerbach

@Ellerbach I'm aware of the drawbacks of this simple web server, but regardless of which web server I use, the issue stays...

I also tried your WebServer NuGet, but looking at the code of the full-featured web server, there is no difference. Both use the HttpListener, which in my opinion has the same problem in this case. There is the same "_listener.GetContext()" call, which just does not return, i.e. this has nothing to do with the web server itself...

Alex-111 avatar Aug 07 '23 08:08 Alex-111

I also tried your WebServer NuGet, but looking at the code of the full-featured web server, there is no difference. Both use the HttpListener, which in my opinion has the same problem in this case. There is the same "_listener.GetContext()" call, which just does not return, i.e. this has nothing to do with the web server itself...

Let me look at this as well then. Note that on the ESP side there is also bad socket behavior that is related to Espressif and that we cannot change. Here is an example:

  • Set up an ESP as a server and request an HTTP page that keeps the connection open
  • Do the request on the client side
  • Stop without closing the socket on the client side
  • The ESP is blocked: you can't do another request later from the same IP address, and it can fully block all networking

In this scenario, the behavior comes from how things are managed on the Espressif side, totally independent of anything on the nano side, unfortunately. So you'll see side effects like this one that you cannot control. This is done differently on devices like the STM32.

Those devices are not meant to be highly scalable web or socket servers, but rather to handle one connection, or at best a few.

Ellerbach avatar Aug 07 '23 10:08 Ellerbach

@Ellerbach thanks for your answer. This sounds really similar to the issue we have here. But isn't there a way to work around this? E.g. maybe there is a possibility to set a timeout on the blocking call, so that it does not block forever.

Imagine you have an IoT device that can be configured via SoftAP. If anybody connects and just goes away without closing the socket connection, we would be forced to reset the device. This is really not what we want...

Alex-111 avatar Aug 07 '23 10:08 Alex-111

Imagine you have an IoT device that can be configured via SoftAP. If anybody connects and just goes away without closing the socket connection, we would be forced to reset the device. This is really not what we want...

You definitely can add a timeout, that's totally possible. Still, at a lower level, there are some things that can break. For example, I've been using an ESP-based device flashed with WLED (I'm using it for notifications). If I use this device for the tests we're running here (I've tried ;-)), it gets fully blocked, and there is nothing I can do except reboot it. And that is native C, directly using the Espressif API. A timeout will help in your scenario, but again, those devices are far from perfect! Adding a watchdog and disposing everything through a timer is definitely good practice in all cases!
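
One way to read the "watchdog through a timer" suggestion is sketched below. This is a minimal sketch, not code from the WifiAP sample: `RequestWatchdog`, `RecoveryHandler`, and all method names are hypothetical, and the recovery action itself is left to the caller. It relies only on `System.Threading.Timer` and `DateTime`.

```csharp
using System;
using System.Threading;

// Hypothetical delegate: what to do when a request is stuck
// (for example abort and recreate the listener, or reboot the device).
public delegate void RecoveryHandler();

// Hypothetical helper, not part of nanoFramework or the WifiAP sample.
public class RequestWatchdog
{
    private readonly TimeSpan _limit;
    private readonly RecoveryHandler _recover;
    private readonly object _sync = new object();
    private bool _busy;
    private DateTime _startedAt;
    private Timer _timer;

    public RequestWatchdog(TimeSpan limit, RecoveryHandler recover)
    {
        _limit = limit;
        _recover = recover;
    }

    // Call right after a request has been accepted.
    public void RequestStarted(DateTime now)
    {
        lock (_sync)
        {
            _busy = true;
            _startedAt = now;
        }
    }

    // Call when the response has been fully written (ideally in a finally block).
    public void RequestFinished()
    {
        lock (_sync)
        {
            _busy = false;
        }
    }

    // Returns true if a request has been in flight longer than the limit.
    // Recovery is triggered once per stuck request.
    public bool Check(DateTime now)
    {
        bool stuck;
        lock (_sync)
        {
            stuck = _busy && (now.Ticks - _startedAt.Ticks) > _limit.Ticks;
            if (stuck)
            {
                _busy = false;
            }
        }

        if (stuck)
        {
            _recover();
        }

        return stuck;
    }

    // Starts a periodic timer that runs the check every periodMs milliseconds.
    public void Start(int periodMs)
    {
        _timer = new Timer(OnTick, null, periodMs, periodMs);
    }

    private void OnTick(object state)
    {
        Check(DateTime.UtcNow);
    }
}
```

In a server loop this would be used by calling `RequestStarted(DateTime.UtcNow)` right after `GetContext()` returns and `RequestFinished()` once the response is closed. What the recovery delegate should do is a judgment call: recreating the listener is the gentler option, while rebooting the device is the fallback if the socket stays blocked at a lower level.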

Ellerbach avatar Aug 07 '23 10:08 Ellerbach

@Ellerbach I updated my repro to try to stop the HttpListener on WIFI disconnection. Is this what you mean I should do on timeout? To dispose the HttpListener on some conditions? Or is there another timeout parameter I'm not aware of?

My sample repro works better with this new logic, but there are still some situations where it just blocks, even if I dispose the HttpListener and create it again after a Wi-Fi client connects...

If this is really the best we can get, then I would have expected a little more reliability... I'm not sure this could go beyond a hobby project in that case.

Another thought: Couldn't we open a ticket at Espressif, if this is a known issue?

Alex-111 avatar Aug 07 '23 13:08 Alex-111

Yes, you basically have to play with all this. You can also add a big try {}catch {} in the Main function with a global mechanism. If you want, you can also periodically restart the webserver. Things like this.
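
The "big try/catch with a global mechanism" idea could look roughly like the following. It is a sketch under assumptions: `Supervisor` and `ServerBody` are hypothetical names, and the delegate passed in is expected to create its own `HttpListener`, serve requests, and dispose the listener before it returns or throws.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical delegate: one full run of the server
// (create listener, serve requests, dispose listener).
public delegate void ServerBody();

// Hypothetical helper, not part of nanoFramework or the WifiAP sample.
public static class Supervisor
{
    // Runs the body and restarts it after an exception, up to maxRestarts times.
    // Returns the number of restarts once the body returns normally.
    // When the restart budget is exhausted, the last exception is rethrown.
    public static int Run(ServerBody body, int maxRestarts, int backoffMs)
    {
        int restarts = 0;

        while (true)
        {
            try
            {
                body();
                return restarts;
            }
            catch (Exception ex)
            {
                Debug.WriteLine("Server loop failed: " + ex.Message);

                if (restarts >= maxRestarts)
                {
                    throw;
                }

                restarts++;

                // Give the network stack a moment before recreating the listener.
                Thread.Sleep(backoffMs);
            }
        }
    }
}
```

A caller in `Main` could wrap `Supervisor.Run(...)` in its own try/catch and reboot the device when the restart budget is used up. Note that this only helps once the blocked call actually throws; as reported earlier in the thread, that can take several minutes, so a supervisor like this complements a timeout or watchdog and does not replace one.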

Another thought: Couldn't we open a ticket at Espressif, if this is a known issue?

I'm sure one is open among the 1K+ issues ;-) https://github.com/espressif/esp-idf/issues There are 57 open just for sockets, and some seem very similar to the problem I describe.

Ellerbach avatar Aug 07 '23 13:08 Ellerbach

IDF has been updated since the last comments. Is this still blocked?

networkfusion avatar Dec 27 '23 23:12 networkfusion

As it's been 3 months since the last feedback on this issue, I'm closing it. If the problem persists, feel free to reopen it.

Ellerbach avatar Mar 26 '24 12:03 Ellerbach

@Ellerbach @AdrianSoundy It could not be tested because of https://github.com/nanoframework/Home/issues/1488. But now, with the latest firmware, it still stops responding after a few requests in my tests with ESP_S3. Tested with the WifiAP project from the samples. It seems ticket 1488 is still not 100% fixed, so we cannot debug code with Visual Studio at the moment. I will test more when debugging works again.

Alex-111 avatar Jun 03 '24 05:06 Alex-111

So, reopening the issue. Thanks for providing updates.

Ellerbach avatar Jun 03 '24 12:06 Ellerbach

@Ellerbach

Now that #1493 is fixed, I did some further tests with my S3 and the WifiAP sample code. When it hangs, it always blocks at this line and does not return from writing to the stream. To make it block, I just have to refresh the web page (with "pull to refresh") from my Android phone about 2 or 3 times. After this it hangs completely and has to be rebooted:

image

Any ideas why this could happen? It feels like a deadlock.

Edit: I left the debugger running and found out that after some minutes (maybe 10 or 15) the blocking code (writing to the stream) returns with:

    ++++ Exception System.Net.Sockets.SocketException - CLR_E_FAIL (4) ++++
    ++++ Message:
    ++++ System.Net.Sockets.NativeSocket::send [IP: 0000] ++++
    ++++ System.Net.Sockets.Socket::Send [IP: 0018] ++++
    ++++ System.Net.Sockets.NetworkStream::Write [IP: 0051] ++++
    ++++ System.Net.OutputNetworkStreamWrapper::Write [IP: 0022] ++++
    ++++ WifiAP.WebServer::OutPutByteResponse [IP: 001d] ++++
    ++++ WifiAP.WebServer::ProcessRequest [IP: 0070] ++++
    ++++ WifiAP.WebServer::RunServer [IP: 003b] ++++
Exception thrown: 'System.Net.Sockets.SocketException' in System.Net.dll
An unhandled exception of type 'System.Net.Sockets.SocketException' occurred in System.Net.dll

Alex-111 avatar Jun 10 '24 05:06 Alex-111

It definitely requires some investigation, and the web server will need to be instrumented for debugging. If you are willing to, here is what I have in mind:

  • Clone the WebServer and use it locally with a simple program. Best is to add it to a solution like one of the samples you're working on and use a project reference
  • Instrument this part with a couple of Debug.WriteLine calls: https://github.com/nanoframework/nanoFramework.WebServer/blob/5b2c125e6706a0e9f24ed6d470f5041a4b7aa349/nanoFramework.WebServer/WebServer.cs#L520
  • Run everything, look at the output and see where it breaks; that will point to the part of the code involved, and from there continue down the rabbit hole
  • Hopefully it's an issue that can be resolved on the managed side, but from experience that may not be the case
  • Try to adjust headers as well, play with those things
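
For the Debug.WriteLine instrumentation, a small helper that prefixes each message with the elapsed time can make it easier to see which call blocks and for how long. This is only a sketch; `StepTracer` and its members are hypothetical names and are not part of the WebServer library.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for instrumenting a request handler.
public class StepTracer
{
    private readonly DateTime _start;

    public StepTracer(DateTime start)
    {
        _start = start;
    }

    // Builds a message such as "[+12 ms] before write",
    // where the number is the time elapsed since the tracer was created.
    public string Format(string step, DateTime now)
    {
        long ms = (now.Ticks - _start.Ticks) / TimeSpan.TicksPerMillisecond;
        return "[+" + ms.ToString() + " ms] " + step;
    }

    // Writes the formatted message to the debug output.
    public void Mark(string step)
    {
        Debug.WriteLine(Format(step, DateTime.UtcNow));
    }
}
```

Placing a `Mark("before write")` and `Mark("after write")` pair around each write to the response stream means the last line printed before a hang identifies the call that did not return.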

Ellerbach avatar Jun 11 '24 09:06 Ellerbach

@Ellerbach Meanwhile I had a look at the code and it seems to block here:

image

From my understanding this is not directly related to the webserver, but to the HttpListener.

response is of type HttpListenerResponse, and in this line the data is written directly to the stream, which seems to be a NetworkStream -> Socket behind the scenes. So I fear we are already on the native side here?

Alex-111 avatar Jun 11 '24 10:06 Alex-111