Preventing or Cleaning Up Dead HTTP Processes
Rene,
I'm looking for advice on how to clean up old, dead HTTP daemon tasks.
I have, in particular, one RPi 3B -- 2.4 GHz only -- that will start accumulating "dead" HTTP requests, e.g.:
15 02036240 block httpd@2036240
16 01E39580 block httpd@1e39580
This usually happens after I see this in the system log:
Oct 25 09:06:07.70 httpd: Receive failed
I've found where this happens in CHTTPDaemon::ParseRequest(), which returns HTTPUnknownError back to CHTTPDaemon::Worker(). As a "brute force" fix, I tried calling CTask::Terminate() in CHTTPDaemon::Worker() after decrementing s_nInstanceCount when it sees this error, hoping it would work its way up the stack and clean things up, but that caused more problems. I won't waste your time on why that didn't work, since it was a brute-force / dirty solution anyway.
Nonetheless, if this were Linux (or any other Unix flavor), since I have the address of the process, I would simply 'kill -9' it. However, as you and I have discussed in the past, Circle will only terminate a Task "gently" by itself, since only the running task can call CTask::Terminate().
I should note that sometimes when I see 'httpd: Receive failed' in the log, it appears to clean itself up, and the httpd task terminates on its own, sometimes after several seconds. Other times it does not. So I suspect something is stalling in the Worker when it realizes there is a problem (perhaps hanging on the delete m_pSocket? just a guess).
So you could argue that network problems should be resolved first, and I believe I have made some progress by turning off Bluetooth on about 10 Raspberry Pis, since it shares the same frequency (and RPi 3Bs only support 2.4 GHz WiFi). But I would also argue that bad client requests and/or network errors should never bring down a server, which is effectively what happens once I have ten (10) of these and have to reboot in order to get rid of the dead httpd Tasks.
Based on the logs, I estimate that about 1.2% of the requests are becoming dead, whereas 98.8% are successful.
[Yes, I could also increase the worker count and the process count, but that only prolongs the problem without solving it.]
So let me know if you have any advice on where and how I can clean up these dead httpd processes, or else prevent them from getting "stuck" in a block state to begin with.
Any feedback or advice will be appreciated, and I will accept "Simply put, can't do that in Circle", if that is the correct answer.
Cheers,
Dave.
That's a problem, and unfortunately I don't have a quick solution for it. I looked into the code, and I guess the blocked tasks hang in Receive(), waiting for the client to close the connection, which it is not doing:
https://github.com/rsta2/circle/blob/master/lib/net/httpdaemon.cpp#L317
I did not find another place where a task could block in this context, not even when deleting the CSocket instance. When you find such blocked tasks, you could prove this by listing the currently existing network connections:
#include <circle/net/transportlayer.h>
CNetSubSystem::Get()->GetTransportLayer()->ListConnections(&m_Screen);
I have to think about what can be done.
Thanks Rene. Yes, it is possible that the connection is not always being cleanly closed on the client side, as the requests come from asynchronous JavaScript (jQuery) running in a browser (actually a kiosk application running in the Chromium browser on other Raspberry Pis with the full Raspberry Pi OS / Debian Linux). If the lifecycle of the request in the browser gets interrupted, there's a good chance the socket is not cleanly closed.
Thanks also for providing a way to list network connections; that is helpful, as I was looking for a way to emulate something like 'netstat -t', which lists open TCP sockets.
Does Circle currently time out network connections / sockets that are idle for more than a predetermined period of time? Or does Circle provide any way for another task (effectively a watchdog) to forcibly close network connections / sockets that have been idle too long? I could not readily find any network timeout values.
I realize that Circle does not provide traditional Unix Sockets — as Circle is NOT Unix, and it should not and never will be Unix — so not sure what is or could be possible.
Let me know if you think of anything that might help.
Cheers,
Dave.
Does Circle currently time out network connections / sockets that are idle for more than a predetermined period of time? Or does Circle provide any way for another task (effectively a watchdog) to forcibly close network connections / sockets that have been idle too long? I could not readily find any network timeout values.
No, this is currently not implemented.
Got it - thanks!
Wasn't sure if it was there and I just couldn't find it. Thanks for confirming!
So while waiting for the socket to cleanly close, can the Worker task do anything else? Can I interrupt it to tell it to check for something, perhaps a signal that it should terminate itself via CTask::Terminate()? Just wondering whether there is any way to "manually" clean these up by signaling the Worker to end itself.
So while waiting for the socket to cleanly close, can the Worker task do anything else? Can I interrupt it to tell it to check for something, perhaps a signal that it should terminate itself via CTask::Terminate()? Just wondering whether there is any way to "manually" clean these up by signaling the Worker to end itself.
No, when it hangs in Receive() it can't do anything else.
I will implement a timeout for Send() and Receive() for CSocket. This should solve the problem. I will need some time for this.
OK, thanks for looking into this.
In the meantime, I'll look into implementing:
#include <circle/net/transportlayer.h>
CNetSubSystem::Get()->GetTransportLayer()->ListConnections(&m_Screen);
to verify that it supports the theory of accumulating "broken/dead" incoming connections. I was looking for something like this to do the equivalent of a 'netstat' command, but I couldn't find it. Now that I know it's there, I still can't find it in the online documentation (https://circle-rpi.readthedocs.io/). Is it documented somewhere, or am I just not able to find it, even by searching for the function name (ListConnections)?
Of course, a guy I used to work for in the '90s loved to say, anytime one of us mentioned checking the documentation:
The Code Is the Documentation
and he would tell us to read the source code instead!
The class CTransportLayer implements an internal layer, which is normally not accessed directly by the user. That's why it is not documented. ListConnections() is an exception to that, for debugging purposes.
And it is a wonderfully great exception for debugging purposes! I feel like I have a much better idea of what is happening now.
I’ve been watching the running system for a while. As of right now, there are six (6) httpd tasks that appear dead:
OK, show task
ADDR STAT FL NAME
00 01508940 ready main
01 01723680 ready wifireader
02 01767700 ready wifitimer
03 0171B240 ready net
04 017EB7C0 sleep wpa_supplicant
05 0188EE80 sleep mqtt
06 0188F2C0 ready cluster
07 0192BBC0 ready phantom_watchnet
08 01975C40 ready phantom_anemometer
09 03985AC0 block @.***
10 01A01500 block telnet
11 018816C0 block tftpd
12 01A86680 block httpd
13 01ACA700 sleep ntpd
14 020A3E00 run telnet_2
15 0623C400 block telnet_3
16 02A1CB40 block @.***
17 062705C0 block telnet_4
18 0676CE40 block @.***
19 06A054C0 block @.***
20 074AFAC0 block @.***
21 092BBFC0 block @.***
These appear to be in the SYN-RECEIVED state:
OK, show net
PROT LOCAL ADDRESS FOREIGN ADDRESS STATE
tcp 10.0.1.206:23 10.0.1.60:52666 ESTABLISHED
tcp 10.0.1.206:80 10.0.1.203:52986 SYN-RECEIVED
tcp 10.0.1.206:23 10.0.1.60:62478 ESTABLISHED
tcp 10.0.1.206:23 10.0.1.60:52666 SYN-RECEIVED
tcp 10.0.1.206:23 0.0.0.0:0 LISTEN
tcp 10.0.1.206:23 0.0.0.0:0 LISTEN
udp 10.0.1.206:69 0.0.0.0:0
tcp 10.0.1.206:23 0.0.0.0:0 LISTEN
tcp 10.0.1.206:80 0.0.0.0:0 LISTEN
tcp 10.0.1.206:23 0.0.0.0:0 LISTEN
tcp 10.0.1.206:80 10.0.1.202:36768 SYN-RECEIVED
tcp 10.0.1.206:80 10.0.1.203:54662 SYN-RECEIVED
tcp 10.0.1.206:80 10.0.1.203:55692 SYN-RECEIVED
tcp 10.0.1.206:80 10.0.1.202:41216 SYN-RECEIVED
tcp 10.0.1.206:80 10.0.1.203:54048 SYN-RECEIVED
tcp 10.0.1.206:80 10.0.1.202:41384 TIME-WAIT
tcp 10.0.1.206:80 0.0.0.0:0 LISTEN
tcp 10.0.1.206:80 0.0.0.0:0 LISTEN
tcp 10.0.1.206:80 0.0.0.0:0 LISTEN
tcp 10.0.1.206:80 0.0.0.0:0 LISTEN
tcp 10.0.1.206:80 0.0.0.0:0 LISTEN
tcp 10.0.1.206:80 0.0.0.0:0 LISTEN
tcp 10.0.1.206:80 0.0.0.0:0 LISTEN
tcp 10.0.1.206:80 0.0.0.0:0 LISTEN
tcp 10.0.1.206:80 0.0.0.0:0 LISTEN
tcp 10.0.1.206:60479 192.168.1.204:1883 ESTABLISHED
So can I assume that this means that these connections are in the TCP-SYN state, and in the process of creating a new connection?
If so, then this is even before the web server receives anything to process?
Nonetheless, I hope this helps in understanding the problem!
Cheers,
Dave.
So can I assume that this means that these connections are in the TCP-SYN state, and in the process of creating a new connection?
If so, then this is even before the web server receives anything to process?
Yes, but a connection should normally remain in the SYN-RECEIVED state only for a short time, until the ACK from the client, answering the responded SYN ACK, is received. Can it be that the client disappears after sending the initial SYN?
That is possible, not programmatically or through an OS error, but through network instability. Although I previously saw these dead httpd tasks only very occasionally, they are much more common on the two (2) RPi + Circle nodes added to my basement recently (in the last 8 weeks or so), and most common on the one mounted inside a basement window sill. When I ping it from my office, two (2) floors up, I see occasional drop-outs, e.g.:
64 bytes from 10.0.1.206: icmp_seq=0 ttl=64 time=124.844 ms
Request timeout for icmp_seq 1
64 bytes from 10.0.1.206: icmp_seq=2 ttl=64 time=114.295 ms
64 bytes from 10.0.1.206: icmp_seq=3 ttl=64 time=112.612 ms
64 bytes from 10.0.1.206: icmp_seq=4 ttl=64 time=13.620 ms
64 bytes from 10.0.1.206: icmp_seq=5 ttl=64 time=11.558 ms
64 bytes from 10.0.1.206: icmp_seq=6 ttl=64 time=11.006 ms
64 bytes from 10.0.1.206: icmp_seq=7 ttl=64 time=12.628 ms
64 bytes from 10.0.1.206: icmp_seq=8 ttl=64 time=11.531 ms
64 bytes from 10.0.1.206: icmp_seq=9 ttl=64 time=120.057 ms
64 bytes from 10.0.1.206: icmp_seq=10 ttl=64 time=10.180 ms
64 bytes from 10.0.1.206: icmp_seq=11 ttl=64 time=10.971 ms
Request timeout for icmp_seq 12
64 bytes from 10.0.1.206: icmp_seq=13 ttl=64 time=115.246 ms
64 bytes from 10.0.1.206: icmp_seq=14 ttl=64 time=8.961 ms
64 bytes from 10.0.1.206: icmp_seq=15 ttl=64 time=22.842 ms
64 bytes from 10.0.1.206: icmp_seq=16 ttl=64 time=10.674 ms
64 bytes from 10.0.1.206: icmp_seq=17 ttl=64 time=13.729 ms
Request timeout for icmp_seq 18
64 bytes from 10.0.1.206: icmp_seq=19 ttl=64 time=113.737 ms
64 bytes from 10.0.1.206: icmp_seq=20 ttl=64 time=9.849 ms
64 bytes from 10.0.1.206: icmp_seq=21 ttl=64 time=9.663 ms
64 bytes from 10.0.1.206: icmp_seq=22 ttl=64 time=10.815 ms
Request timeout for icmp_seq 23
64 bytes from 10.0.1.206: icmp_seq=24 ttl=64 time=17.025 ms
64 bytes from 10.0.1.206: icmp_seq=25 ttl=64 time=9.908 ms
64 bytes from 10.0.1.206: icmp_seq=26 ttl=64 time=10.461 ms
64 bytes from 10.0.1.206: icmp_seq=27 ttl=64 time=14.196 ms
64 bytes from 10.0.1.206: icmp_seq=28 ttl=64 time=11.015 ms
64 bytes from 10.0.1.206: icmp_seq=29 ttl=64 time=18.896 ms
Request timeout for icmp_seq 30
64 bytes from 10.0.1.206: icmp_seq=31 ttl=64 time=110.562 ms
64 bytes from 10.0.1.206: icmp_seq=32 ttl=64 time=14.423 ms
64 bytes from 10.0.1.206: icmp_seq=33 ttl=64 time=23.823 ms
64 bytes from 10.0.1.206: icmp_seq=34 ttl=64 time=12.215 ms
64 bytes from 10.0.1.206: icmp_seq=35 ttl=64 time=20.308 ms
64 bytes from 10.0.1.206: icmp_seq=36 ttl=64 time=8.617 ms
64 bytes from 10.0.1.206: icmp_seq=37 ttl=64 time=11.530 ms
Request timeout for icmp_seq 38
64 bytes from 10.0.1.206: icmp_seq=39 ttl=64 time=117.882 ms
64 bytes from 10.0.1.206: icmp_seq=40 ttl=64 time=15.993 ms
64 bytes from 10.0.1.206: icmp_seq=41 ttl=64 time=10.268 ms
64 bytes from 10.0.1.206: icmp_seq=42 ttl=64 time=24.758 ms
Request timeout for icmp_seq 43
Request timeout for icmp_seq 44
64 bytes from 10.0.1.206: icmp_seq=45 ttl=64 time=156.620 ms
64 bytes from 10.0.1.206: icmp_seq=46 ttl=64 time=19.037 ms
64 bytes from 10.0.1.206: icmp_seq=47 ttl=64 time=11.038 ms
64 bytes from 10.0.1.206: icmp_seq=48 ttl=64 time=16.974 ms
64 bytes from 10.0.1.206: icmp_seq=49 ttl=64 time=118.155 ms
So that's 8 out of 50 packets, or 16%. Of course, some dropped ping packets are expected, because the RPi is busy doing something every minute and every 5 minutes, but I see far fewer dropped pings (typically < 5%) on my other 5 RPis running the same software / kernel / configuration.
Remember: one of the reasons I am using Circle is the cooperative scheduling, so I can use devices like DHT11/22 sensors that don't like being interrupted in the middle of a bit stream (as they do not use a clock signal from the RPi).
But the bigger issue is location: while on a WiFi mesh, that mesh node is inside a steel cage (albeit open-topped, with very wide fencing), and this particular RPi is the farthest from any of the mesh nodes, sitting on a cement window ledge.
So why not move it? Because it is wired to outdoor sensors.
That said, I have never had an issue apart from some occasional slight sluggishness when connected with a telnet client, and hitting the web server with a URL pasted into a desktop web browser (Mac Safari) always works. So when doing things manually, I am not seeing any failed connections.
So does this evidence suggest different thinking than what you thought before? I guess I'm not clear on how the web server is seeing "Receive failed" messages if the connections are failing during the TCP SYN handshake?
I think the 'standard' is to retry an aborted SYN 3 times before giving up? I read that somewhere, but I'm not sure how common that is.
[You may remember me saying that I upgraded from a 7-year-old NETGEAR Orbi mesh to a current tri-band TP-Link Deco mesh to solve some WiFi problems, and it has. Before that upgrade, the RPi on the basement window sill, which back then was running Linux, would completely drop off the network and not respond to any ICMP, HTTP, or TELNET requests until it was rebooted, sometimes a few times.]
Nonetheless, overall I am less concerned about my one specific problem (one specific RPi in one specific location) than about the overall robustness of network connections in Circle, which is why I have brought this to your attention.
I was in IT and Tech Support most of my professional career, and I know that there are always other ways to solve technical problems. Therefore, I have ordered a 35-foot CAT6 Ethernet cable to optionally hard-wire this particular RPi to the WiFi mesh node as a "Plan B", which should hopefully circumvent, but not solve, the problem. However, it is not always possible to pull an Ethernet cable (e.g., I wouldn't want one across my dining room or living room), and most of my IoT nodes are RPi Zero 2Ws (small, low power, low heat, and cheap, but with no Ethernet port).
So in other words, I'm concerned about seeing this problem repeated in the future on other projects in other locations. So the question is: can (or perhaps even should?) Circle be more robust in less-than-optimal network conditions? If so, I have a great test environment for that!
Hopefully that makes sense? Thanks again for pointing out the ListConnections() function; I no longer feel as blind when trying to understand and manage network connections. As I may have said before, one of my favorite quotes is: The Network Is the Computer.
Cheers,
Dave.
Your ping delays are not very good, but I understand the reason. I think socket timeouts for Receive() and Send() will also help in this case (connections remaining in SYN-RECEIVED). I will implement them now.
The "Receive failed" message sometimes occurs at my site when my router tests whether a client has an open TCP port 80 (HTTP). It simply opens a connection, but does not send anything before closing it again. Normally this can be ignored.
Circle retries 5 times, where retries are required, in the TCP module.
OK, since I never got the stdlib/mbedtls issue resolved, I am still on Circle 49. So if you do post timeout fixes to the development branch, can you indicate which files have changed (assuming that they are backwards compatible with Circle 49)?
Also, is there any way to use Github to clone an older release (e.g., 49 vs. 50)? Not that I plan to make a habit out of it!
Ah, never mind on the last point (cloning older Circle releases). I figured it out.
What confused me was the 6- or 7-character SHA codes displayed in the commit history. I know how to set HEAD to a particular commit, but had no idea what to do with such a short code.
Hiding in plain sight is the 'copy' icon next to each commit, which for some reason I never realized copies the full SHA to my clipboard.
While implementing the socket timeouts, I found a bug in the TCP module which might have caused the problem that made you open this issue. When an error occurred in a connection and the socket was closed afterwards by deleting it, the connection was not shut down properly. I will come back when I have fixed the problem.
BTW, in a git command a given commit SHA can have any length, as long as it is unique in the git repository. Normally the first 8 characters are sufficient.
The issue should be fixed on the develop branch, and socket timeouts for receive and send have been added. This is now used in sample/21-webserver. The class CHTTPDaemon has been updated for this purpose too.
If you want to apply the fix to Circle 49 (without socket timeouts), the commit 09ffd0b should be enough.
Thanks for reporting this issue!
Thanks Rene, I’ll apply this fix to tcpconnection.cpp and test it out today.
If I wanted to back-port the socket timeouts for receive and send to Circle 49, are there any files that I can simply copy from develop? Although I acknowledge that, in general, it is difficult for you to support older releases beyond a certain point.
For the socket timeouts and the HTTP server the changes are in the following files:
include/circle/net/httpdaemon.h | 6 ++++--
include/circle/net/netconnection.h | 3 +++
include/circle/net/netsocket.h | 12 ++++++++++++
include/circle/net/socket.h | 12 ++++++++++++
include/circle/net/tcpconnection.h | 6 ++++++
include/circle/net/tcprejector.h | 2 ++
include/circle/net/transportlayer.h | 3 +++
include/circle/net/udpconnection.h | 4 ++++
lib/net/httpdaemon.cpp | 8 ++++++--
lib/net/socket.cpp | 22 ++++++++++++++++++++++
lib/net/tcpconnection.cpp | 53 ++++++++++++++++++++++++++++++++++++++++++++---------
lib/net/transportlayer.cpp | 24 ++++++++++++++++++++++++
lib/net/udpconnection.cpp | 40 ++++++++++++++++++++++++++++++++++++++--
But unfortunately you cannot simply copy them to Circle 49, because some of these files already changed in Circle 50 and have dependencies on other files. I think you should try only the mentioned commit, to fix the initial problem.
Got it, thanks!
I have spent time trying to figure out the mbedTLS problem, but so far have been unsuccessful. It has something to do with an entropy source. I do want to get to Circle 50 when I can figure it out.
There is a new class CThreadingModule, which must be added as a member variable to CKernel with Mbed TLS 2.10. Maybe this is the problem. Please see this sample: https://github.com/smuehlst/circle-stdlib/blob/master/samples/mbedtls/01-https-client1/kernel.h
Interesting. I was focusing on the .cpp and not the header. Usually, if something is missing from the header, it throws a compiler error; but I guess this code just has lots of conditional compilation that works around missing definitions.
So I had not been defining either of these in my code:
CircleMbedTLS::CThreadingModule m_ThreadingModule;
CircleMbedTLS::CEntropyModule m_EntropyModule;
which was originally based on Stefan's circle-stdlib/samples/mbedtls/01-https-client1/ssl_client.c. I did this when Circle 46 (I think) was the most current, so maybe about a year ago(?).
Although it worked fine from Circle 46, 47, 48 through 49, I do see the entropy module (CircleMbedTLS::CEntropyModule m_EntropyModule) listed in the example code for the version of circle-stdlib synchronized with Circle 49.
So I have now tried adding BOTH to my code, and it still compiles cleanly, but when I try to open an HTTPS connection, I now get a crash in CMutex::Acquire(). If I include just the entropy module (CircleMbedTLS::CEntropyModule) without the threading module (CircleMbedTLS::CThreadingModule), I still get the error about not being able to seed the random number generator, but it doesn't crash.
OK, it doesn't work. I could make assumptions about why it crashes, but in the end it could become a longer story which does not belong in this repo, because it's a circle-mbedtls thing. Please report it in the circle-stdlib repository if you need further help on this.
Absolutely. Sorry I mentioned it, but you made a suggestion, so I wanted to answer it.
Nonetheless I will not mention it on this thread again.
No need to say sorry. I thought it would quickly help, but I was wrong.
To be clear, the fix is simply removing this early return from tcpconnection.cpp (the diff shows the change at line 322):

< if (m_nErrno < 0) { return m_nErrno; }

so that it falls through to the switch statement even if m_nErrno is negative. Is that correct?
If so, it might be a partial success: before, I was seeing roughly one new stuck SYN-RECEIVED connection per hour, so I would hit 10 dead httpds within a few hours. With this patch, all other things being equal, it took almost 22 hours to accumulate 10 stuck SYN-RECEIVED connections:
ADDR STAT FL NAME
00 01508940 ready main
01 01723680 ready wifireader
02 01767700 ready wifitimer
03 0171B240 ready net
04 017EB7C0 sleep wpa_supplicant
05 018A4400 sleep mqtt
06 018A5240 ready cluster
07 01942940 ready phantom_watchnet
08 0198C9C0 ready phantom_anemometer
09 0B035DC0 block @.***
10 01A18280 block telnet
11 01882700 block tftpd
12 01A9D400 block httpd
13 01AE1480 sleep ntpd
14 0170A180 run   telnet_2
15 026D4940 block @.***
16 020E3440 block @.***
17 027C02C0 block @.***
18 0477E400 block @.***
19 096F1500 block @.***
20 0307B980 block @.***
21 0B98F440 block @.***
22 0D661E80 block @.***
23 0E6B74C0 block @.***
PROT LOCAL ADDRESS FOREIGN ADDRESS STATE
tcp 10.0.1.206:80 10.0.1.202:42816 SYN-RECEIVED
tcp 10.0.1.206:80 10.0.1.202:44124 TIME-WAIT
tcp 10.0.1.206:80 10.0.1.201:45666 SYN-RECEIVED
tcp 10.0.1.206:23 10.0.1.60:53347 ESTABLISHED
tcp 10.0.1.206:23 0.0.0.0:0 LISTEN
tcp 10.0.1.206:23 0.0.0.0:0 LISTEN
tcp 10.0.1.206:23 0.0.0.0:0 LISTEN
udp 10.0.1.206:69 0.0.0.0:0
tcp 10.0.1.206:80 10.0.1.201:46082 SYN-RECEIVED
tcp 10.0.1.206:80 10.0.1.201:55202 SYN-RECEIVED
tcp 10.0.1.206:80 10.0.1.201:44250 SYN-RECEIVED
tcp 10.0.1.206:80 10.0.1.202:42816 SYN-RECEIVED
tcp 10.0.1.206:80 10.0.1.201:43132 SYN-RECEIVED
tcp 10.0.1.206:80 10.0.1.202:42948 SYN-RECEIVED
tcp 10.0.1.206:80 10.0.1.203:40942 SYN-RECEIVED
tcp 10.0.1.206:80 10.0.1.202:39046 SYN-RECEIVED
tcp 10.0.1.206:80 10.0.1.202:42948 SYN-RECEIVED
tcp 10.0.1.206:80 10.0.1.202:43182 SYN-RECEIVED
tcp 10.0.1.206:23 0.0.0.0:0 LISTEN
tcp 10.0.1.206:80 10.0.1.202:43144 SYN-RECEIVED
tcp 10.0.1.206:80 10.0.1.203:42234 SYN-RECEIVED
tcp 10.0.1.206:80 10.0.1.201:49914 FIN-WAIT-1
tcp 10.0.1.206:80 10.0.1.201:49912 TIME-WAIT
tcp 10.0.1.206:80 10.0.1.203:42256 SYN-RECEIVED
tcp 10.0.1.206:80 10.0.1.203:42988 TIME-WAIT
tcp 10.0.1.206:80 0.0.0.0:0 LISTEN
tcp 10.0.1.206:80 0.0.0.0:0 LISTEN
tcp 10.0.1.206:80 0.0.0.0:0 LISTEN
tcp 10.0.1.206:80 0.0.0.0:0 LISTEN
tcp 10.0.1.206:80 0.0.0.0:0 LISTEN
tcp 10.0.1.206:60971 192.168.1.204:1883 ESTABLISHED
tcp 10.0.1.206:80 0.0.0.0:0 LISTEN
tcp 10.0.1.206:80 0.0.0.0:0 LISTEN
tcp 10.0.1.206:80 0.0.0.0:0 LISTEN
tcp 10.0.1.206:80 0.0.0.0:0 LISTEN
tcp 10.0.1.206:80 0.0.0.0:0 LISTEN
Just a thought: I am seeing more than ten (10) SYN-RECEIVEDs after the server's connection limit of ten (10) is reached. So is it possible that these are being caused by the web server being too busy or unable to accept the connection? What happens, or should happen, when the web server does not accept a connection, and how should that clean itself up at the network level? I am assuming that we should not just keep accumulating SYN-RECEIVED connections?
Full disclosure: I did fork CWebServer into my own code. I have not focused on that fork, because the problem appears to be at a much lower level in the protocol stack, but I am attaching my web server code in case you find me doing anything egregious in it. The same web server runs 24x7 on 5 other Circle-based nodes with much better WiFi connections, without this problem except on rare occasions (maybe 1 or 2 dead httpds after a week or more on one of the other nodes).
Cheers,
Dave.

On Oct 29, 2025, at 7:06 AM, Rene Stange @.***> wrote:
rsta2 left a comment (rsta2/circle#618) https://github.com/rsta2/circle/issues/618#issuecomment-3460952910 The issue should be fixed on the develop branch and socket timeouts for receive and send have been added. This is used in the sample/21-webserver now. The class CHTTPDaemon has been updated for this purpose too.
If you want to apply the fix to Circle 49 (without socket timeouts), the commit 09ffd0b https://github.com/rsta2/circle/commit/09ffd0b05dc3f77552e7547f95bf2a37c74be75e should be enough.
Thanks for reporting this issue!
It's likely that you need the complete fix including the socket timeouts to solve this problem. If this doesn't help, we could look further.
Understood … it might be a while before I can upgrade from Circle 49 to Circle 50. Should I keep this issue open until then, or close it now? Please advise on how you want me to proceed.
Please keep it open until it is confirmed that the initial problem is fixed.
Please note that you have to activate the socket receive timeout if you are using your own version of the web server; otherwise it will not be active. There is a new parameter for CHTTPDaemon::CHTTPDaemon(), which must be greater than zero; it sets the timeout. You can test this using telnet ip_address 80: the Telnet connection should be terminated automatically after the specified amount of time. But of course you have to update to the version from the develop branch first.
Unfortunately, the problem goes beyond bad network connectivity. I have now connected with an Ethernet cable directly to an IPv6 mesh node, and while I now have much better network connectivity:
64 bytes from 10.0.1.206: icmp_seq=420 ttl=64 time=6.233 ms
64 bytes from 10.0.1.206: icmp_seq=421 ttl=64 time=6.853 ms
64 bytes from 10.0.1.206: icmp_seq=422 ttl=64 time=5.633 ms
64 bytes from 10.0.1.206: icmp_seq=423 ttl=64 time=9.308 ms
64 bytes from 10.0.1.206: icmp_seq=424 ttl=64 time=54.515 ms
64 bytes from 10.0.1.206: icmp_seq=425 ttl=64 time=98.863 ms
64 bytes from 10.0.1.206: icmp_seq=426 ttl=64 time=43.830 ms
64 bytes from 10.0.1.206: icmp_seq=427 ttl=64 time=61.904 ms
64 bytes from 10.0.1.206: icmp_seq=428 ttl=64 time=6.546 ms
64 bytes from 10.0.1.206: icmp_seq=429 ttl=64 time=6.120 ms
64 bytes from 10.0.1.206: icmp_seq=430 ttl=64 time=9.924 ms
64 bytes from 10.0.1.206: icmp_seq=431 ttl=64 time=23.986 ms
64 bytes from 10.0.1.206: icmp_seq=432 ttl=64 time=40.363 ms
64 bytes from 10.0.1.206: icmp_seq=433 ttl=64 time=132.691 ms
64 bytes from 10.0.1.206: icmp_seq=434 ttl=64 time=20.766 ms
64 bytes from 10.0.1.206: icmp_seq=435 ttl=64 time=28.474 ms
64 bytes from 10.0.1.206: icmp_seq=436 ttl=64 time=66.904 ms
64 bytes from 10.0.1.206: icmp_seq=437 ttl=64 time=107.268 ms
64 bytes from 10.0.1.206: icmp_seq=438 ttl=64 time=154.969 ms
64 bytes from 10.0.1.206: icmp_seq=439 ttl=64 time=7.852 ms
64 bytes from 10.0.1.206: icmp_seq=440 ttl=64 time=6.260 ms
with no drop-outs, I am still getting dead httpd tasks stuck in SYN-RECEIVED:
12 01DBB680 block httpd@1dbb680
13 01E4AA80 block httpd@1e4aa80
14 01F31180 run telnet_2
16 01FEEE80 block httpd@1feee80
17 022CEC00 block httpd@22cec00
18 01F828C0 block httpd@1f828c0
PROT LOCAL ADDRESS FOREIGN ADDRESS STATE
tcp 10.0.1.206:80 10.0.1.202:52456 SYN-RECEIVED
tcp 10.0.1.206:60001 192.168.1.204:1883 ESTABLISHED
tcp 10.0.1.206:23 0.0.0.0:0 LISTEN
tcp 10.0.1.206:7 0.0.0.0:0 LISTEN
tcp 10.0.1.206:7 0.0.0.0:0 LISTEN
tcp 10.0.1.206:7 0.0.0.0:0 LISTEN
tcp 10.0.1.206:7 0.0.0.0:0 LISTEN
tcp 10.0.1.206:23 10.0.1.60:58061 ESTABLISHED
tcp 10.0.1.206:23 0.0.0.0:0 LISTEN
tcp 10.0.1.206:23 0.0.0.0:0 LISTEN
tcp 10.0.1.206:23 0.0.0.0:0 LISTEN
udp 10.0.1.206:69 0.0.0.0:0
tcp 10.0.1.206:80 10.0.1.202:52456 SYN-RECEIVED
tcp 10.0.1.206:80 10.0.1.202:52708 SYN-RECEIVED
tcp 10.0.1.206:80 10.0.1.201:55958 TIME-WAIT
tcp 10.0.1.206:80 10.0.1.202:52708 SYN-RECEIVED
I am now running Circle 50.0.1 (not develop). Can you please advise what I need to patch to get the socket time-out code? My build is based on circle-stdlib, which auto-downloads Circle 50.0.1, and I realize that this is NOT the place to discuss circle-stdlib issues! But can you advise what I need to copy/patch from the develop branch to get the timeouts you have implemented into 50.0.1? I am hoping that is easy to do?
Cheers,
Dave.
Yes, the socket timeouts are currently available only on the develop branch. I tried to extract a patch, but this is not that simple because the socket timeouts touch many files. If you do not want to check out develop directly, you could check out commit 6247b896. Up to this commit there are only changes which are necessary for the socket timeouts or which are relatively safe and should not break anything:
$ cd libs/circle
$ git checkout 6247b896
I guess you are using the class CHTTPDaemon for the port-80 servers? Then you only have to add the timeout value (in seconds) as the last parameter of the constructor of that class to enable the timeout:
CHTTPDaemon (pNetSubSystem, pSocket, MAX_CONTENT_SIZE, HTTP_PORT, 0, TIMEOUT_SECONDS)
Please see sample/21-webserver. I think a value of 10 (seconds) should be OK.