Preventing or Cleaning Up Dead HTTP Processes

Open davefilip opened this issue 2 months ago • 31 comments

Rene,

I'm looking for advice on how to clean up old, dead HTTP daemon tasks.

I have, in particular, one RPi 3B -- 2.4 GHz only -- that will start accumulating "dead" HTTP requests, e.g.:

15 02036240 block    httpd@2036240
16 01E39580 block    httpd@1e39580

This usually happens after I see this in the system log:

Oct 25 09:06:07.70 httpd: Receive failed

I've found in CHTTPDaemon::ParseRequest() where this happens, which returns HTTPUnknownError back to CHTTPDaemon::Worker(), so as a "brute force" I tried to do a CTask::Terminate() after decrementing s_nInstanceCount in CHTTPDaemon::Worker() when it sees this error, but that caused more problems (I was hoping it would work its way up the stack, and clean things up). I won't waste your time as to why that didn't work, since it seems like a brute force / dirty solution.

Nonetheless, if this were Linux (or any other Unix flavor), since I have the address of the process, I would simply 'kill -9' that process. However, as you and I have discussed in the past, Circle will only terminate a Task "gently" by itself, as only the running task can call CTask::Terminate().

I should note that sometimes when I see 'httpd: Receive failed' in the log, it appears to clean itself up and the httpd task terminates on its own, sometimes after several seconds. Other times, it does not. So I suspect it has to do with something stalling in the Worker when it realizes that there is a problem (perhaps hanging on the delete m_pSocket, just a guess?).

So you could argue that network problems should be resolved first, and I believe I have made some progress by turning off Bluetooth on about 10 Raspberry Pis, since it shares the same frequency (and RPi 3Bs only support 2.4 GHz WiFi). But I would also argue that bad client requests and/or network errors should never bring down a server. Which is what happens when I have ten (10) of these, and then I have to reboot in order to get rid of the dead httpd Tasks.

Based on the logs, I estimate that about 1.2% of the requests are becoming dead, whereas 98.8% are successful.
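To put the 1.2% figure in perspective, here is a back-of-the-envelope sketch. It assumes each request independently leaves a dead task behind with the same probability, and that the server is effectively down once a fixed number of tasks are stuck. The function name and the independence assumption are illustrative only; nothing here is part of Circle.

```cpp
#include <cassert>

// Expected number of requests until `maxStuck` worker tasks are dead,
// assuming each request independently gets stuck with probability
// `stuckRate` (hypothetical model, not part of Circle).
double ExpectedRequestsUntilExhausted (unsigned maxStuck, double stuckRate)
{
	assert (stuckRate > 0.0 && stuckRate <= 1.0);

	// Sum of maxStuck geometric waiting times, each with mean 1/stuckRate.
	return maxStuck / stuckRate;
}
```

Under this model, ten stuck tasks at a 1.2% rate work out to roughly 833 requests between reboots, which is why raising the task limit only postpones the problem.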

[Yes, I could also increase the worker count and the process count, but that only prolongs the problem without solving it.]

So let me know if you have any advice on where and how I can clean up these dead httpd processes, or else prevent them from getting "stuck" in a block state to begin with.

Any feedback or advice will be appreciated, and I will accept "Simply put, can't do that in Circle", if that is the correct answer.

Cheers,

Dave.

davefilip avatar Oct 25 '25 14:10 davefilip

That's a problem and unfortunately I don't have a quick solution for it. I looked into the code and I guess the blocked tasks hang in Receive(), waiting for the client to close the connection, which it is not doing:

https://github.com/rsta2/circle/blob/master/lib/net/httpdaemon.cpp#L317

I did not find another place, where a task could block in this context, also not when deleting the CSocket instance. You could list the currently existing network connections to prove this, when you find such blocked tasks using:

#include <circle/net/transportlayer.h>

CNetSubSystem::Get()->GetTransportLayer()->ListConnections(&m_Screen);

I have to think about this, what can be done.

rsta2 avatar Oct 26 '25 09:10 rsta2

Thanks Rene - yes, it is possible that the connection is not always being cleanly closed on the client, as the requests are coming in from asynchronous JavaScript (jQuery) running in a browser (actually a kiosk application running in the Chromium browser on other Raspberry Pis with the full Raspberry Pi / Debian Linux OS). If the lifecycle of the request from the browser gets interrupted, there’s a good chance that the socket is not cleanly closed.

Thanks for providing a way to list network connections, that is helpful, and I was looking for a way to emulate something like ‘netstat -t’ which lists open TCP sockets.

Does Circle currently time-out network connections / sockets that are idle more than a pre-determined period of time? Or does Circle provide any way for another task — effectively a watchdog — to forcibly close network connections / sockets that are idle for too long? I could not readily find any network timeout values.

I realize that Circle does not provide traditional Unix Sockets — as Circle is NOT Unix, and it should not and never will be Unix — so not sure what is or could be possible.

Let me know if you think of anything that might help.

Cheers,

Dave.

davefilip avatar Oct 26 '25 12:10 davefilip

> Does Circle currently time-out network connections / sockets that are idle more than a pre-determined period of time? Or does Circle provide any way for another task — effectively a watchdog — to forcibly close network connections / sockets that are idle for too long? I could not readily find any network timeout values.

No, this is currently not implemented.

rsta2 avatar Oct 26 '25 15:10 rsta2

Got it - thanks!

Wasn’t sure if it was there, and I just couldn’t find it. Thanks for confirming!

So when waiting for the socket to cleanly close, can the Worker task do anything else? Can I interrupt it, to tell it to check for something, perhaps to signal that it should self- CTask::Terminate()? Just thinking if there is any way to “manually” clean these up by signaling the Worker to end itself?

davefilip avatar Oct 26 '25 15:10 davefilip

> So when waiting for the socket to cleanly close, can the Worker task do anything else? Can I interrupt it, to tell it to check for something, perhaps to signal that it should self- CTask::Terminate()? Just thinking if there is any way to “manually” clean these up by signaling the Worker to end itself?

No, when it hangs in Receive() it can't do anything else.

I will implement a timeout for Send() and Receive() for CSocket. This should solve the problem. I will need some time for this.
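The general idea behind a receive timeout in a cooperatively scheduled system can be sketched as a polling loop with a deadline. This is only an illustration of the technique, not the implementation that went into Circle: the three callbacks and the `kTimedOut` code are hypothetical stand-ins for a non-blocking receive, a monotonic clock, and a scheduler yield.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Result code used by this sketch only (hypothetical, not Circle's).
constexpr int kTimedOut = -2;

// Polls a non-blocking receive until data arrives, an error occurs,
// or `timeoutMs` has elapsed. All three callbacks are stand-ins:
//   tryReceive: >0 bytes read, 0 if nothing is available yet, <0 on error
//   nowMs:      monotonic time in milliseconds
//   yield:      lets other cooperative tasks run
int ReceiveWithTimeout (const std::function<int ()> &tryReceive,
			const std::function<uint64_t ()> &nowMs,
			const std::function<void ()> &yield,
			uint64_t timeoutMs)
{
	assert (tryReceive != nullptr && nowMs != nullptr && yield != nullptr);

	const uint64_t start = nowMs ();

	for (;;)
	{
		const int result = tryReceive ();
		if (result != 0)
		{
			return result;		// data or error
		}

		if (nowMs () - start >= timeoutMs)
		{
			return kTimedOut;	// caller can now close the socket
		}

		yield ();
	}
}
```

The point of the design is that the worker regains control after a bounded time, so it can delete its socket and terminate itself instead of staying blocked indefinitely.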

rsta2 avatar Oct 26 '25 21:10 rsta2

OK, thanks for looking into this.

In the mean time, I’ll look into implementing:

#include <circle/net/transportlayer.h>
CNetSubSystem::Get()->GetTransportLayer()->ListConnections(&m_Screen);

To verify that it supports the theory of accumulating “broken/dead” incoming connections. I was looking for something like this to do the equivalent of a ‘netstat’ command, but I couldn't find it. Now that I know it’s there, I still can’t find it in the online documentation (https://circle-rpi.readthedocs.io/). Is it documented somewhere, or am I just not able to find it, even by searching for the function name (ListConnections)?

Of course, a guy I used to work for in the 90’s loved to say, anytime one of us said that we were going to check the documentation:

The Code Is the Documentation

And tell us to read the source code instead!

davefilip avatar Oct 26 '25 21:10 davefilip

The class CTransportLayer implements an internal layer, which is normally not directly accessed by the user. That's why it is not documented. ListConnections() is an exception from that for debugging purpose.

rsta2 avatar Oct 27 '25 10:10 rsta2

And it is a wonderfully great exception for debugging purpose! I feel like I have a much better idea of what is happening now.

I’ve been watching the running system for a while. As of right now, there are six (6) httpd tasks that appear dead:

OK, show task

   ADDR     STAT  FL NAME
00 01508940 ready    main
01 01723680 ready    wifireader
02 01767700 ready    wifitimer
03 0171B240 ready    net
04 017EB7C0 sleep    wpa_supplicant
05 0188EE80 sleep    mqtt
06 0188F2C0 ready    cluster
07 0192BBC0 ready    phantom_watchnet
08 01975C40 ready    phantom_anemometer
09 03985AC0 block    @.***
10 01A01500 block    telnet
11 018816C0 block    tftpd
12 01A86680 block    httpd
13 01ACA700 sleep    ntpd
14 020A3E00 run      telnet_2
15 0623C400 block    telnet_3
16 02A1CB40 block    @.***
17 062705C0 block    telnet_4
18 0676CE40 block    @.***
19 06A054C0 block    @.***
20 074AFAC0 block    @.***
21 092BBFC0 block    @.***

For these, they appear to be in a SYN-RECEIVED state:

OK, show net

PROT LOCAL ADDRESS      FOREIGN ADDRESS      STATE
tcp  10.0.1.206:23      10.0.1.60:52666      ESTABLISHED
tcp  10.0.1.206:80      10.0.1.203:52986     SYN-RECEIVED
tcp  10.0.1.206:23      10.0.1.60:62478      ESTABLISHED
tcp  10.0.1.206:23      10.0.1.60:52666      SYN-RECEIVED
tcp  10.0.1.206:23      0.0.0.0:0            LISTEN
tcp  10.0.1.206:23      0.0.0.0:0            LISTEN
udp  10.0.1.206:69      0.0.0.0:0
tcp  10.0.1.206:23      0.0.0.0:0            LISTEN
tcp  10.0.1.206:80      0.0.0.0:0            LISTEN
tcp  10.0.1.206:23      0.0.0.0:0            LISTEN
tcp  10.0.1.206:80      10.0.1.202:36768     SYN-RECEIVED
tcp  10.0.1.206:80      10.0.1.203:54662     SYN-RECEIVED
tcp  10.0.1.206:80      10.0.1.203:55692     SYN-RECEIVED
tcp  10.0.1.206:80      10.0.1.202:41216     SYN-RECEIVED
tcp  10.0.1.206:80      10.0.1.203:54048     SYN-RECEIVED
tcp  10.0.1.206:80      10.0.1.202:41384     TIME-WAIT
tcp  10.0.1.206:80      0.0.0.0:0            LISTEN
tcp  10.0.1.206:80      0.0.0.0:0            LISTEN
tcp  10.0.1.206:80      0.0.0.0:0            LISTEN
tcp  10.0.1.206:80      0.0.0.0:0            LISTEN
tcp  10.0.1.206:80      0.0.0.0:0            LISTEN
tcp  10.0.1.206:80      0.0.0.0:0            LISTEN
tcp  10.0.1.206:80      0.0.0.0:0            LISTEN
tcp  10.0.1.206:80      0.0.0.0:0            LISTEN
tcp  10.0.1.206:80      0.0.0.0:0            LISTEN
tcp  10.0.1.206:60479   192.168.1.204:1883   ESTABLISHED

So can I assume that this means that these connections are in the TCP-SYN state, and in the process of creating a new connection?

If so, then this is even before the web server receives anything to process?
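Counting stuck connections by eye gets tedious, so a small helper can do it from a captured listing like the 'show net' output. This is a sketch that assumes the listing is fed in as text with one connection per line in the form `PROT LOCAL FOREIGN STATE`; the function name is made up for illustration and the helper does not call into Circle.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Counts TCP rows whose local port and state match.
// Assumes one connection per line: "PROT LOCAL FOREIGN STATE".
unsigned CountConnections (const std::string &listing,
			   const std::string &localPort,
			   const std::string &state)
{
	assert (!localPort.empty ());

	const std::string suffix = ":" + localPort;
	std::istringstream lines (listing);
	std::string line;
	unsigned count = 0;

	while (std::getline (lines, line))
	{
		std::istringstream fields (line);
		std::string prot, local, foreign, rowState;
		if (!(fields >> prot >> local >> foreign >> rowState))
		{
			continue;	// blank line or row without a state (e.g. UDP)
		}

		const bool portMatches = local.size () >= suffix.size ()
			&& local.compare (local.size () - suffix.size (),
					  suffix.size (), suffix) == 0;

		if (prot == "tcp" && portMatches && rowState == state)
		{
			++count;
		}
	}

	return count;
}
```

Calling it with port "80" and state "SYN-RECEIVED" gives the number of HTTP connections that never completed the handshake, which can then be compared with the number of blocked httpd tasks.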

Nonetheless, hope this helps to understand the problem?

Cheers,

Dave.

davefilip avatar Oct 27 '25 13:10 davefilip

> So can I assume that this means that these connections are in the TCP-SYN state, and in the process of creating a new connection?
>
> If so, then this is even before the web server receives anything to process?

Yes, but a connection should normally only remain for a short time in SYN-RECEIVED state, until the ACK from the client for the responded SYN ACK is received. Can it be, that the client disappears after sending the initial SYN?

rsta2 avatar Oct 27 '25 15:10 rsta2

That is possible, not programmatically or through OS error, but through network instability. Although I saw these dead httpd tasks only very occasionally before, it is much more common on the two (2) RPi + Circle nodes in my basement added recently (last 8 weeks or so), and most common on the one that is mounted inside a basement window sill. When I ping it from my office, two (2) floors up, I see occasional drop-outs, e.g.:

64 bytes from 10.0.1.206: icmp_seq=0 ttl=64 time=124.844 ms
Request timeout for icmp_seq 1
64 bytes from 10.0.1.206: icmp_seq=2 ttl=64 time=114.295 ms
64 bytes from 10.0.1.206: icmp_seq=3 ttl=64 time=112.612 ms
64 bytes from 10.0.1.206: icmp_seq=4 ttl=64 time=13.620 ms
64 bytes from 10.0.1.206: icmp_seq=5 ttl=64 time=11.558 ms
64 bytes from 10.0.1.206: icmp_seq=6 ttl=64 time=11.006 ms
64 bytes from 10.0.1.206: icmp_seq=7 ttl=64 time=12.628 ms
64 bytes from 10.0.1.206: icmp_seq=8 ttl=64 time=11.531 ms
64 bytes from 10.0.1.206: icmp_seq=9 ttl=64 time=120.057 ms
64 bytes from 10.0.1.206: icmp_seq=10 ttl=64 time=10.180 ms
64 bytes from 10.0.1.206: icmp_seq=11 ttl=64 time=10.971 ms
Request timeout for icmp_seq 12
64 bytes from 10.0.1.206: icmp_seq=13 ttl=64 time=115.246 ms
64 bytes from 10.0.1.206: icmp_seq=14 ttl=64 time=8.961 ms
64 bytes from 10.0.1.206: icmp_seq=15 ttl=64 time=22.842 ms
64 bytes from 10.0.1.206: icmp_seq=16 ttl=64 time=10.674 ms
64 bytes from 10.0.1.206: icmp_seq=17 ttl=64 time=13.729 ms
Request timeout for icmp_seq 18
64 bytes from 10.0.1.206: icmp_seq=19 ttl=64 time=113.737 ms
64 bytes from 10.0.1.206: icmp_seq=20 ttl=64 time=9.849 ms
64 bytes from 10.0.1.206: icmp_seq=21 ttl=64 time=9.663 ms
64 bytes from 10.0.1.206: icmp_seq=22 ttl=64 time=10.815 ms
Request timeout for icmp_seq 23
64 bytes from 10.0.1.206: icmp_seq=24 ttl=64 time=17.025 ms
64 bytes from 10.0.1.206: icmp_seq=25 ttl=64 time=9.908 ms
64 bytes from 10.0.1.206: icmp_seq=26 ttl=64 time=10.461 ms
64 bytes from 10.0.1.206: icmp_seq=27 ttl=64 time=14.196 ms
64 bytes from 10.0.1.206: icmp_seq=28 ttl=64 time=11.015 ms
64 bytes from 10.0.1.206: icmp_seq=29 ttl=64 time=18.896 ms
Request timeout for icmp_seq 30
64 bytes from 10.0.1.206: icmp_seq=31 ttl=64 time=110.562 ms
64 bytes from 10.0.1.206: icmp_seq=32 ttl=64 time=14.423 ms
64 bytes from 10.0.1.206: icmp_seq=33 ttl=64 time=23.823 ms
64 bytes from 10.0.1.206: icmp_seq=34 ttl=64 time=12.215 ms
64 bytes from 10.0.1.206: icmp_seq=35 ttl=64 time=20.308 ms
64 bytes from 10.0.1.206: icmp_seq=36 ttl=64 time=8.617 ms
64 bytes from 10.0.1.206: icmp_seq=37 ttl=64 time=11.530 ms
Request timeout for icmp_seq 38
64 bytes from 10.0.1.206: icmp_seq=39 ttl=64 time=117.882 ms
64 bytes from 10.0.1.206: icmp_seq=40 ttl=64 time=15.993 ms
64 bytes from 10.0.1.206: icmp_seq=41 ttl=64 time=10.268 ms
64 bytes from 10.0.1.206: icmp_seq=42 ttl=64 time=24.758 ms
Request timeout for icmp_seq 43
Request timeout for icmp_seq 44
64 bytes from 10.0.1.206: icmp_seq=45 ttl=64 time=156.620 ms
64 bytes from 10.0.1.206: icmp_seq=46 ttl=64 time=19.037 ms
64 bytes from 10.0.1.206: icmp_seq=47 ttl=64 time=11.038 ms
64 bytes from 10.0.1.206: icmp_seq=48 ttl=64 time=16.974 ms
64 bytes from 10.0.1.206: icmp_seq=49 ttl=64 time=118.155 ms

So 8 out of 50 packets, or 16%. Of course some ping packet drop-outs are expected because the RPi is busy doing something every minute and every 5 minutes, but I see far fewer dropped ping packets on my other 5 RPis running the same software / kernel / configuration, which tend to be < 5% drops.
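The same loss figure can be computed mechanically from a saved ping transcript, which makes it easier to compare nodes. A minimal sketch, assuming one result per line in the format of the transcript in this comment ("64 bytes from ..." for a reply, "Request timeout for ..." for a loss); the type and function names are made up for illustration.

```cpp
#include <sstream>
#include <string>

// Tally of a ping transcript: replies received and probes that timed out.
struct PingStats
{
	unsigned replies = 0;
	unsigned timeouts = 0;

	double LossPercent () const
	{
		const unsigned sent = replies + timeouts;
		return sent == 0 ? 0.0 : 100.0 * timeouts / sent;
	}
};

// Classifies each line as a reply, a timeout, or neither.
PingStats SummarizePing (const std::string &transcript)
{
	std::istringstream lines (transcript);
	std::string line;
	PingStats stats;

	while (std::getline (lines, line))
	{
		if (line.find ("bytes from") != std::string::npos)
		{
			++stats.replies;
		}
		else if (line.find ("Request timeout") != std::string::npos)
		{
			++stats.timeouts;
		}
	}

	return stats;
}
```

With 42 replies and 8 timeouts, LossPercent() yields the 16% quoted here.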

Remember: One of the reasons for me using Circle is the ‘Cooperative Scheduling’, so I can use devices like DHT11/22 sensors that don’t like being interrupted in the middle of a bit stream (as they do not use a clock signal from the RPi).

But the bigger issue is location: although it is on a WiFi mesh, that mesh node is inside a steel cage (albeit with an open top and very wide fencing), and this particular RPi is the furthest from any of the mesh nodes and sitting on a cement window ledge.

So why not move it? Because it is wired to outdoor sensors.

That said, I have never had an issue apart from some occasional slight sluggishness when connected in with a telnet client, and hitting the web server with a URL pasted into a desktop web browser (Mac Safari) always works. So when doing stuff manually I am not seeing any failed connections.

So does this evidence suggest different thinking than what you thought before? I guess I’m not clear on how the web server is seeing “Receive failed” messages, if the connections are failing in TCP-SYN?

I think the ’standard’ is to retry an aborted SYN 3 times before giving up? I read that somewhere, but not sure how common that is?

[You may remember me saying that I upgraded from a 7 year old NETGEAR Orbi mesh to a current tri-band TP-Link Deco mesh to solve some WiFi problems, and it has, as before that upgrade, the RPi on the basement window sill, which back then was running Linux, would completely drop off the network and not respond to any ICMP, HTTP, or TELNET requests until it was rebooted, sometimes a few times.]

Nonetheless, overall I am less concerned about my one specific problem — one specific RPi in one specific location — than the overall robustness of network connections in Circle, which is why I have brought this to your attention.

I was in IT and Tech Support most of my professional career, and I know that there are always other ways to solve technical problems. Therefore, I have ordered a 35-foot CAT6 Ethernet cable to optionally hard wire this particular RPi to the WiFi mesh node, as a “Plan B”, which hopefully should circumvent, but not solve, the problem. However, it is not always possible to pull an Ethernet cable (e.g., I wouldn’t want to run one across my dining room or living room), and most of my IoT nodes are RPi Zero 2Ws (small, low power, low heat, and cheap, but no Ethernet port).

So in other words, I’m concerned about seeing this problem repeated in the future on other projects in other locations. So the question is can (or perhaps even should?) Circle be more robust in less than optimal network conditions? If so, I have a great test environment for that!

Hopefully that makes sense? Thanks again for pointing out the ListConnections() function, as I no longer feel as blind when trying to understand and manage network connections. As I may have said before, one of my favorite quotes is: The Network Is The Computer

Cheers,

Dave.

davefilip avatar Oct 27 '25 16:10 davefilip

Your ping delays are not very good, but I understand the reason. I think socket timeouts for Receive() and Send() will also help in this (remaining in SYN-RECEIVED) case. I will implement them now.

The "Receive failed" message sometimes occurs at my site, when my router tests, if a client has an open TCP port 80 (http). It simply opens a connection, but does not send anything before closing it again. Normally this can be ignored.

Circle retries 5 times, where retries are required in the TCP module.
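A bounded retry loop of this kind can be sketched as follows. This is a generic illustration, not Circle's TCP code: `sendAndWaitForAck` is a hypothetical callback that transmits a segment once and reports whether the acknowledgement arrived in time, and counting "5 retries" as five retransmissions after the initial attempt is an assumption.

```cpp
#include <cassert>
#include <functional>

// Sends a segment and waits for its acknowledgement, retransmitting up
// to `maxRetries` times. Returns true once acknowledged, false when all
// attempts are used up.
bool SendWithRetries (const std::function<bool ()> &sendAndWaitForAck,
		      unsigned maxRetries)
{
	assert (sendAndWaitForAck != nullptr);

	// One initial transmission plus maxRetries retransmissions.
	for (unsigned attempt = 0; attempt <= maxRetries; ++attempt)
	{
		if (sendAndWaitForAck ())
		{
			return true;
		}
	}

	return false;	// give up; the connection should be torn down
}
```

The important property is the final `return false`: once the attempts are exhausted the connection has to leave its current state, otherwise it lingers the way the SYN-RECEIVED entries do.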

rsta2 avatar Oct 28 '25 11:10 rsta2

OK, since I never got the stdlib/mbedtls issue resolved, I am still on Circle 49. So if you do post timeout fixes to the development branch, can you indicate which files have changed (assuming that they are backwards compatible with Circle 49)?

Also, is there any way to use Github to clone an older release (e.g., 49 vs. 50)? Not that I plan to make a habit out of it!

davefilip avatar Oct 28 '25 15:10 davefilip

Ah, never mind on the last point (cloning older Circle releases). I figured it out.

What confused me was the 6 or 7 character SHA codes displayed in the commit history. I know how to set the HEAD to a particular commit, but had no idea what to do with such a short code.

Hiding in plain sight is the ‘copy doc’ icon next to each commit, which for some reason I never thought meant copy the full SHA-1 hash to my clipboard.

davefilip avatar Oct 28 '25 16:10 davefilip

While implementing the socket timeouts I found a bug in the TCP module, which might have caused the problem for which you opened this issue. When an error occurred in a connection and the socket was closed afterwards by deleting it, the connection was not shut down properly. I will come back when I have fixed the problem.

BTW, in a git command a given commit SHA can have any length, as long as it is unique in the git repository. Normally the first 8 characters are sufficient.
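The uniqueness rule is easy to state in code. The sketch below checks whether an abbreviated hash identifies exactly one entry in a list of full hashes; it only models the rule and is not how git itself is implemented, and the function name is made up for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns true if exactly one hash in `hashes` starts with `prefix`,
// which is the condition for an abbreviated SHA to be usable.
bool IsUniquePrefix (const std::vector<std::string> &hashes,
		     const std::string &prefix)
{
	assert (!prefix.empty ());

	unsigned matches = 0;
	for (const std::string &hash : hashes)
	{
		// compare() on a hash shorter than the prefix never matches.
		if (hash.compare (0, prefix.size (), prefix) == 0)
		{
			++matches;
		}
	}

	return matches == 1;
}
```

A prefix that matches two commits is ambiguous and a prefix that matches none is unknown; both cases return false here.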

rsta2 avatar Oct 29 '25 09:10 rsta2

The issue should be fixed on the develop branch and socket timeouts for receive and send have been added. This is used in the sample/21-webserver now. The class CHTTPDaemon has been updated for this purpose too.

If you want to apply the fix to Circle 49 (without socket timeouts), the commit 09ffd0b (https://github.com/rsta2/circle/commit/09ffd0b05dc3f77552e7547f95bf2a37c74be75e) should be enough.

Thanks for reporting this issue!

rsta2 avatar Oct 29 '25 11:10 rsta2

Thanks Rene, I’ll apply this fix to tcpconnection.cpp and test it out today.

If I wanted to back-port the socket timeouts for receive and send to Circle 49, is there a file(s) that I can copy from develop? Although I acknowledge that in general it is difficult for you to support older releases beyond a certain point.

davefilip avatar Oct 29 '25 11:10 davefilip

For the socket timeouts and the HTTP server the changes are in the following files:

include/circle/net/httpdaemon.h     |  6 ++++--
include/circle/net/netconnection.h  |  3 +++
include/circle/net/netsocket.h      | 12 ++++++++++++
include/circle/net/socket.h         | 12 ++++++++++++
include/circle/net/tcpconnection.h  |  6 ++++++
include/circle/net/tcprejector.h    |  2 ++
include/circle/net/transportlayer.h |  3 +++
include/circle/net/udpconnection.h  |  4 ++++
lib/net/httpdaemon.cpp              |  8 ++++++--
lib/net/socket.cpp                  | 22 ++++++++++++++++++++++
lib/net/tcpconnection.cpp           | 53 ++++++++++++++++++++++++++++++++++++++++++++---------
lib/net/transportlayer.cpp          | 24 ++++++++++++++++++++++++
lib/net/udpconnection.cpp           | 40 ++++++++++++++++++++++++++++++++++++++--

But unfortunately you cannot simply copy them to Circle 49, because there were already changes in Circle 50 in some of these files, which have dependencies in other files. I think you should try it with the mentioned commit only to fix the initial problem.

rsta2 avatar Oct 29 '25 11:10 rsta2

Got it, thanks!

I’ve spent time trying to figure out the MBEDTLS problem, but so far have been unsuccessful. It has something to do with an entropy source. But I do want to get to Circle 50 when I can figure it out.

davefilip avatar Oct 29 '25 11:10 davefilip

There is a new class CThreadingModule, which must be added as a member variable to CKernel with Mbed TLS 2.10. Maybe this is the problem. Please see this sample: https://github.com/smuehlst/circle-stdlib/blob/master/samples/mbedtls/01-https-client1/kernel.h

rsta2 avatar Oct 29 '25 12:10 rsta2

Interesting, I was focusing on the .cpp and not the header. Usually, if something is missing from the header, then it throws a compiler error. But I guess this code just has lots of conditional compiles that get around missing things.

So I had not been defining either of these in my code:

CircleMbedTLS::CThreadingModule m_ThreadingModule;
CircleMbedTLS::CEntropyModule m_EntropyModule;

which was originally based on Stefan’s circle-stdlib/samples/mbedtls/01-https-client1/ssl_client.c. I did this whenever Circle 46 (I think) was the most current, so maybe about a year ago (?).

Although it worked fine from Circle 46, 47, 48 through 49, I see the first (CircleMbedTLS::CEntropyModule m_EntropyModule) listed in the example code for the version of circle-stdlib synchronized with Circle 49.

So I now tried adding BOTH to my code, and it still compiles cleanly, but when I try to open an HTTPS connection, I now get a crash in CMutex::Acquire(). If I include just the entropy module (CircleMbedTLS::CEntropyModule) without the threading module (CircleMbedTLS::CThreadingModule), I still get the error about not being able to seed the random number generator, but it doesn’t crash.

davefilip avatar Oct 29 '25 17:10 davefilip

OK, it doesn't work. I could speculate about why it crashes, but this could turn into a longer story that does not belong in this repo, because it's a circle-mbedtls matter. Please report it in the circle-stdlib repository if you need further help with this.

rsta2 avatar Oct 29 '25 22:10 rsta2

Absolutely. Sorry I mentioned it, but you made a suggestion, so I wanted to answer that suggestion.

Nonetheless I will not mention it on this thread again.

davefilip avatar Oct 29 '25 22:10 davefilip

No need to say sorry. I thought, it would quickly help, but I was wrong.

rsta2 avatar Oct 29 '25 22:10 rsta2

webserver.cpp

To be clear, the fix is simply removing these lines from tcpconnection.cpp:

322c322,326 <

if (m_nErrno < 0) { return m_nErrno; }

so that it falls through to the switch statement even if m_nErrno is negative. Is that correct?

If so, it might be a partial success. Before, I was seeing one or more connections stuck in SYN-RECEIVED per hour, so I would hit 10 dead httpds in a few hours. With this patch, all other things being equal, it took almost 22 hours to accumulate 10 of them:

   ADDR     STAT  FL NAME
00 01508940 ready    main
01 01723680 ready    wifireader
02 01767700 ready    wifitimer
03 0171B240 ready    net
04 017EB7C0 sleep    wpa_supplicant
05 018A4400 sleep    mqtt
06 018A5240 ready    cluster
07 01942940 ready    phantom_watchnet
08 0198C9C0 ready    phantom_anemometer
09 0B035DC0 block    @.***
10 01A18280 block    telnet
11 01882700 block    tftpd
12 01A9D400 block    httpd
13 01AE1480 sleep    ntpd
14 0170A180 run      telnet_2
15 026D4940 block    @.***
16 020E3440 block    @.***
17 027C02C0 block    @.***
18 0477E400 block    @.***
19 096F1500 block    @.***
20 0307B980 block    @.***
21 0B98F440 block    @.***
22 0D661E80 block    @.***
23 0E6B74C0 block    @.***

PROT LOCAL ADDRESS         FOREIGN ADDRESS       STATE
tcp  10.0.1.206:80         10.0.1.202:42816      SYN-RECEIVED
tcp  10.0.1.206:80         10.0.1.202:44124      TIME-WAIT
tcp  10.0.1.206:80         10.0.1.201:45666      SYN-RECEIVED
tcp  10.0.1.206:23         10.0.1.60:53347       ESTABLISHED
tcp  10.0.1.206:23         0.0.0.0:0             LISTEN
tcp  10.0.1.206:23         0.0.0.0:0             LISTEN
tcp  10.0.1.206:23         0.0.0.0:0             LISTEN
udp  10.0.1.206:69         0.0.0.0:0
tcp  10.0.1.206:80         10.0.1.201:46082      SYN-RECEIVED
tcp  10.0.1.206:80         10.0.1.201:55202      SYN-RECEIVED
tcp  10.0.1.206:80         10.0.1.201:44250      SYN-RECEIVED
tcp  10.0.1.206:80         10.0.1.202:42816      SYN-RECEIVED
tcp  10.0.1.206:80         10.0.1.201:43132      SYN-RECEIVED
tcp  10.0.1.206:80         10.0.1.202:42948      SYN-RECEIVED
tcp  10.0.1.206:80         10.0.1.203:40942      SYN-RECEIVED
tcp  10.0.1.206:80         10.0.1.202:39046      SYN-RECEIVED
tcp  10.0.1.206:80         10.0.1.202:42948      SYN-RECEIVED
tcp  10.0.1.206:80         10.0.1.202:43182      SYN-RECEIVED
tcp  10.0.1.206:23         0.0.0.0:0             LISTEN
tcp  10.0.1.206:80         10.0.1.202:43144      SYN-RECEIVED
tcp  10.0.1.206:80         10.0.1.203:42234      SYN-RECEIVED
tcp  10.0.1.206:80         10.0.1.201:49914      FIN-WAIT-1
tcp  10.0.1.206:80         10.0.1.201:49912      TIME-WAIT
tcp  10.0.1.206:80         10.0.1.203:42256      SYN-RECEIVED
tcp  10.0.1.206:80         10.0.1.203:42988      TIME-WAIT
tcp  10.0.1.206:80         0.0.0.0:0             LISTEN
tcp  10.0.1.206:80         0.0.0.0:0             LISTEN
tcp  10.0.1.206:80         0.0.0.0:0             LISTEN
tcp  10.0.1.206:80         0.0.0.0:0             LISTEN
tcp  10.0.1.206:80         0.0.0.0:0             LISTEN
tcp  10.0.1.206:60971      192.168.1.204:1883    ESTABLISHED
tcp  10.0.1.206:80         0.0.0.0:0             LISTEN
tcp  10.0.1.206:80         0.0.0.0:0             LISTEN
tcp  10.0.1.206:80         0.0.0.0:0             LISTEN
tcp  10.0.1.206:80         0.0.0.0:0             LISTEN
tcp  10.0.1.206:80         0.0.0.0:0             LISTEN

Just a thought, because I am seeing more than ten (10) SYN-RECEIVED connections after the server's connection limit of ten (10) is reached: is it possible that these are caused by the web server being too busy or unable to accept the connection? What happens, or should happen, when the web server does not accept the connection, and how should that clean itself up at the network connection level? I assume we should not just keep accumulating SYN-RECEIVED connections.
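To make the clean-up question concrete, here is a minimal sketch of one generic approach: age out connections that have sat in SYN-RECEIVED for too long. This is only an illustration under stated assumptions, not how Circle's TCP stack works; `Connection`, `TcpState` and `ReapHalfOpen` are hypothetical names invented for this example, and a real stack would normally tie the clean-up to its SYN-ACK retransmission limit rather than to a fixed age.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical types for illustration only; not part of Circle.
enum class TcpState { Listen, SynReceived, Established, TimeWait };

struct Connection
{
    uint16_t remotePort;
    TcpState state;
    unsigned enteredStateAt;    // seconds, taken from a monotonic clock
};

// Removes every connection that has been in SYN-RECEIVED for at least
// timeoutSeconds. Returns the number of connections removed.
inline size_t ReapHalfOpen (std::vector<Connection> &table,
                            unsigned now, unsigned timeoutSeconds)
{
    assert (timeoutSeconds > 0);

    auto isStale = [=] (const Connection &c)
    {
        return    c.state == TcpState::SynReceived
               && now - c.enteredStateAt >= timeoutSeconds;
    };

    auto firstStale = std::remove_if (table.begin (), table.end (), isStale);
    size_t removed = static_cast<size_t> (table.end () - firstStale);
    table.erase (firstStale, table.end ());

    return removed;
}
```

Called periodically from the network task, something like this would keep half-open entries from piling up even when the peer never completes the handshake.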

Full disclosure: I did fork CWebServer into my own code. I have not focused on it because the problem appears to be at a much lower level in the protocol stack, but I am attaching my web server code in case you find me doing anything egregious in it. That said, the same web server is running 24x7 on 5 other Circle-based nodes with much better WiFi connections, and they do not show this problem except on rare occasions (maybe 1 or 2 dead httpds after a week or more on one of the other nodes?).

Cheers,

Dave.

On Oct 29, 2025, at 7:06 AM, Rene Stange @.***> wrote:

rsta2 left a comment (rsta2/circle#618) https://github.com/rsta2/circle/issues/618#issuecomment-3460952910 The issue should be fixed on the develop branch and socket timeouts for receive and send have been added. This is used in the sample/21-webserver now. The class CHTTPDaemon has been updated for this purpose too.

If you want to apply the fix to Circle 49 (without socket timeouts), the commit 09ffd0b https://github.com/rsta2/circle/commit/09ffd0b05dc3f77552e7547f95bf2a37c74be75e should be enough.

Thanks for reporting this issue!

davefilip avatar Oct 30 '25 14:10 davefilip

It's likely that you need the complete fix, including the socket timeouts, to solve this problem. If this doesn't help, we can look further.

rsta2 avatar Oct 30 '25 16:10 rsta2

Understood … might be a while before I can upgrade from Circle 49 to Circle 50 … so keep this issue open until then, or should I close it now? Please advise on how you want me to proceed.

davefilip avatar Oct 30 '25 16:10 davefilip

Please keep it open, until it is confirmed that the initial problem is fixed.

rsta2 avatar Oct 30 '25 17:10 rsta2

Please note that you have to activate the socket receive timeout if you are using your own version of the web server; otherwise it will not be active. CHTTPDaemon::CHTTPDaemon() has a new parameter for this, which must be greater than zero to set the timeout. You can test this using telnet ip_address 80: the Telnet connection should be terminated automatically after the specified amount of time. But of course you have to update to the version from the develop branch first.
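As a rough illustration of why the timeout value matters, the following self-contained sketch simulates a blocking receive against a peer that never sends anything. It is not Circle code: `ReceiveResult`, `ReceiveWithTimeout` and the `maxTicks` bound are hypothetical and exist only so the simulation terminates. A timeout of zero models "timeout not active".

```cpp
#include <cassert>
#include <functional>

// Outcome of one simulated receive call. All names here are hypothetical
// and are not part of Circle's socket API.
enum class ReceiveResult { Data, TimedOut, StillBlocked };

// Polls dataAvailable() once per simulated second.
// timeoutSeconds == 0 models an inactive timeout: the call never gives up
// by itself. maxTicks only bounds the simulation so that it terminates.
inline ReceiveResult ReceiveWithTimeout (const std::function<bool ()> &dataAvailable,
                                         unsigned timeoutSeconds, unsigned maxTicks)
{
    assert (maxTicks > 0);

    for (unsigned elapsed = 1; elapsed <= maxTicks; elapsed++)
    {
        if (dataAvailable ())
        {
            return ReceiveResult::Data;
        }

        if (timeoutSeconds > 0 && elapsed >= timeoutSeconds)
        {
            return ReceiveResult::TimedOut;    // worker can close the socket and exit
        }
    }

    return ReceiveResult::StillBlocked;        // silent peer, no timeout: task stays blocked
}
```

With a silent peer, a timeout of 10 ends the call after ten simulated seconds, whereas a timeout of 0 leaves the caller blocked for the whole simulation, which corresponds to an httpd worker that stays in the "block" state.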

rsta2 avatar Oct 30 '25 18:10 rsta2

Unfortunately, the problem goes beyond bad network connectivity, as I've connected with an Ethernet cable directly to an IPV6 Mesh node, and while I now have much better network connectivity:

64 bytes from 10.0.1.206: icmp_seq=420 ttl=64 time=6.233 ms
64 bytes from 10.0.1.206: icmp_seq=421 ttl=64 time=6.853 ms
64 bytes from 10.0.1.206: icmp_seq=422 ttl=64 time=5.633 ms
64 bytes from 10.0.1.206: icmp_seq=423 ttl=64 time=9.308 ms
64 bytes from 10.0.1.206: icmp_seq=424 ttl=64 time=54.515 ms
64 bytes from 10.0.1.206: icmp_seq=425 ttl=64 time=98.863 ms
64 bytes from 10.0.1.206: icmp_seq=426 ttl=64 time=43.830 ms
64 bytes from 10.0.1.206: icmp_seq=427 ttl=64 time=61.904 ms
64 bytes from 10.0.1.206: icmp_seq=428 ttl=64 time=6.546 ms
64 bytes from 10.0.1.206: icmp_seq=429 ttl=64 time=6.120 ms
64 bytes from 10.0.1.206: icmp_seq=430 ttl=64 time=9.924 ms
64 bytes from 10.0.1.206: icmp_seq=431 ttl=64 time=23.986 ms
64 bytes from 10.0.1.206: icmp_seq=432 ttl=64 time=40.363 ms
64 bytes from 10.0.1.206: icmp_seq=433 ttl=64 time=132.691 ms
64 bytes from 10.0.1.206: icmp_seq=434 ttl=64 time=20.766 ms
64 bytes from 10.0.1.206: icmp_seq=435 ttl=64 time=28.474 ms
64 bytes from 10.0.1.206: icmp_seq=436 ttl=64 time=66.904 ms
64 bytes from 10.0.1.206: icmp_seq=437 ttl=64 time=107.268 ms
64 bytes from 10.0.1.206: icmp_seq=438 ttl=64 time=154.969 ms
64 bytes from 10.0.1.206: icmp_seq=439 ttl=64 time=7.852 ms
64 bytes from 10.0.1.206: icmp_seq=440 ttl=64 time=6.260 ms

with no drop-outs, I am still getting dead http processes stuck in SYN-RECEIVED:

12 01DBB680 block    httpd@1dbb680
13 01E4AA80 block    httpd@1e4aa80
14 01F31180 run      telnet_2
16 01FEEE80 block    httpd@1feee80
17 022CEC00 block    httpd@22cec00
18 01F828C0 block    httpd@1f828c0
PROT LOCAL ADDRESS         FOREIGN ADDRESS       STATE
tcp  10.0.1.206:80         10.0.1.202:52456      SYN-RECEIVED
tcp  10.0.1.206:60001      192.168.1.204:1883    ESTABLISHED
tcp  10.0.1.206:23         0.0.0.0:0             LISTEN
tcp  10.0.1.206:7          0.0.0.0:0             LISTEN
tcp  10.0.1.206:7          0.0.0.0:0             LISTEN
tcp  10.0.1.206:7          0.0.0.0:0             LISTEN
tcp  10.0.1.206:7          0.0.0.0:0             LISTEN
tcp  10.0.1.206:23         10.0.1.60:58061       ESTABLISHED
tcp  10.0.1.206:23         0.0.0.0:0             LISTEN
tcp  10.0.1.206:23         0.0.0.0:0             LISTEN
tcp  10.0.1.206:23         0.0.0.0:0             LISTEN
udp  10.0.1.206:69         0.0.0.0:0             
tcp  10.0.1.206:80         10.0.1.202:52456      SYN-RECEIVED
tcp  10.0.1.206:80         10.0.1.202:52708      SYN-RECEIVED
tcp  10.0.1.206:80         10.0.1.201:55958      TIME-WAIT
tcp  10.0.1.206:80         10.0.1.202:52708      SYN-RECEIVED

I am now running Circle 50.0.1 (not develop). Can you please advise as to what I need to patch to get the socket time-out code? Unfortunately, my build is based on circle-stdlib, which auto-downloads Circle 50.0.1, and I realize that this is NOT the place to discuss circle-stdlib issues! However, can you advise as to what I need to copy/patch from the develop branch to get the timeouts that you have implemented into 50.0.1? I am hoping that is easy to do?

Cheers,

Dave.

davefilip avatar Nov 20 '25 14:11 davefilip

Yes, the socket timeouts are currently only available on the develop branch. I tried to extract a patch, but this is not that simple, because the socket timeouts required changes in many files. If you do not want to check out develop directly, you could check out commit 6247b896. Up to this commit there are only changes that are necessary for the socket timeouts or that are relatively safe and should not break anything:

$ cd libs/circle
$ git checkout 6247b896

I guess you are using the class CHTTPDaemon for the port 80 servers? Then you only have to add the timeout value (in seconds) as the last parameter of that class's constructor to enable the timeout:

CHTTPDaemon (pNetSubSystem, pSocket, MAX_CONTENT_SIZE, HTTP_PORT, 0, TIMEOUT_SECONDS)

Please see sample/21-webserver. I think a value of 10 (seconds) should be OK.

rsta2 avatar Nov 20 '25 17:11 rsta2