MicroWebSrv2
Fixed freezing issue in XAsyncSockets lib and updated thread locking
I started this pull request to fix an issue I ran into:
I have a server running with embedded config on an ESP32 which often froze on socket.SendTextMessage(message), which I call once every second with a message that is only 81 bytes long. I traced the issue all the way down to _socketListRemove, where it sometimes hangs forever on socketsList.remove(socket). Since it never returns from this call and the lock is still held when _socketListAdd wants to take it, the sockets stay frozen forever.
Returning False if the _opLock is locked fixed the problem for me, but it is only a quick workaround; I mainly wanted to point out the issue and discuss possible solutions. Before returning False, I tried printing len(socketsList), which always returned 0 for me, so I guess the .remove got processed anyway.
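A minimal sketch of the "return False when the lock is busy" workaround, for discussion only. It assumes CPython's threading.Lock as a stand-in for the lock used on the ESP32, and SocketRegistry is a hypothetical class, not the one in XAsyncSockets. It uses a non-blocking acquire rather than a separate locked() check, so there is no gap between testing the lock and taking it.

```python
import threading


class SocketRegistry:
    """Hypothetical stand-in for the object that owns _opLock."""

    def __init__(self):
        self._opLock = threading.Lock()
        self._asyncSockets = {}  # fileno -> socket

    def _socketListRemove(self, socket, socketsList):
        # Try to take the lock without blocking; give up if it is held.
        if not self._opLock.acquire(False):
            return False
        try:
            ok = (socket.fileno() in self._asyncSockets
                  and socket in socketsList)
            if ok:
                socketsList.remove(socket)
            return ok
        finally:
            # Always release, even if fileno() or remove() raises.
            self._opLock.release()
```

With this variant a busy lock means the removal is skipped, so the caller has to treat False as "not done, try again later". It hides the freeze rather than explaining it.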
I also remade thread locking using the 'with' statement. Tested and it works the same, but is more elegant (and reliable?)
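On the "more reliable?" question, here is a small sketch of the difference between the two styles. It assumes CPython's threading.Lock behaves like the lock used on the device in this respect, and remove_manual / remove_with are hypothetical helpers, not functions from the library.

```python
import threading

lock = threading.Lock()


def remove_manual(item, items):
    lock.acquire()
    ok = item in items       # if this raises, release() below never runs
    if ok:
        items.remove(item)
    lock.release()
    return ok


def remove_with(item, items):
    with lock:               # released on normal exit and on exceptions
        ok = item in items
        if ok:
            items.remove(item)
    return ok
```

When nothing raises, both functions behave the same. If the body raises (for example, items is None), remove_manual leaves the lock held forever, while remove_with releases it before the exception propagates.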
Hello @ElHyperion,
I'm OK with using 'with' instead of acquire and release.
But I don't understand why _socketListRemove can stay locked. It's very strange.
Could you identify the exact instruction that causes this? Is it socket.fileno(), or is the _opLock held for another reason?
Thank you!
After more thorough testing, I think the problem appears with the socket lock I use right before sending text messages. Any recommendations on how to change it? I had it without _socket_lock at first, with the same freezing issue, and thought that adding the lock to it would fix the problem.
UPDATE: Is this two-way communication too much for a single socket, and should I use two sockets instead? Or is the issue somewhere else?
This is XAsyncSockets with added prints:
    def _socketListAdd(self, socket, socketsList) :
        if self._opLock.locked() :
            print('Add: cannot lock!', socket.fileno(), len(socketsList))
        with self._opLock :
            print('Add: locked', socket.fileno(), len(socketsList))
            ok = (socket.fileno() in self._asyncSockets and socket not in socketsList)
            if ok :
                print('Add: if ok', socket.fileno(), len(socketsList))
                socketsList.append(socket)
                print('Add: appended', socket.fileno(), len(socketsList))
        print('Add: unlocked')
        return ok

    def _socketListRemove(self, socket, socketsList) :
        with self._opLock :
            print('Remove: locked', socket.fileno(), len(socketsList))
            ok = (socket.fileno() in self._asyncSockets and socket in socketsList)
            if ok :
                print('Remove: if ok', socket.fileno(), len(socketsList))
                socketsList.remove(socket)
                print('Remove: removed', socket.fileno(), len(socketsList))
        print('Remove: unlocked')
        return ok
This is the code I use for sending text messages over sockets:
    def send_telemetry(data):
        message = '{0:d}{1}'.format(_data_complete, data)
        print('Broadcasting %d B' % len(data))
        with _socket_lock:
            print('Socket lock')
            for s in _sockets: # Only one socket for now
                s.SendTextMessage(message)
        print('Sent messages')
    Remove: locked 60 2
    Remove: if ok 60 2
    Remove: removed 60 1
    Remove: unlocked
    Add: locked 60 1
    Add: if ok 60 1
    Add: appended 60 2
    Add: unlocked
    Remove: locked 60 6
    Broadcasting 81 B
    Socket lock
    ------- Problem here?
    Add: cannot lock! 60 0
I'm checking everything, but I don't understand how this _opLock can stay locked 😞
_socketListAdd and _socketListRemove are called very often for I/O handling, and this _opLock is really mandatory here.
Couldn't the freeze come from an inter-thread lock?
Do you call mws2.StartManaged() with parllProcCount greater than 1?
If not, could you try with parllProcCount=5, for example?
I start it with no arguments. Setting parllProcCount=3 or greater results in a memory allocation error (I can get only up to 64kB free before starting the server). I tried parllProcCount=2, but the issue kept appearing, and it only made the website download twice as slow (about 40s instead of 20s). parllProcCount=1 worked fine but did not fix the locking issue either.
Setting parllProcCount=0 seems to fix it, but I cannot do that anymore: in this case it just hangs on the StartManaged command forever. I could start it this way before, while testing the proc count, under some unknown circumstances, but I cannot recreate them anymore. I'm not sure it's a memory problem, though, because when I start it through the REPL and hit Ctrl+C after the command freezes, I get around 44.8kB on gc.mem_free(). The server then appears to run but does not respond to any requests (visiting the website is stuck on loading).
I'll be testing it on a WROVER kit with 8MB SRAM soon, if that helps.
Ok.
parllProcCount=1 means that the server's I/O event loop waits in a single dedicated thread, while with parllProcCount=0 it waits in the main thread (and the main thread is blocked until the server is stopped).
So, in both cases, your website should work.
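As a rough illustration of the two modes described above, and not the library's actual implementation: start_managed and loop are hypothetical names, and CPython's threading module stands in for the threads on the ESP32.

```python
import threading


def start_managed(loop, parll_proc_count=1):
    """Run `loop` in the caller's thread (0) or in dedicated threads (>= 1)."""
    if parll_proc_count == 0:
        # Blocks the caller until the loop returns.
        loop()
        return []
    workers = []
    for _ in range(parll_proc_count):
        t = threading.Thread(target=loop)
        t.start()
        workers.append(t)
    # Returns immediately; the loops keep running in the worker threads.
    return workers
```

In this sketch, a count of 0 needs no extra thread stack but never returns while the loop is running, and each additional worker costs memory, which would be consistent with the allocation errors reported for higher counts.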
But for the moment, I still don't understand your locking problem in the XAsyncSockets lib, and I would like to see your code if possible.
Could you send it to me by email at [email protected], please?
I'd be really curious to investigate further and understand this.
I encountered the same situation: the thread stays locked. My solution is as follows:
1: comment out all the self._opLock.acquire() and self._opLock.release() calls in XAsyncSockets.py
2: wms.StartManaged(parllProcCount=0)
That leaves only the main thread, with no thread lock, and it works.
The asynchronous concurrent thread system has just been redesigned. It now uses a battery of workers, so there should be no more problems.