MicroWebSrv2 icon indicating copy to clipboard operation
MicroWebSrv2 copied to clipboard

Fixed freezing issue in XAsyncSockets lib and updated thread locking

Open ElHyperion opened this issue 4 years ago • 7 comments

I started this pull request to fix an issue I got: I have a server running with embedded config on an ESP32 which often froze on socket.SendTextMessage(message) which I'm calling once every second with only an 81 byte-long message. I traced down the issue all the way to _socketListRemove where it sometimes hangs forever on socketsList.remove(socket). Since it never returns from this command and stays locked when _socketListAdd wants to use it already, the sockets stay frozen forever.

Returning False if the _opLock is locked fixed the problem for me but it's rather a quick workaround with which I just wanted to point out the issue and discuss the solutions. Before returning False, I tried printing len(socketsList) which always returned 0 on me, so I guess the .remove got processed anyways.

I also remade thread locking using the 'with' statement. Tested and it works the same, but is more elegant (and reliable?)

ElHyperion avatar Feb 25 '20 10:02 ElHyperion

Hello @ElHyperion, I'm ok to using with instead of acquire and release. But I don't understand why _socketListRemove can stay locked. It's very strange? Could you the exact instruction that causes this? socket.fileno() or the _opLock for another reason? Thank you!

jczic avatar Feb 26 '20 04:02 jczic

After more thorough testing, I think the problem appears with the socket lock I use right before sending text messages. Any recommendations on how to change it? I had it without _socket_lock at first, with the same freezing issue, and thought that adding the lock to it would fix the problem.

UPDATE: Is this two-way communication too much for a single socket, and should I use two sockets instead? Or is the issue somewhere else?

This is XAsyncSockets with added prints:

def _socketListAdd(self, socket, socketsList) :
     if self._opLock.locked() :
         print('Add: cannot lock!', socket.fileno(), len(socketsList))
     with self._opLock :
         print('Add: locked', socket.fileno(), len(socketsList))
         ok = (socket.fileno() in self._asyncSockets and socket not in socketsList)
         if ok :
             print('Add: if ok', socket.fileno(), len(socketsList))
             socketsList.append(socket)
             print('Add: appended', socket.fileno(), len(socketsList))
     print('Add: unlocked')
     return ok

def _socketListRemove(self, socket, socketsList) :
     with self._opLock :
         print('Remove: locked', socket.fileno(), len(socketsList))
         ok = (socket.fileno() in self._asyncSockets and socket in socketsList)
         if ok :
             print('Remove: if ok', socket.fileno(), len(socketsList))
             socketsList.remove(socket)
             print('Remove: removed', socket.fileno(), len(socketsList))
     print('Remove: unlocked')
     return ok

This is the code I use for sending text messages over sockets:

def send_telemetry(data):
    message = '{0:d}{1}'.format(_data_complete, data)
    print('Broadcasting %d B' % len(data))
    with _socket_lock:
        print('Socket lock')
        for s in _sockets:  # Only one socket for now
            s.SendTextMessage(message)
        print('Sent messages')

Remove: locked 60 2 Remove: if ok 60 2 Remove: removed 60 1 Remove: unlocked Add: locked 60 1 Add: if ok 60 1 Add: appended 60 2 Add: unlocked Remove: locked 60 6 Broadcasting 81 B Socket lock ------- Problem here? Add: cannot lock! 60 0

ElHyperion avatar Feb 26 '20 09:02 ElHyperion

I'm checking all but I don't understand how this _opLock can stay locked 😞 _socketListAdd and _socketListRemove are very called often for I/Os manipulation and this _opLock is really mandatory here. Doesn't the freezer come with an inter-thread lock? Do you call mws2.StartManaged() with parllProcCount greater than 1? Otherwise, could you try with parllProcCount=5 for example?

jczic avatar Feb 28 '20 00:02 jczic

I start it with no arguments. Adding parllProcCount=3 and greater results in memory allocation error (I can get only up to 64kB free before starting the server). I tried parllProcCount=2 but the issue continued appearing, only made the website download twice as slow (about 40s instead of 20s). parllProcCount=1 worked fine but did not fix the locking issue either.

Setting parllProcCount=0 seems to fix it but I cannot do that anymore, it just hangs on the StartManaged command forever in this case. I could start it this way before, while testing the proc count, under some unknown circumstances, but I cannot recreate them anymore. I'm not sure if it's a problem of memory though because starting it through REPL and hitting Ctrl+C after the command freezes I get around 44.8kB on gc.mem_free(). The server then appears to run but does not respond to any requests (visiting the website is stuck on loading).

I'll be testing it on a WROVER kit with 8MB SRAM soon, if that helps.

ElHyperion avatar Feb 29 '20 13:02 ElHyperion

Ok. parllProcCount=1 means that the I/Os events loop of the server only waits once in a dedicated thread and parllProcCount=0, waits in main thread (and main thread is locked until the server is stopped). So, in both cases, your web site must be works. But for the moment, I still don't understand your locking problem in XAsyncSockets lib and I would like see your code if it's possible. Do you can send me it by mail to [email protected] please? I'd be really curious to more investigate and understand that.

jczic avatar Mar 04 '20 13:03 jczic

I encountered same situation -- thread is locked. My solution is as below: 1: comment out all the self._opLock.acquire() and self._opLock.release() in XAsyncScokets.py
2: wms.StartManaged(parllProcCount=0) That is only one main thread and without thead lock

it works.

kenjican avatar Jan 27 '22 09:01 kenjican

The asynchronous concurrent thread system has just been redesigned. It now uses a battery of workers, so there should be no more problems.

jczic avatar Feb 18 '24 20:02 jczic