MicroPython_ESP32_psRAM_LoBo

multithread + garbage collection + socket causes CPU halt

Open taitix opened this issue 6 years ago • 11 comments

When I use microWebSrv + microWebSocket, a Guru Meditation Error (LoadProhibited) occurs when automatic GC runs or when I call gc.collect() manually. I wrote minimal code to reproduce the issue; it ends in an assertion failure that happens only when the call is inside a try statement within a function. (I'm not sure whether this is what causes the Guru Meditation (LoadProhibited), though...)

from time import sleep_ms
import machine, socket, gc, _thread
from network import WLAN, STA_IF

wlan = WLAN(STA_IF)
wlan.active(True)
wlan.connect('SSID', 'PASSWORD')
while not wlan.isconnected():
  sleep_ms(100)
  print('.', end='')

def gc_thread():
  while True:
    sleep_ms(10)
    gc.collect()

_thread.start_new_thread('GUI', gc_thread, ())

s = socket.socket(socket.AF_INET,
                         socket.SOCK_STREAM,
                         socket.IPPROTO_TCP)
s.bind(('0.0.0.0', 80))
s.listen(1)
s.settimeout(0.1)

# This block runs fine
try:
  client, cliAddr = s.accepted()
except Exception as e:
  print(e)

def f1(s):
  client, cliAddr = s.accepted()

def f2(s):
  try:
    client, cliAddr = s.accepted()
  except Exception as e:
    print(e)

f1(s) # this is fine
print("i'm still alive!")
f2(s) # <- calling this causes following:
# assertion "ATB_GET_KIND(block) == AT_HEAD" failed: file "/home/LoBo2_Razno/ESP32/MicroPython/MicroPython_ESP32_psRAM_LoBo/MicroPython_BUILD/components/micropython/py/gc.c", line 596, function: gc_free
# abort() was called at PC 0x4012bb07 on core 1
#
# Backtrace: 0x40095bcc:0x3ffc60d0 0x40095d93:0x3ffc60f0 0x4012bb07:0x3ffc6110 0x400ed38e:0x3ffc6140 0x400ecb55:0x3ffc6160 0x400fa919:0x3ffc6180 0x400f5acd:0x3ffc61b0 0x40103e07:0x3ffc61d0 0x400fa8da:0x3ffc6270 0x400f5acd:0x3ffc62a0 0x400f5afa:0x3ffc62c0 0x400e56e3:0x3ffc62e0 0x400e5999:0x3ffc6390 0x400d7f42:0x3ffc63b0
# CPU halted.

taitix avatar Nov 14 '18 11:11 taitix

#218

Compared with the original MicroPython, I found that CLEAR_ON_SWEEP in "MicroPython_BUILD/components/micropython/py/gc.c" is defined differently. When I set it to 0, as in the original, 'CPU halted' no longer occurs.

However, I do not know yet whether this modification is correct.

KKawase0104 avatar Nov 15 '18 05:11 KKawase0104

I tried setting CLEAR_ON_SWEEP to 0, but I still get the same assertion failure with the code above.

I found another maybe related issue in the forum: https://loboris.eu/forum/showthread.php?tid=213

taitix avatar Nov 15 '18 07:11 taitix

Below is shorter code to cause LoadProhibited. I'm not sure if it's related to socket at this point...

from time import sleep_ms
import socket, gc, _thread

def socket_func():
  s = socket.socket(socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP)

def gc_func():
  gc.collect()

def f():
  _thread.start_new_thread('socket', socket_func, ());
  gc_func()

f() # <= LoadProhibited

taitix avatar Nov 16 '18 13:11 taitix

Thank you for your source code.

I changed the f() function as shown below, and it runs without a CPU halt:

def f():
  _thread.start_new_thread('socket', socket_func, ());
  sleep_ms(1000)
  gc_func()

It may be a timing problem in the gc's memory management. I checked the gc log: in this case, it looks as if the memory area for the new thread was cleared by the gc_sweep function (called from gc.collect() in the main thread) before the thread had finished allocating it.
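
The timing hypothesis above could be probed without a fixed sleep_ms(1000) by making the main thread wait until the new thread reports that it is running. The following is only a sketch: start_thread and start_then_collect are hypothetical helper names, it assumes _thread.allocate_lock() from the standard _thread module is available, and it has not been run on the LoBo firmware, so it may not avoid the crash.

```python
import gc
import _thread


def start_thread(name, func, args=()):
    """Hypothetical helper: start a thread with either _thread signature.

    LoBo firmware (as used in this thread): start_new_thread(name, func, args)
    Standard _thread module:                start_new_thread(func, args)
    """
    try:
        return _thread.start_new_thread(name, func, args)
    except TypeError:
        # The standard module rejects a string as its first argument.
        return _thread.start_new_thread(func, args)


def start_then_collect(name, func, args=()):
    """Start func in a new thread and collect only once it is running."""
    started = _thread.allocate_lock()
    started.acquire()          # held until the new thread releases it

    def wrapper():
        started.release()      # signal: the thread has actually started
        func(*args)

    start_thread(name, wrapper)
    started.acquire()          # wait for the signal instead of sleeping
    gc.collect()
```

If the crash still occurs with this handshake in place, the cause is probably not just the window between thread creation and the first collection.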

KKawase0104 avatar Nov 17 '18 09:11 KKawase0104

Thanks for looking into the c code. I managed to shorten the example. I also found that if I comment out "from time import sleep_ms", LoadProhibited doesn't occur.

import gc, _thread
from time import sleep_ms # commenting out this to avoid LoadProhibited 

def thread_func():
  pass

def f():
  _thread.start_new_thread('thread_func', thread_func, ());
  gc.collect()
f() # <= LoadProhibited

taitix avatar Nov 17 '18 14:11 taitix

Hmm... While looking at the gc code, I found that this line:
#if MICROPY_PY_THREAD && MICROPY_PY_THREAD_GIL
https://github.com/loboris/MicroPython_ESP32_psRAM_LoBo/blob/fede6f610c9952c7242fc309c2bb4efc9afb0919/MicroPython_BUILD/components/micropython/py/gc.c#L100
differs from the original MicroPython code:
#if MICROPY_PY_THREAD && !MICROPY_PY_THREAD_GIL
https://github.com/micropython/micropython-esp32/blob/2f4dac5f121a59fc187c1d9c1f9eade365b3aba1/py/gc.c#L95
When the GIL is used, the gc mutex should not be necessary, should it? Is this because the LoBo version has other non-Python tasks accessing the gc?

taitix avatar Nov 17 '18 15:11 taitix

I use gc and we set a threshold. The rest is all done by another thread (C based I would assume). This way, you don't need to address it in your code. Loboris also added a percentage setpoint in the sdkconfig.

Regards, Benoit

bdespatis avatar Nov 29 '18 23:11 bdespatis

The problem is that when automatic garbage collection is triggered, my multithreaded project (especially when using MicroWebSrv) crashes randomly, most often with LoadProhibited. Because I have to wait a long time for this to happen and it is hard to debug, I tried to write minimal code that reproduces the kernel panic instantly. If I understand correctly, automatic garbage collection can be triggered by any thread when it allocates memory (in gc_alloc), which may cause random crashes when threads are in use.
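
If automatic collection can indeed start from whichever thread happens to allocate, one experiment is to switch it off and collect from a single, locked entry point. This is a sketch under assumptions, not a confirmed fix: collect_lock and collect_exclusive are hypothetical names, gc.disable() and _thread.allocate_lock() are the standard-module calls and are assumed to behave the same on the LoBo firmware, and with automatic collection disabled an allocation can fail with MemoryError unless collect_exclusive() is called often enough.

```python
import gc
import _thread

# Hypothetical: one lock shared by every explicit collection.
collect_lock = _thread.allocate_lock()

# Turn off automatic collection so that an allocation in an arbitrary
# thread no longer starts a collection on its own.
gc.disable()


def collect_exclusive():
    """Run gc.collect() while holding collect_lock (one collector at a time)."""
    with collect_lock:
        gc.collect()
```

Calling collect_exclusive() from one known place, for example the main loop, would at least show whether the crashes depend on which thread starts the collection.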

taitix avatar Nov 30 '18 18:11 taitix

The review of the issues related to the gc and threads is planned for the 2nd half of this month. I have already upgraded the _thread module and will test it extensively for gc-related issues.

loboris avatar Dec 01 '18 14:12 loboris

Any update? I got the same issue when testing microWebServer.

thanks.

gengshenghong avatar Dec 25 '18 14:12 gengshenghong

I've tried both the official microWebServer and the frozen version; both crash randomly (depending on the number of threads created) with a very simple test application. I hope this can be fixed, because if it is a gc issue it blocks the use of threads.

With different values of N_WORKER_THREADS and stackSize, it sometimes crashes and sometimes does not.

N_WORKER_THREADS = 2
WORKER_HAS_SLEEP = True

import utime
import _thread
def thread_entry(n):
    while True:
        if WORKER_HAS_SLEEP:
            utime.sleep(10)

for i in range(0, N_WORKER_THREADS):
    _thread.start_new_thread('worker', thread_entry, (i,))

from microWebSrv import MicroWebSrv

@MicroWebSrv.route('/echo', 'POST')
def handlerFuncGet(httpClient, httpResponse):
    #jsonIn = httpClient.ReadRequestContentAsJSON()
    jsonOut = {}
    httpResponse.WriteResponseJSONOk(obj=jsonOut)

mws = MicroWebSrv()             # TCP port 80 and files in /flash/www
mws.Start(threaded=True, stackSize=10240)         # Starts server in a new thread

Thanks, Shenghong

gengshenghong avatar Dec 26 '18 09:12 gengshenghong