MicroPython_ESP32_psRAM_LoBo

Python thread + gc.collect() = CPU halt

Open Isopodus opened this issue 6 years ago • 15 comments

Hi. I don't know if this project is still supported, but I have run into trouble when using multithreading together with gc.collect(). This issue is essentially the same as #241, but I hope to get your attention. How to reproduce:

import gc
import time
import _thread
from machine import Pin

def my_thread(args):
    try:
        _thread.allowsuspend(True)
        led = Pin(2, 2)
        while True:
            # Handle notifications sent to this thread
            ntf = _thread.getnotification()
            if ntf:
                if ntf == _thread.EXIT:
                    return
                elif ntf == _thread.SUSPEND:
                    # Block until the thread is resumed
                    while _thread.wait() != _thread.RESUME:
                        pass
            # Doing some stuff here, e.g. blinking an LED; what exactly rarely matters
            led.value(1)
            time.sleep(0.5)
            led.value(0)
            time.sleep(0.5)
    except Exception as e:
        print(e)
        return

_thread.start_new_thread('my_awesome_thread', my_thread, ('some args',))
gc.collect()

Expected behaviour

Thread runs simultaneously while gc.collect() happens, without any errors.

Real behaviour

Core panic, backtrace is shown, CPU is halted.

Backtrace

Guru Meditation Error: Core  1 panic'ed (LoadProhibited). Exception was unhandled.
Core 1 register dump:
PC      : 0x400f7de7  PS      : 0x00060130  A0      : 0x800ef2eb  A1      : 0x3ffe5150  
A2      : 0x00000000  A3      : 0x3ffc93a0  A4      : 0x3ffd61c4  A5      : 0x00000000  
A6      : 0x3ffd6190  A7      : 0x000000b1  A8      : 0x800f7d54  A9      : 0x3ffe5130  
A10     : 0x00000000  A11     : 0x3f41175c  A12     : 0x3f413548  A13     : 0x3f413b48  
A14     : 0x3f413b48  A15     : 0x3ffe51e0  SAR     : 0x0000001a  EXCCAUSE: 0x0000001c  
EXCVADDR: 0x00000008  LBEG    : 0x4000c2e0  LEND    : 0x4000c2f6  LCOUNT  : 0xffffffff  

Backtrace: 0x400f7de7:0x3ffe5150 0x400ef2e8:0x3ffe51f0 0x400eaea9:0x3ffe5220 0x400f6400:0x3ffe5240 0x400d8815:0x3ffe52e0

CPU halted.

Stranger things

  1. When I was writing this issue I connected my ESP32 to copy the latest backtrace. But when I tried to reproduce it, the CPU halt did not happen. Then I re-saved the same code file, tried to reproduce the CPU halt, and it happened.

  2. While the tested thread was suspended, the issue was no longer reproducible; once I tried to resume it, I got a similar backtrace:

Guru Meditation Error: Core  1 panic'ed (LoadProhibited). Exception was unhandled.
Core 1 register dump:
PC      : 0x400f7de7  PS      : 0x00060330  A0      : 0x800ef2eb  A1      : 0x3ffe5150  
A2      : 0x00000000  A3      : 0x3ffc9560  A4      : 0x3ffd6ae4  A5      : 0x00000000  
A6      : 0x3ffd6ab0  A7      : 0x000000b1  A8      : 0x800f7d54  A9      : 0x3ffe5130  
A10     : 0x00000000  A11     : 0x3f41175c  A12     : 0x3f413548  A13     : 0x3f413b48  
A14     : 0x3f413b48  A15     : 0x3ffe51e0  SAR     : 0x0000001a  EXCCAUSE: 0x0000001c  
EXCVADDR: 0x00000008  LBEG    : 0x4000c2e0  LEND    : 0x4000c2f6  LCOUNT  : 0xffffffff  

Backtrace: 0x400f7de7:0x3ffe5150 0x400ef2e8:0x3ffe51f0 0x400eaea9:0x3ffe5220 0x400f6400:0x3ffe5240 0x400d8815:0x3ffe52e0

CPU halted.

  3. Got CORRUPT HEAP when listing the running threads (happened one or two times):

CORRUPT HEAP: multi_heap.c:428 detected at 0x3ffe3268
abort() was called at PC 0x40090407 on core 0

Backtrace: 0x40090c4b:0x3ffdd720 0x40090da3:0x3ffdd740 0x40090407:0x3ffdd760 0x40090762:0x3ffdd780 0x40082718:0x3ffdd7b0 0x4008275c:0x3ffdd7e0 0x40082d81:0x3ffdd800 0x4000beaf:0x3ffdd820 0x40085a29:0x3ffdd840 0x40154877:0x3ffdd860 0x4015ec05:0x3ffdd8a0 0x400890b5:0x3ffdd920

CPU halted.

Notes

IDE used: Thonny

ELF file attached

MicroPython.zip
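The backtrace lines above can be turned into source locations with the attached ELF. A minimal sketch, run on a desktop machine, that extracts the program-counter addresses and prints an addr2line command line; the ELF file name `MicroPython.elf` is an assumption (the archive contents are not listed here), and `backtrace_pcs` is a hypothetical helper name. As a side note, EXCVADDR 0x00000008 together with LoadProhibited is consistent with a read through a null pointer at a small offset, though that alone does not identify the cause.

```python
import re

def backtrace_pcs(text):
    """Return the program-counter addresses from an ESP32 'Backtrace:' line."""
    for line in text.splitlines():
        if line.startswith("Backtrace:"):
            # Each entry has the form PC:SP; only the PC is needed for symbol lookup.
            pairs = re.findall(r"(0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+)", line)
            return [pc for pc, _sp in pairs]
    return []

# First backtrace from this report
panic = ("Backtrace: 0x400f7de7:0x3ffe5150 0x400ef2e8:0x3ffe51f0 "
         "0x400eaea9:0x3ffe5220 0x400f6400:0x3ffe5240 0x400d8815:0x3ffe52e0")

pcs = backtrace_pcs(panic)

# ASSUMPTION: the ELF inside the attached archive is named MicroPython.elf.
print("xtensa-esp32-elf-addr2line -pfiaC -e MicroPython.elf " + " ".join(pcs))
```

Running the printed command with the toolchain that built the firmware should show which functions were on the stack when the panic occurred.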

Isopodus avatar Sep 29 '19 08:09 Isopodus

Project no longer supported, apparently. I experimented with the thread code recently and it is working for me, a loop not too different from yours. What I saw initially was a stack overflow. When I boosted the stack size ( _thread.stack_size(10*1024) ) things worked fine.

A couple of things you might try:

  • the default stack size is very small, increase it.
  • put the "led = Pin(2, 2)" statement prior to the loop so it doesn't happen over and over
  • put the gc.collect() inside the loop in the thread instead of outside the thread
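The three suggestions above could be combined roughly as follows. This is only a sketch of the suggestions, not a confirmed fix for the panic. `blink_worker` and the thread name `'blinker'` are made up for illustration, the stand-in `Pin` class exists only so the sketch can also be run on a desktop, and the notification handling from the original report is left out for brevity.

```python
import gc
import sys
import time
import _thread

try:
    from machine import Pin              # present on the ESP32 firmware
except ImportError:
    class Pin:                           # hypothetical stand-in for desktop runs
        def __init__(self, *args):
            self.state = 0
        def value(self, v):
            self.state = v

# Suggestion 1: raise the stack size before starting any thread.
try:
    _thread.stack_size(10 * 1024)
except (ValueError, RuntimeError):
    pass                                 # desktop Python rejects sizes below 32 KiB

def blink_worker(led, cycles, delay):
    """Blink `led` for `cycles` rounds and collect garbage inside the thread."""
    done = 0
    for _ in range(cycles):
        led.value(1)
        time.sleep(delay)
        led.value(0)
        time.sleep(delay)
        gc.collect()                     # Suggestion 3: collect from within the thread
        done += 1
    return done

if sys.implementation.name == "micropython":
    led = Pin(2, 2)                      # Suggestion 2: create the Pin once, outside the loop
    # Thread-name-first signature as used in the report above.
    _thread.start_new_thread('blinker', blink_worker, (led, 10, 0.5))
```

Calling `blink_worker(Pin(2, 2), 3, 0)` directly is a quick way to exercise the loop without starting a thread.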

carterw avatar Sep 29 '19 15:09 carterw

Great advice Bill!

Increasing the stack size is what solved my issues with using threads (one of the main reasons I started using this fork of uP in the first place).

chmondkind avatar Sep 29 '19 17:09 chmondkind

Great, thanks for the advice, I will try it later today

Isopodus avatar Sep 30 '19 05:09 Isopodus

I think this error is related to the issue:

The following simple code causes a halted cpu too:

from microWebSrv import MicroWebSrv
import _thread
import gc

mws = MicroWebSrv() # TCP port 80 and files in /flash/www
mws.Start()         # Starts server in a new thread

gc.collect()

results in:

Guru Meditation Error: Core  1 panic'ed (LoadProhibited). Exception was unhandled.
Core 1 register dump:
PC      : 0x400f075a  PS      : 0x00060030  A0      : 0x800eeaa8  A1      : 0x3ffe6200
A2      : 0x00000000  A3      : 0x00000001  A4      : 0x3ffba724  A5      : 0x00000000
A6      : 0x00000000  A7      : 0x00000000  A8      : 0x00000000  A9      : 0x3ffe61e0
A10     : 0x3ffe6240  A11     : 0x00000019  A12     : 0x3ffb6184  A13     : 0x00000000
A14     : 0x00000000  A15     : 0x3ffc51c0  SAR     : 0x00000000  EXCCAUSE: 0x0000001c
EXCVADDR: 0x00000000  LBEG    : 0x00000000  LEND    : 0x00000000  LCOUNT  : 0x00000000

Backtrace: 0x400f075a:0x3ffe6200 0x400eeaa5:0x3ffe6220 0x400faa75:0x3ffe6240 0x400d9b11:0x3ffe62e0

CPU halted.

klauweg avatar Oct 01 '19 22:10 klauweg

Could well be. I am invoking the MicroWebCli in a thread and that required a larger stack.

carterw avatar Oct 02 '19 01:10 carterw

Unfortunately, increasing stack size did not help. Any other suggestions?

Isopodus avatar Oct 02 '19 06:10 Isopodus

It seems MicroWebSrv uses a thread internally, so gc.collect() causes a CPU halt there too. I wonder why the FTP and Telnet servers of this firmware do not conflict with gc.collect(). When I list all the running threads, FTP and Telnet show up as SERVICE, MainThread is shown as MAIN, and any Python thread that I start is listed as PYTHON. Maybe we need to find out how to start a new thread as SERVICE? Perhaps this won't work, because the Telnet and FTP modules are written in C.

Isopodus avatar Oct 02 '19 06:10 Isopodus

At least the MicroWebServer is useless at the moment. As soon as a garbage collection happens, the CPU is halted. So far I have been unable to reproduce the error with any threads other than the MicroWebServer's.

klauweg avatar Oct 02 '19 22:10 klauweg

Has there been any progress with the issue? Currently this is preventing me from using the loboris port.

romnan avatar Dec 16 '19 12:12 romnan

Sadly, I have made no progress with it. Please reply if you get any good results.

Isopodus avatar Dec 17 '19 07:12 Isopodus

Just go with asyncio.

curlyz avatar May 17 '20 02:05 curlyz

Maybe the watchdog needs to be fed in the loop...

ijustwant avatar May 18 '20 07:05 ijustwant

asyncio was super slow when I tested it (communication via sockets), but that may have been due to my lack of experience with it. Using the pycom firmware, threads are more than 10 times faster.

romnan avatar May 18 '20 21:05 romnan

Asyncio is cooperative multitasking, so you need to tune every task so that they all work together nicely. Btw, may I ask where you got the pycom firmware?
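To illustrate what cooperative means here, a minimal sketch with two tasks that take turns only because each one awaits. The task names and delays are made up, and it assumes a build that provides `uasyncio` with `run()` and `gather()`; whether this firmware ships one is not confirmed in this thread. On desktop Python it falls back to the standard `asyncio`.

```python
try:
    import uasyncio as asyncio      # MicroPython builds that ship uasyncio (assumption)
except ImportError:
    import asyncio                  # desktop Python

log = []

async def worker(name, rounds, delay):
    for i in range(rounds):
        log.append((name, i))
        # Awaiting hands control back to the scheduler. A task that never
        # awaits keeps the CPU and starves every other task.
        await asyncio.sleep(delay)

async def main():
    # Both workers share one thread and interleave at their await points.
    await asyncio.gather(worker("blink", 3, 0.01), worker("poll", 3, 0.01))

asyncio.run(main())
print(log)
```

A slow task is usually one that does long work between awaits, so adding more frequent yield points is the kind of tuning meant above.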

curlyz avatar May 20 '20 05:05 curlyz

Here is the link to the latest Firmware: https://software.pycom.io/findupgrade?product=strict=true&pycom-firmware-updater&type=stable&platform=win32&redirect=true

romnan avatar May 20 '20 11:05 romnan