mongoose-os
mongoose-os copied to clipboard
Update to be thread safe and add multi-core support (ie ESP32)
Mongoose OS only currently supports running ESP32 on single core, basically leaving the second core completely unused.
Yes i've seen the thread at: https://forum.mongoose-os.com/index.php?p=/discussion/comment/2356/#Comment_2356
And ticket #253
I opened this issue as technically this is an issue that only single core devices are supported, and multi-core devices that are supported ... only a single core is supported on them.
By closing ticket #253 does that mean that Cesanta has no plans for ever supporting multi-core devices?
While I understand that currently MOS is not thread safe, eventually over time one would assume that new devices will become available, many of them will have multi-core ... and even if it's not an ESP32, i'm sure you guys will want to add new devices ... but does that mean that even though support may be added, that they will be crippled by only supporting a single core?
I haven't read this fully but it seems like the esp-idf supports the two threads... they could at least get two cores running and pin all the mongoose functions to one core. This could possible let a user add additional independent task to the 2nd core to start with?
https://docs.espressif.com/projects/esp-idf/en/latest/api-guides/freertos-smp.html
Then again, this may go back to making mongoose thread safe as the first step and let users create their own task with less worry about threading and interacting with mongoose in those threads
I'm also very interested in this.
At very least it would be very helpfull to have thread-safe LOG() macro
it is, actually. logging is thread-safe. the only problem is sometimes the task it's logging from doesn't have enough stack... *printf calls are quite expensive in terms of stack.
@rojer Do you think the stack running out is the case in the following backtrace? I can't really tell.
Reading symbols from /fw.elf...done.
Remote debugging using 127.0.0.1:1234
0x400880df in invoke_abort ()
at /opt/Espressif/esp-idf/components/esp32/panic.c:156
156 *((int *) 0) = 0;
#0 0x400880df in invoke_abort ()
at /opt/Espressif/esp-idf/components/esp32/panic.c:156
#1 0x40088264 in abort ()
at /opt/Espressif/esp-idf/components/esp32/panic.c:171
#2 0x400828ee in lock_acquire_generic (lock=0x3ffae9d8, delay=4294967295,
mutex_type=4 '\004')
at /opt/Espressif/esp-idf/components/newlib/locks.c:143
#3 0x40082a0c in _lock_acquire_recursive (lock=0x3ffae9d8)
at /opt/Espressif/esp-idf/components/newlib/locks.c:171
#4 0x4014e389 in _fwrite_r (ptr=0x3ffc8d58, buf=<optimized out>, size=1,
count=24, fp=0x3ffae980) at ../../../.././newlib/libc/stdio/fwrite.c:168
#5 0x4014e404 in fwrite (buf=0x3ffc8b40, size=1, count=24, fp=0x3ffae980)
at ../../../.././newlib/libc/stdio/fwrite.c:211
#6 0x4017b2a0 in cs_log_print_prefix (level=LL_WARN, file=<optimized out>,
ln=<optimized out>) at src/common/cs_dbg.c:86
#7 0x400850b0 in siterRF_RX_cb (pqh=<optimized out>, cb_arg=<optimized out>)
at /home/harvie/Temp/firmware-github/display-lvgl/deps/siterRF/src/siterRF.cpp:210
#8 0x40084cec in pq_task (arg=0x3ffb634c <siter_pqh>)
at /home/harvie/Temp/firmware-github/display-lvgl/deps/pq/src/pq.c:33
Detaching from program: /fw.elf, Remote target
i am using stack size of (MGOS_TASK_STACK_SIZE_BYTES / STACK_SIZE_UNIT) i've even tried to increase it 10 times and it still crashes the same way
Ok, sorry. The problem was in directly calling printf()
rather than LOG()
. Unfortunately i need printf() for debugging, since LOG is too slow for my purposes :-/ But i guess i will have to figure this out...
No. I take that back. It actualy happens in " cs_log_print_prefix() " call...
if (!xPortCanYield()) {
/* In ISR Context */
if (mutex_type == queueQUEUE_TYPE_RECURSIVE_MUTEX) {
abort(); /* recursive mutexes make no sense in ISR context */
}
code around line 143. looks like you're trying to log from an ISR handler? this is not going to work.
from ISR you can use mgos_cd_printf that will print to UART
Just checked. I am not loging from ISR handler...
But FreeRTOS scheduler does switch between tasks in ISR, so one task preempting others tasks is basicaly ISR for the other tasks.
Now after some testing i am pretty sure that this problem happens when i call LOG(...) from freertos task other than main mongoose task. probability is increased when running on the same freertos priority as main mongoose task and 1000Hz tickrate, since context switches are more frequent that way due to time slicing. Perhaps then there is more chance that two tasks call LOG() at once.
yet, somehow xPortCanYield() ends up being false. is the trace always the same as above?
Yes
I have GPIO ISR handler, which sends job to queue which is later handled by separate task. So it might be possible that after the GPIO ISR freertos scheduler immediately jumps to different task which is receiving the queue, because it gets unblocked by that ISR sending to queue.
To be clear, in the ISR i am using this macro after sending job to the queue if queue gives me *pxHigherPriorityTaskWoken == true.
#define YIELD_FROM_ISR(should_yield) \
{ \
if (should_yield) portYIELD_FROM_ISR(); \
}
I think it would make sense for mongoose loging to use FreeRTOS stream buffers https://www.freertos.org/RTOS-stream-buffer-example.html
That way it would be possible to log to such buffer from anywhere in code. Multiple tasks and ISRs. Stream buffer would then synchronize everything to make it thread safe.
BTW what about mgos_azure_send_d2c_msg()
(and respective mqtt api)? Is that thread safe? I am afraid that it will have same issues as LOG()...
Until it gets fixed, i came up with temporary workaround:
#include "FreeRTOS.h"
#include "task.h"
#include "mgos_core_dump.h" //mgos_cd_printf() prototype
#include "common/cs_dbg.h"
#ifndef __FILENAME__
/// We don't want to log full paths, so we only use the filename
#define __FILENAME__ (strrchr("/" __FILE__, '/') + 1)
#endif //__FILENAME__
/**
* mgos macro `LOG()` is not thread safe, so we have this instead.
* unfortunately it can currently log only to UART...
*/
#define PQ_LOG(l, x) \
do { \
if(cs_log_level >= l) { \
mgos_cd_printf("%.24s:%d:%.16s:\t\t", __FILENAME__, __LINE__, pcTaskGetTaskName(NULL)); \
mgos_cd_printf x; \
mgos_cd_printf("\n"); \
} \
} while (0)
(unfortunately this does not play very well with power management enabled https://github.com/cesanta/mongoose-os/issues/556 )
From what i read here https://github.com/espressif/esp-idf/issues/4838 it feels like the 115200 should work with REF_TICK.
So perhaps we should try it and always use REF_TICK unless user specifies higher baudrate?
BTW what is expected behaviour of LOG() in ISR? It should probably look like this:
LOG mutex is unlocked, ISR can LOG() LOG mutex is locked, ISR cannot LOG() and therefore should ignore the LOG() call
Even better version:
There is dedicated buffer for ISRs to LOG into. It is dumped to LOG() as soon as it gets unlocked.
What currently happens is
No matter if LOG is locked or not, calling LOG() in ISR causes panic
LOG is not meant to be used in ISR.
On Tue 27 Oct 2020, 09:03 Tomas Mudrunka, [email protected] wrote:
BTW what is expected behaviour of LOG() in ISR? It should probably look like this:
LOG mutex is unlocked, ISR can LOG() LOG mutex is locked, ISR cannot LOG() and therefore should ignore the LOG() call
What currently happens is
No matter if LOG is locked or not, calling LOG() in ISR causes panic
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/cesanta/mongoose-os/issues/480#issuecomment-717093511, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAEBW2YJSWHBVCMK4KJDIH3SM2EHNANCNFSM4GX2J5MA .
LOG is not meant to be used in ISR.
Ok, that is not really important for me. Just saying it can be done if needed :-) I will try to write some minimal code needed to reproduce the bug i experience when calling LOG() from multiple tasks, so you can take a look at it.
I am still strugling to isolate the bug. Might mgos_ints_disable() / mgos_ints_enable()
have effect on triggering of this bug by any chance?
O yeah! I got it. Happens when calling LOG() with ints disabled. At least in some cases...
Remote debugging using 127.0.0.1:1234
0x4008727b in invoke_abort ()
at /opt/Espressif/esp-idf/components/esp32/panic.c:156
156 *((int *) 0) = 0;
#0 0x4008727b in invoke_abort ()
at /opt/Espressif/esp-idf/components/esp32/panic.c:156
#1 0x40087400 in abort ()
at /opt/Espressif/esp-idf/components/esp32/panic.c:171
#2 0x40082802 in lock_acquire_generic (lock=0x3ffae9d8, delay=4294967295,
mutex_type=4 '\004')
at /opt/Espressif/esp-idf/components/newlib/locks.c:143
#3 0x40082920 in _lock_acquire_recursive (lock=0x3ffae9d8)
at /opt/Espressif/esp-idf/components/newlib/locks.c:171
#4 0x4013f59d in _fwrite_r (ptr=0x3ffc8380, buf=<optimized out>, size=1,
count=24, fp=0x3ffae980) at ../../../.././newlib/libc/stdio/fwrite.c:168
#5 0x4013f618 in fwrite (buf=0x3ffc8220, size=1, count=24, fp=0x3ffae980)
at ../../../.././newlib/libc/stdio/fwrite.c:211
#6 0x40169f00 in cs_log_print_prefix (level=LL_INFO, file=<optimized out>,
ln=<optimized out>) at src/common/cs_dbg.c:86
#7 0x400e6c06 in task_cb (pvParameters=<optimized out>)
at /home/harvie/Temp/firmware-github/log-crash-test/src/main.c:21
Detaching from program: /fw.elf, Remote target
i bet it can. mgos_ints_disable()
translates to portENTER_CRITICAL
on freertos, you shouldn't be logging from inside a critical section.
Can you please add some asserts checking if we are in critical section, so this becomes immediately obvious when somebody runs to this limitation again?
As well as puting big warning to LOG documentation https://mongoose-os.com/docs/mongoose-os/api/core/cs_dbg.h.md that it should not be called from ISR or with disabled interrupts.
//Isolated case of code causing the crash :-)
#include "mgos.h"
enum mgos_app_init_result mgos_app_init(void)
{
mgos_ints_disable();
LOG(LL_INFO,("Log from task %d", 456));
mgos_ints_enable();
return MGOS_APP_INIT_SUCCESS;
}
well, there are a lot of things you shouldn't be doing with ints disabled, logging is just one fo them - in fact, the set of thinjgs you can safely do with ints disabled is very limited. i guess it's worth documenting somewhere, though
Anyway. I am satisfied with this conclusion, it is no problem to move LOG() out of the critical section. if i only knew week ago... I might have save myself lots of headache :-)
BTW what about stuff like MQTT? Can that be considered thread safe? What parts of mongoose cause biggest troubles regarding thread safety? So far stuff seems workable.
Hopefully i will soon release my module implementing parallel event loops, works well, just needs bit of polishing... Stay tuned :)