lv_binding_micropython segmentation violation on unix port


When trying to use the SDL driver on the unix port I get a segmentation violation: running import SDL followed by SDL.init() crashes.

Sep 22 '20 15:09 uraich

Do you have an Nvidia graphics card in your computer? The Unix port has issues with Nvidia cards which we haven't been able to track down. See #46.

Sep 22 '20 16:09 embeddedt

Hi @uraich ! Since we are unable to reproduce your problem on our side, we would need your help debugging this.
Could you provide the stack trace of the crash? You can obtain it by running Micropython under gdb. Something like gdb --args micropython ...

Sep 22 '20 18:09 amirgon

I think embeddedt gave the answer: I do have an NVidia graphics card

Sep 22 '20 18:09 uraich

I think embeddedt gave the answer: I do have an NVidia graphics card

@embeddedt Do you have an NVidia graphics card? Would you consider diving into this once again?

I can suggest the following:

Run it with valgrind, perhaps there is some memory corruption
Try to obtain the sources or at least the debug symbols of libnvidia-glcore and get a more meaningful stack trace than this one
Try to ask on nvidia forums, or contact nvidia support
Open a ticket on nvidia issue tracker
Just as a test, we can try changing the SDL driver back to use an SDL thread instead of the Micropython thread. I believe the original problem was related to callbacks (we want to run Micropython callbacks on the Micropython thread), so if this is the issue we can still use an SDL thread but be careful to trigger callbacks from the Micropython thread.

Sep 22 '20 19:09 amirgon

This is what I see when I run lv_micropython in gdb

Sep 22 '20 21:09 uraich

Is this the same sequence of steps you ran to get it to segfault? It doesn't appear to have crashed yet.

Sep 22 '20 21:09 embeddedt

Yes, the same sequence. Without gdb I see this:

Sep 22 '20 21:09 uraich

@uraich SIGUSR1 is used internally in lv_micropython and should be ignored. Please run in gdb (before run):

handle SIGUSR1 nostop noprint pass

Sep 22 '20 21:09 amirgon

Correct! So it is in the nvidia-glcore library.

Sep 22 '20 21:09 uraich

@amirgon Yes; I have an Nvidia card.

I've been reading some documents about SDL, and it appears that in order to be compliant with its requirements, we need to ensure that all SDL rendering is handled on our initial main thread. It appears that calling SDL functions from other threads is known to cause issues.

Is SDL always invoked from a specific thread, or can it be invoked by any thread depending on what MicroPython is doing?

Sep 24 '20 00:09 embeddedt

we need to ensure that all SDL rendering is handled on our initial main thread

@embeddedt Do you mean, from the same thread that initialized SDL?

Is SDL always invoked from a specific thread, or can it be invoked by any thread depending on what MicroPython is doing?

I think that SDL is initialized and rendered from the same thread all the time.

Here is how it works:

mp_init_SDL is called from Micropython main thread when we call SDL.init() from Micropython. It calls monitor_init and initializes SDL.

https://github.com/lvgl/lv_binding_micropython/blob/6e2af53e9dc3042dfa6f9cd03e0b5c4ca7042d51/driver/SDL/modSDL.c#L73

mp_init_SDL creates a new thread tick_thread, but this thread does not do the rendering directly. It only schedules a call to Micropython:

https://github.com/lvgl/lv_binding_micropython/blob/6e2af53e9dc3042dfa6f9cd03e0b5c4ca7042d51/driver/SDL/modSDL.c#L33-L45

When Micropython is ready it performs scheduled tasks and calls mp_lv_task_handler which performs LVGL and SDL rendering:

https://github.com/lvgl/lv_binding_micropython/blob/6e2af53e9dc3042dfa6f9cd03e0b5c4ca7042d51/driver/SDL/modSDL.c#L23-L28

There is an open question here.

When Micropython performs scheduled tasks, is it doing it always from the same thread? I think it is... but just to make sure it's worth adding some printing of Thread-ID to mp_lv_task_handler.

Looking at Micropython code, it's not entirely clear.
mp_handle_pending is the function in Micropython that runs scheduled tasks, but it is called in different places, specifically by MP_HAL_RETRY_SYSCALL which itself is also called in different places
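
To make the thread-ID check concrete, here is a minimal Python model of the pattern described above, runnable on CPython. It is not the code in modSDL.c: tick_thread, handle_pending and task_handler are stand-ins named after the functions discussed, and a queue.Queue plays the role of Micropython's scheduler. In the real driver the equivalent check would be a print of the thread ID inside mp_lv_task_handler.

```python
import queue
import threading
import time

pending = queue.Queue()   # stand-in for Micropython's scheduler queue
handler_threads = set()   # thread IDs on which the handler actually ran


def task_handler():
    """Stand-in for mp_lv_task_handler: record which thread does the rendering."""
    handler_threads.add(threading.get_ident())


def tick_thread(ticks, period_s):
    """Stand-in for the tick thread: it never renders, it only schedules."""
    for _ in range(ticks):
        time.sleep(period_s)
        pending.put(task_handler)


def handle_pending(expected):
    """Stand-in for mp_handle_pending: run scheduled tasks on the calling thread."""
    done = 0
    while done < expected:
        callback = pending.get(timeout=1.0)
        callback()
        done += 1
    return done


def run_model(ticks=5, period_s=0.005):
    # ID of the thread that drains the queue (the main thread when run as a script).
    main_id = threading.get_ident()
    worker = threading.Thread(target=tick_thread, args=(ticks, period_s))
    worker.start()
    handled = handle_pending(ticks)
    worker.join()
    return main_id, handled


if __name__ == "__main__":
    main_id, handled = run_model()
    print("handled", handled, "tasks; handler ran only on the draining thread:",
          handler_threads == {main_id})
```

If the set of recorded IDs ever contained more than one value, the handler would be running on different threads at different times, which is exactly the case the open question is about.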

Sep 24 '20 06:09 amirgon

Do you mean, from the same thread that initialized SDL?

Unfortunately, it's even stricter than that. It looks like SDL operations always need to be done on the initial main thread (i.e. the one which main(argc, argv) runs in at the start of the program). Doing them on a single thread consistently isn't enough.

Is the "Micropython main thread" the same thread as main(argc, argv), or does MicroPython spawn its own thread initially and use that for the rest of the program's lifetime?

Sep 24 '20 11:09 embeddedt

Is the "Micropython main thread" the same thread as main(argc, argv), or does MicroPython spawn its own thread initially and use that for the rest of the program's lifetime?

Looking at main.c I don't see any explicit creation of a new thread. Also, in the stack trace above it's clear that the call to SDL refresh comes from the same thread in which main was invoked.

But to make sure, I suggest printing thread-id and checking if it's the same even when the problem happens.

Sep 24 '20 12:09 amirgon

Another idea -
@uraich - Could you try running it with gdb again until it crashes, and show the stack trace of all threads? We would be able to tell if there are other threads in the process and what they are doing.

gdb command:

thread apply all bt
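
For a non-interactive run, the gdb commands mentioned in this thread can be combined into one invocation. The helper below is only a sketch that builds the command line; the function name and the example paths are placeholders, and it assumes a gdb that supports the standard -batch, -ex and --args options.

```python
import shlex


def gdb_backtrace_command(binary, script=None):
    """Build an argv list that runs `binary` under gdb and dumps all threads after a crash."""
    argv = [
        "gdb", "-batch",
        # SIGUSR1 is used internally by lv_micropython, so do not stop on it.
        "-ex", "handle SIGUSR1 nostop noprint pass",
        "-ex", "run",
        # Executed once `run` returns, e.g. after the segfault stops the program.
        "-ex", "thread apply all bt",
        "--args", binary,
    ]
    if script is not None:
        argv.append(script)
    return argv


if __name__ == "__main__":
    # "./micropython" and "crash_test.py" are placeholder paths.
    print(shlex.join(gdb_backtrace_command("./micropython", "crash_test.py")))
```

The printed command can be pasted into a shell, or the list can be passed to subprocess.run to capture the backtrace in a file.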

Sep 24 '20 12:09 amirgon

No time to debug this right now, but assuming that Thread 1 is the main thread, it looks like we aren't violating any SDL requirements.

Thread 2 (Thread 0x7fffeffed700 (LWP 8085)):
#0  0x00007ffff7bc7c70 in __GI___nanosleep (
    requested_time=requested_time@entry=0x7fffeffece60, 
    remaining=remaining@entry=0x7fffeffece50)
    at ../sysdeps/unix/sysv/linux/nanosleep.c:28
#1  0x00007ffff71afad5 in SDL_Delay_REAL (ms=<optimized out>)
    at /tmp/SDL2-2.0.10/src/timer/unix/SDL_systimer.c:211
#2  0x0000555555653129 in ?? ()
#3  0x00007ffff71134ac in SDL_RunThread (data=0x555555d1d7e0)
    at /tmp/SDL2-2.0.10/src/thread/SDL_thread.c:283
#4  0x00007ffff71aa0a9 in RunThread (data=<optimized out>)
    at /tmp/SDL2-2.0.10/src/thread/pthread/SDL_systhread.c:79
#5  0x00007ffff7bbd6db in start_thread (arg=0x7fffeffed700)
    at pthread_create.c:463
#6  0x00007ffff6dc1a3f in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7ffff7faf740 (LWP 8081)):
#0  0x00007ffff1fe1447 in ?? ()
   from /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.450.66
#1  0x00007ffff1fe1ac3 in ?? ()
   from /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.450.66
--Type <RET> for more, q to quit, c to continue without paging--
#2  0x00007ffff1fb9c9e in ?? () from /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.450.66
#3  0x00007ffff1fc7e9b in ?? () from /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.450.66
#4  0x00007ffff1fd10dc in ?? () from /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.450.66
#5  0x00007ffff70f1d40 in GL_RunCommandQueue (renderer=0x555555c80d40, cmd=0x555555d1a550, vertices=0x555555d1a590, vertsize=<optimized out>) at /tmp/SDL2-2.0.10/src/render/opengl/SDL_render_gl.c:1270
#6  0x00007ffff70e9e11 in FlushRenderCommands (renderer=0x555555c80d40) at /tmp/SDL2-2.0.10/src/render/SDL_render.c:218
#7  SDL_RenderPresent_REAL (renderer=0x555555c80d40) at /tmp/SDL2-2.0.10/src/render/SDL_render.c:3130
#8  0x0000555555652f37 in ?? ()
#9  0x0000555555653169 in ?? ()
#10 0x00005555555b7954 in ?? ()
#11 0x00005555555b8d7a in ?? ()
#12 0x00005555555b8e90 in ?? ()
#13 0x00005555555d4d51 in ?? ()
#14 0x000055555565399a in ?? ()
#15 0x00005555555d4935 in ?? ()
#16 0x00007ffff6cc1b97 in __libc_start_main (main=0x5555555a537b <main>, argc=1, argv=0x7fffffffdbe8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffdbd8)
    at ../csu/libc-start.c:310
#17 0x00005555555a53da in ?? ()

Sep 24 '20 12:09 embeddedt

Stupid question: If the problem were which thread SDL_init is called from, should we not have the same problem independently of the display driver? When I run my Nvidia graphics card with the nouveau driver, which works quite well now, the problem is gone.

Sep 24 '20 15:09 uraich

@uraich Thanks for testing that. This suggests that the problem is likely somewhere in the Nvidia proprietary drivers.

If it was a problem from which thread SDL_init is called, should we then not have the same problem independently of the display driver?

Not necessarily, because SDL is a thin layer over driver-specific implementations, each of which may have their own threading constraints.

Sep 24 '20 16:09 embeddedt

This issue or pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Oct 17 '20 20:10 stale[bot]

I have an idea.

By default, the SDL driver creates a "tick" thread that calls lv_tick_inc and schedules a call to lv_task_handler. Let's check if this crash is related to this thread or not. We discussed this above but I don't think we tried to completely disable this thread.

On the latest version of the SDL driver you can pass an optional parameter auto_refresh. It's True by default, but if you set it to False it will not create the "tick" thread, and the user is responsible for calling lv_tick_inc and lv_task_handler. These can simply be called in a loop, or more sensibly as part of a uasyncio event loop.

Something like this:

import uasyncio
from async_utils import lv_async
import lvgl as lv
import SDL
lv.init()

# Register SDL display driver, without the event loop (auto_refresh set to False)

SDL.init(auto_refresh=False)

disp_buf1 = lv.disp_buf_t()
buf1_1 = bytes(480 * 10)
disp_buf1.init(buf1_1, None, len(buf1_1)//4)
disp_drv = lv.disp_drv_t()
disp_drv.init()
disp_drv.buffer = disp_buf1
disp_drv.flush_cb = SDL.monitor_flush
disp_drv.hor_res = 480
disp_drv.ver_res = 320
disp_drv.register()

# Register SDL mouse driver

indev_drv = lv.indev_drv_t()
indev_drv.init()
indev_drv.type = lv.INDEV_TYPE.POINTER
indev_drv.read_cb = SDL.mouse_read
indev_drv.register()

# Create a screen

scr = lv.obj()
btn = lv.btn(scr)
btn.align(scr, lv.ALIGN.CENTER, 0, 0)
label = lv.label(btn)
label.set_text('Hello World!')
lv.scr_load(scr)

# Event loop

lva = lv_async(refresh_func = SDL.refresh)
uasyncio.Loop.run_forever()

Could someone with Nvidia graphics card check if the problem happens with the code above? If it doesn't - it would mean that the tick thread is very probably related to the problem.
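
For comparison, the "simply called in a loop" variant could look roughly like the sketch below. run_manual_loop is a hypothetical helper, not part of the binding; the tick and handler functions are passed in as parameters so the loop can be tried without LVGL. With the binding they would presumably be lv.tick_inc and either lv.task_handler or SDL.refresh (the script above passes SDL.refresh to lv_async), which is an assumption here.

```python
import time


def run_manual_loop(tick_inc, task_handler, period_ms=5, iterations=None):
    """Advance the tick and run the task handler from the calling thread.

    Runs forever when iterations is None, otherwise returns after that
    many rounds and reports how many were executed.
    """
    count = 0
    while iterations is None or count < iterations:
        tick_inc(period_ms)            # tell LVGL how much time has passed
        task_handler()                 # let LVGL (and the display driver) do its work
        time.sleep(period_ms / 1000)   # wait until the next round
        count += 1
    return count
```

In the script above, a call such as run_manual_loop(lv.tick_inc, SDL.refresh) would take the place of the last two lines. Everything then runs on the thread that called the loop, with no tick thread and no uasyncio involved.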

Dec 24 '20 14:12 amirgon

@amirgon I've tried this and unfortunately the crash still happens.

Dec 24 '20 15:12 embeddedt

@amirgon I've tried this and unfortunately the crash still happens.

What about the regular (non-micropython) SDL driver? Does it crash sometimes with Nvidia? Or is this problem limited to the Micropython version of the SDL driver?

Dec 24 '20 15:12 amirgon

The normal SDL driver used by the PC simulator has never crashed for me.

Dec 24 '20 15:12 embeddedt

The normal SDL driver used by the PC simulator has never crashed for me.

That's interesting because originally (a long time ago) Micropython's SDL driver was derived from the "normal" SDL driver.
So either there is some difference between them, or something in Micropython itself is causing the problem.

Here are possible ways to check this:

Diff the Micropython SDL driver vs. the "normal" SDL driver, maybe something would pop up
Run the "normal" SDL driver with Micropython instead of the Micropython SDL driver. After all, Micropython is a C program, so it's possible to "hard-wire" it to the "normal" SDL driver and see what happens.

Dec 24 '20 15:12 amirgon

I swapped out the current MicroPython SDL driver for a copy of the PC one (with minor modifications) and the issue is still happening. This suggests that MicroPython is interfering with SDL's operation. The two drivers look quite similar still so I doubt the issue is in the driver.

Dec 24 '20 18:12 embeddedt

This suggests that MicroPython is interfering with SDL's operation

This is very strange. Micropython core does not do anything related to SDL or any kind of graphics. It's just a console application.

Does it happen on SDL.init()? Or later?
If it happens on SDL initialization, I suggest verifying this again by taking a fresh upstream Micropython without any LVGL or Bindings code and adding only the few lines of SDL initialization code directly to the unix port "main" function, just to prove that Micropython is the culprit and that it is unrelated to LVGL, the Bindings, or the SDL driver.

Another thing worth trying (with fresh Micropython + SDL init, or with the uasyncio script above) is turning off multi-threading in Micropython. I remember there were some limitations in the SDL driver related to threads, so it's worth checking whether they are related. To turn off multi-threading, set MICROPY_PY_THREAD to 0 in mpconfigport.mk (it will probably also work to simply build the unix port with make -C ports/unix MICROPY_PY_THREAD=0).

Dec 24 '20 21:12 amirgon

Results:

It does not happen on upstream MicroPython when added to main.
It also does not happen on lv_micropython when added to main in the same spot.

I will try the threading suggestion and see what happens.

Dec 24 '20 21:12 embeddedt

It also does not happen on lv_micropython when added to main in the same spot.

Does it happen when adding it in lv_init() C code instead?

Dec 24 '20 21:12 amirgon

@amirgon I was able to reproduce it on upstream MicroPython by adding in the initial calls to render the first frame (gray background).

The following test case works in a standalone C file, but fails in MicroPython's main function:

#define MONITOR_HOR_RES 480
#define MONITOR_VER_RES 272
#define MONITOR_ZOOM 1

    SDL_Init(SDL_INIT_VIDEO);

    SDL_Window * window = SDL_CreateWindow("TFT Simulator",
                              SDL_WINDOWPOS_UNDEFINED, SDL_WINDOWPOS_UNDEFINED,
                              MONITOR_HOR_RES * MONITOR_ZOOM, MONITOR_VER_RES * MONITOR_ZOOM, 0);       /*last param. SDL_WINDOW_BORDERLESS to hide borders*/

    SDL_Renderer * renderer = SDL_CreateRenderer(window, -1, SDL_RENDERER_SOFTWARE);

    SDL_Texture * texture = SDL_CreateTexture(renderer,
                                SDL_PIXELFORMAT_ARGB8888, SDL_TEXTUREACCESS_STATIC, MONITOR_HOR_RES, MONITOR_VER_RES);
    SDL_SetTextureBlendMode(texture, SDL_BLENDMODE_BLEND);

    static uint32_t tft_fb[MONITOR_HOR_RES * MONITOR_VER_RES];
    memset(tft_fb, 0x44, MONITOR_HOR_RES * MONITOR_VER_RES * sizeof(uint32_t));
    SDL_UpdateTexture(texture, NULL, tft_fb, MONITOR_HOR_RES * sizeof(uint32_t));
    SDL_RenderClear(renderer);

    /*Update the renderer with the texture containing the rendered image*/
    SDL_RenderCopy(renderer, texture, NULL, NULL);
    SDL_RenderPresent(renderer);

We are getting somewhere, finally!

Dec 24 '20 21:12 embeddedt

The following test case works in a standalone C file, but fails in MicroPython's main function

Very interesting! Did disabling threading make any difference?

Dec 24 '20 21:12 amirgon

Didn't think to try that on upstream. Let me see.

Dec 24 '20 21:12 embeddedt
