
Tests segmentation fault

Open PureTryOut opened this issue 3 years ago • 22 comments

While running the unit tests (tests/test_driver --no-display ../tests/test_*.ini) I'm getting segfaults on Alpine Linux.

The build log can be found here; I can't tell exactly which test is failing.

PureTryOut avatar Mar 18 '21 15:03 PureTryOut

Last line is

OK   test filled textured subbmp dest clipSegmentation fault (core dumped)

so presumably that one or the one right after it (or is that naive?)

Do you run these tests regularly? If so, when did they last pass? That could help to narrow it down. (Edit) Also, which commit are you testing here: a release or the current HEAD?
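A minimal sketch of how the last completed test could be pulled out of a saved log, assuming the driver prints one `OK   <test name>` line per passing test (as in the output quoted in this thread) and that the crash message may be glued to the end of that line. `last_ok_test` is a hypothetical helper name, not part of Allegro. As noted above, the crashing test is presumably this one or the one right after it.

```sh
#!/bin/sh
# last_ok_test LOGFILE: print the name of the last test reported as OK.
# Strips the leading "OK" marker and any "Segmentation fault" text that the
# shell appended to the same line when the process died.
last_ok_test() {
    grep '^OK' "$1" | tail -n 1 |
        sed -e 's/^OK *//' -e 's/Segmentation fault.*$//'
}
```

Usage would be something like `last_ok_test build.log`, where `build.log` is a placeholder for wherever the test output was saved.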

pedro-w avatar Mar 18 '21 20:03 pedro-w

It's OK on Mac, haven't tested another platform yet

OK   test filled textured subbmp dest [sw] - by signature
OK   test filled textured subbmp dest clip [sw]
OK   test div-by-zero [sw] - by signature
OK   test pieslice [sw]
OK   test elliptical arc [sw]

so div-by-zero is the failing test, perhaps?

pedro-w avatar Mar 18 '21 20:03 pedro-w

The tests work fine with --no-display on a Debian unstable:

OK   test filled textured subbmp dest [sw] - by signature
OK   test filled textured subbmp dest clip [sw]
OK   test div-by-zero [sw] - by signature
OK   test pieslice [sw]
OK   test elliptical arc [sw]
WARNING: Skipping hardware-only test due to the --no-display flag: test projection
WARNING: Skipping hardware-only test due to the --no-display flag: test projection flipped
OK   test polyline collinear 50 [sw]
OK   test polyline triangle 50 [sw]
OK   test polyline squiggle 0 [sw]
OK   test polyline squiggle 1 [sw]

gusnan avatar Mar 18 '21 20:03 gusnan

I get this crash when running inside xvfb under Debian. I'm not sure which test it comes from, but it seems to happen while reading a picture from the bmpsuite. It could be unrelated.

#0  __memmove_sse2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:384
#1  0x00007ffff5832ea2 in ?? () from /usr/lib/x86_64-linux-gnu/dri/swrast_dri.so
#2  0x00007ffff5832fcc in ?? () from /usr/lib/x86_64-linux-gnu/dri/swrast_dri.so
#3  0x00007ffff5837f51 in ?? () from /usr/lib/x86_64-linux-gnu/dri/swrast_dri.so
#4  0x00007ffff55b9d06 in ?? () from /usr/lib/x86_64-linux-gnu/dri/swrast_dri.so
#5  0x00007ffff5539914 in ?? () from /usr/lib/x86_64-linux-gnu/dri/swrast_dri.so
#6  0x00007ffff553c5c0 in ?? () from /usr/lib/x86_64-linux-gnu/dri/swrast_dri.so
#7  0x00007ffff553fa28 in ?? () from /usr/lib/x86_64-linux-gnu/dri/swrast_dri.so
#8  0x00007ffff7f4f79e in ogl_unlock_region_nonbb_nonfbo (gl_y=0, ogl_bitmap=0x5555559a9460, bitmap=0x5555559b1180) at /home/allefant/allegro/git/src/opengl/ogl_lock.c:684
#9  ogl_unlock_region_non_readonly (ogl_bitmap=0x5555559a9460, bitmap=0x5555559b1180) at /home/allefant/allegro/git/src/opengl/ogl_lock.c:499
#10 _al_ogl_unlock_region_new (bitmap=0x5555559b1180) at /home/allefant/allegro/git/src/opengl/ogl_lock.c:431
#11 0x00007ffff7ed7183 in al_unlock_bitmap (bitmap=bitmap@entry=0x5555559b1180) at /home/allefant/allegro/git/src/bitmap_lock.c:147
#12 0x00007ffff7fc0fd9 in _al_load_bmp_f (f=f@entry=0x555555a06e10, flags=flags@entry=512) at /home/allefant/allegro/git/addons/image/bmp.c:1549
#13 0x00007ffff7fc1fb9 in _al_load_bmp (filename=<optimized out>, flags=512) at /home/allefant/allegro/git/addons/image/bmp.c:1649
#14 0x000055555555aad3 in load_relative_bitmap (flags=512, filename=0x55555590cd70 "bmpsuite/g08s0.bmp") at /home/allefant/allegro/git/tests/test_driver.c:170
#15 load_bitmaps (cfg=cfg@entry=0x5555558865d0, bmp_type=bmp_type@entry=HW, flags=512, section=0x55555556243f "bitmaps") at /home/allefant/allegro/git/tests/test_driver.c:191
#16 0x0000555555558d44 in process_ini_files () at /home/allefant/allegro/git/tests/test_driver.c:1737
#17 main (_argc=<optimized out>, _argv=<optimized out>) at /home/allefant/allegro/git/tests/test_driver.c:1874

allefant avatar Mar 18 '21 20:03 allefant

Yeah I'm running it with xvfb as well. Last time it worked fine was 5.2.6.0, it's failing now on 5.2.7.0.

PureTryOut avatar Mar 19 '21 08:03 PureTryOut

Also works fine with Allegro 5.2.7.0 release version, on Alpine 3.13.2 (x86_64) running in VirtualBox. This is very strange.

pedro-w avatar Mar 19 '21 09:03 pedro-w

Works on 3.13.2? Huh... Could you give it a shot on edge (where it's failing for me)?

PureTryOut avatar Mar 19 '21 10:03 PureTryOut

I get it failing on Debian unstable, using Intel graphics. My earlier report above, where it worked, was from VirtualBox. It might be this: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=761163

Please try running the tests with something like:

LIBGL_DRI3_DISABLE=1 tests/test_driver ../tests/test_*.ini

which might make it work (and it signals that it's a problem with DRI).
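A rough sketch of how the two runs could be compared side by side. `describe_status` and `compare_dri3` are hypothetical helper names. The only assumption beyond the thread is that the shell reports a child killed by SIGSEGV as exit status 139 (128 + 11), which is what bash, dash and BusyBox ash do on Linux.

```sh
#!/bin/sh
# describe_status CODE: turn an exit status into a short label.
describe_status() {
    case "$1" in
        0)   echo "ok" ;;
        139) echo "segfault" ;;   # 128 + SIGSEGV (11)
        *)   echo "failed (exit status $1)" ;;
    esac
}

# compare_dri3 CMD...: run CMD once as-is and once with LIBGL_DRI3_DISABLE=1,
# printing how each run ended. Output of CMD itself is discarded.
compare_dri3() {
    status=0
    "$@" >/dev/null 2>&1 || status=$?
    echo "default: $(describe_status "$status")"
    status=0
    env LIBGL_DRI3_DISABLE=1 "$@" >/dev/null 2>&1 || status=$?
    echo "LIBGL_DRI3_DISABLE=1: $(describe_status "$status")"
}
```

For example: `compare_dri3 tests/test_driver ../tests/test_*.ini`. If only the second run ends in "ok", that points towards DRI.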

I see the problem on a Lenovo X220i, seems to be i915 intel graphics.

gusnan avatar Mar 19 '21 14:03 gusnan

@PureTryOut I tried on the 'edge' version, no problems. I'm running everything as root; could that make a difference? (Still in VirtualBox.) @gusnan, when you run it on the X220i, are you still using --no-display?

pedro-w avatar Mar 19 '21 14:03 pedro-w

This is in a clean container (abuild rootbld) which I think runs everything as root in there as well. It's as minimal as can be so maybe it's missing some GL drivers?

I tried LIBGL_DRI3_DISABLE=1 but it doesn't seem to make a difference.

The point at which it segfaults seems to be a bit random. Right now all tests complete successfully, but it still segfaults at the end.

total tests:  344
passed tests: 344
failed tests: 0
skipped tests: 26

Segmentation fault
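Since the crash point moves around, it may help to measure how often the run segfaults at all. A minimal sketch, assuming a shell that reports SIGSEGV as exit status 139 (bash, dash, BusyBox ash on Linux); `count_segfaults` is a hypothetical helper name.

```sh
#!/bin/sh
# count_segfaults N CMD...: run CMD N times and report how many runs ended
# with exit status 139 (128 + SIGSEGV). Output of CMD itself is discarded.
count_segfaults() {
    runs=$1
    shift
    crashes=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        status=0
        "$@" >/dev/null 2>&1 || status=$?
        if [ "$status" -eq 139 ]; then
            crashes=$((crashes + 1))
        fi
        i=$((i + 1))
    done
    echo "$crashes/$runs runs segfaulted"
}
```

For example: `count_segfaults 20 tests/test_driver --no-display ../tests/test_*.ini`.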

PureTryOut avatar Mar 19 '21 15:03 PureTryOut

@pedro-w when using --no-display on the X220i it doesn't segfault.

gusnan avatar Mar 19 '21 15:03 gusnan

@PureTryOut I can try with abuild but since I've never used it (or even heard of it actually) until now I might need some help! It looks like it needs an APKBUILD script to work on, is that available somewhere?

pedro-w avatar Mar 24 '21 08:03 pedro-w

Yes, here, although it is currently for 5.2.6.0. Put it in its own folder called "allegro", update the pkgver to 5.2.7.0 and run abuild checksum in that folder to update the checksums. Then run abuild rootbld in that folder to build the package. The end result (if everything succeeds) is an Alpine Linux package.
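The steps above can be sketched as a small shell script. `set_pkgver` and `rebuild_package` are hypothetical helper names; `abuild checksum` and `abuild rootbld` are the commands named above. It assumes the APKBUILD contains a single line starting with `pkgver=`.

```sh
#!/bin/sh
# set_pkgver FILE VERSION: rewrite the pkgver= line of an APKBUILD.
# Writes to a temporary file first so it does not depend on "sed -i".
set_pkgver() {
    sed "s/^pkgver=.*/pkgver=$2/" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# rebuild_package DIR VERSION: bump the version, refresh the checksums and
# build in a clean root. Runs in a subshell so the caller's working
# directory is left alone.
rebuild_package() {
    (
        cd "$1" &&
        set_pkgver APKBUILD "$2" &&
        abuild checksum &&
        abuild rootbld
    )
}
```

For example: `rebuild_package allegro 5.2.7.0`.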

Although you can probably just run the regular build commands on an Alpine system or container as well; I doubt the issue is specific to abuild.

PureTryOut avatar Mar 24 '21 09:03 PureTryOut

OK. I think I have made some progress, but these segfaults are inconsistent as to when they appear, so this might not be correct. I built using abuild with the APKBUILD modified to 5.2.7 as you suggested, and got a segfault. I also got one with the unmodified file (i.e. 5.2.6). Going back to a 'manual build' (i.e. getting the tar file, then cmake, make and test), I didn't get any segfault. In the APKBUILD I notice some odd settings. Is there a reason for these?

  • -DCMAKE_BUILD_TYPE=None (line 33)
  • -DALLEGRO_SDL=ON (line 35)

When I made these settings via ccmake in my manual build, I did see a segfault and the traceback was in sdl_shutdown_system, which calls SDL_Quit (no debug symbols for SDL unfortunately!)

[Image: VirtualBox_Alpine_24_03_2021_10_51_34]

Apologies for the image, I can't copy text from VirtualBox. Could you try with setting CMAKE_BUILD_TYPE to RelWithDebInfo and not setting ALLEGRO_SDL? That might narrow it down. Probably @allefant is the one to advise on the SDL back-end.

pedro-w avatar Mar 24 '21 13:03 pedro-w

Also, with CMAKE_BUILD_TYPE=None I needed to run test_driver via xvfb-run, but with CMAKE_BUILD_TYPE=RelWithDebInfo I didn't. xvfb-run is another thing I'd never heard of until today.

pedro-w avatar Mar 24 '21 13:03 pedro-w

Hmm, -DALLEGRO_SDL=ON seems to be it. If I remove it (and thus turn it off), tests succeed. Changing -DCMAKE_BUILD_TYPE didn't change anything, and having it at None but without SDL succeeds as well. As for why we change the build type to None: to make sure it gets built with our distribution compiler flags rather than whatever upstream might have set.

xvfb-run is a tool to run GUI tests in headless environments. It's weird that it's needed when --no-display is used though, but that only seems to be the case when it's compiled with SDL. The build type doesn't matter here.
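A minimal sketch of a wrapper that only goes through xvfb-run when no X display is available. `run_headless` is a hypothetical helper name; `-a` is xvfb-run's option for picking a free server number automatically.

```sh
#!/bin/sh
# run_headless CMD...: run CMD under a virtual X server if DISPLAY is unset
# and xvfb-run is installed; otherwise run it directly.
run_headless() {
    if [ -z "${DISPLAY:-}" ] && command -v xvfb-run >/dev/null 2>&1; then
        xvfb-run -a "$@"
    else
        "$@"
    fi
}
```

For example: `run_headless tests/test_driver --no-display ../tests/test_*.ini`.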

Why is compiling with SDL2 considered odd exactly?

PureTryOut avatar Mar 24 '21 13:03 PureTryOut

I managed, with abuild, to install a version of SDL without the debug symbols stripped out. I rebuilt Allegro and ran the tests; they crashed twice and then I could never repeat it. I did get a backtrace though (see image).

[Image: VirtualBox_Alpine_30_03_2021_13_22_24]

The crash could maybe happen if there is a race condition and something else shuts down the udev system while the joystick driver is in the middle of unregistering its callback. I'm not really sure; @allefant wrote the original code and I'd appreciate his comments if he's around.

> Why is compiling with SDL2 considered odd exactly?

It shouldn't be necessary to use SDL. It replaces Allegro's own low-level code and can be useful if there's a platform that SDL supports and Allegro doesn't, e.g. emscripten. 'Odd' probably wasn't the right word to use there, sorry.

In your set-up, does it reliably crash every time? We could look into putting some more checks in place, maybe.

pedro-w avatar Mar 31 '21 09:03 pedro-w

The tests segfault reliably, yes. I don't think I have ever seen them pass with the latest version.

PureTryOut avatar Mar 31 '21 09:03 PureTryOut

The SDL port is still experimental and also not fully implemented, so tests are not expected to pass with it: https://github.com/liballeg/allegro5/blob/master/README_sdl.txt

It would still be nice to find the cause of the crash, but it could also be something that goes away once the port is completed.

allefant avatar Mar 31 '21 12:03 allefant

Good to know, I might just disable SDL in our packaging for now then. SDL is such an important part in other software that I thought it was required to be honest.

PureTryOut avatar Mar 31 '21 13:03 PureTryOut

I think you can also speed up the package build by adding -DWANT_DEMO=off and -DWANT_EXAMPLES=off to the APKBUILD, as it seems you don't install or make use of the examples or demos. Let us know how you get on. In the meantime, if there is a bug in Allegro-SDL it would be great to fix it, as allefant says.
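A rough sketch of what the resulting configure step might look like, combining the flags discussed in this thread. `allegro_cmake_flags` and `configure_allegro` are hypothetical helper names, and the `-S`/`-B` options assume CMake 3.13 or newer. CMAKE_BUILD_TYPE=None is kept because the packaging relies on the distribution's own compiler flags.

```sh
#!/bin/sh
# allegro_cmake_flags: print the CMake options discussed in this thread:
# SDL back-end off, demos and examples not built.
allegro_cmake_flags() {
    echo "-DCMAKE_BUILD_TYPE=None -DALLEGRO_SDL=OFF -DWANT_DEMO=off -DWANT_EXAMPLES=off"
}

# configure_allegro SRCDIR BUILDDIR: configure an out-of-tree build.
# The flags are deliberately left unquoted so they split into separate words.
configure_allegro() {
    # shellcheck disable=SC2046
    cmake -S "$1" -B "$2" $(allegro_cmake_flags)
}
```

For example: `configure_allegro . build`, followed by the usual build and test steps.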

pedro-w avatar Mar 31 '21 13:03 pedro-w

Thanks, I indeed don't need those, that'll decrease compilation times quite a bit!

I disabled SDL2 for now, thanks for the tips so far. Hopefully it can be fixed at some point, but for now this'll work for us.

PureTryOut avatar Mar 31 '21 13:03 PureTryOut