robot_dart
OpenGL parallel contexts with shadows segfault (sometimes)
When rendering shadows in multiple parallel OpenGL contexts, we sometimes get segfaults. This needs investigation. Without shadows, the crash does not happen. Example backtrace:
(gdb) bt
#0 0x00007ffe42c6b485 in () at /usr/lib/libnvidia-glcore.so.465.31
#1 0x00007ffff7b60610 in () at /usr/lib/libMagnumGL.so.2
#2 0x00007ffff7b5f5c1 in Magnum::GL::Mesh::drawInternal(int, int, int, unsigned int, long, int, int) () at /usr/lib/libMagnumGL.so.2
#3 0x00007ffff7b47258 in Magnum::GL::AbstractShaderProgram::draw(Magnum::GL::Mesh&) () at /usr/lib/libMagnumGL.so.2
#4 0x00005555555ba654 in robot_dart::gui::magnum::ShadowedObject::draw(Magnum::Math::Matrix4<float> const&, Magnum::SceneGraph::Camera<3u, float>&) (this=0x7ffe34cb1b10, transformationMatrix=..., camera=...)
at ../src/robot_dart/gui/magnum/drawables.cpp:176
#5 0x00007ffff7c6ca7f in Magnum::SceneGraph::Camera<3u, float>::draw(Magnum::SceneGraph::FeatureGroup<3u, Magnum::SceneGraph::Drawable<3u, float>, float>&) () at /usr/lib/libMagnumSceneGraph.so.2
#6 0x00005555555b4127 in robot_dart::gui::magnum::BaseApplication::render_shadows() (this=0x7ffe343c1480) at ../src/robot_dart/gui/magnum/base_application.cpp:574
#7 0x00005555555b630d in robot_dart::gui::magnum::BaseApplication::update_lights(robot_dart::gui::magnum::gs::Camera const&) (this=0x7ffe343c1480, camera=<optimized out>) at ../src/robot_dart/gui/magnum/base_application.cpp:276
#8 0x00005555555db14c in robot_dart::gui::magnum::sensor::Camera::calculate(double) (this=0x7ffe37883300) at /usr/include/c++/11.1.0/bits/unique_ptr.h:173
#9 0x00005555555e0c53 in robot_dart::RobotDARTSimu::step_world(bool) (this=this@entry=0x7ffe41925b60, reset_commands=reset_commands@entry=false) at ../src/robot_dart/robot_dart_simu.cpp:168
#10 0x00005555555958da in operator()(int) const (__closure=0x7fffffffd580, run=<optimized out>) at ../src/task_specific_evaluation.cpp:354
#11 0x0000555555596fcd in tbb::internal::parallel_for_body<main(int, char**)::<lambda(int)>, int>::operator() (r=<optimized out>, r=<optimized out>, this=0x7ffe44c4fd58) at /usr/include/tbb/parallel_for.h:177
#12 tbb::interface9::internal::start_for<tbb::blocked_range<int>, tbb::internal::parallel_for_body<main(int, char**)::<lambda(int)>, int>, const tbb::auto_partitioner>::run_body (r=<optimized out>, this=0x7ffe44c4fd40)
at /usr/include/tbb/parallel_for.h:115
#13 tbb::interface9::internal::dynamic_grainsize_mode<tbb::interface9::internal::adaptive_mode<tbb::interface9::internal::auto_partition_type> >::work_balance<tbb::interface9::internal::start_for<tbb::blocked_range<int>, tbb::internal::parallel_for_body<main(int, char**)::<lambda(int)>, int>, const tbb::auto_partitioner>, tbb::blocked_range<int> > (range=<optimized out>, start=<optimized out>, this=<optimized out>) at /usr/include/tbb/partitioner.h:423
#14 tbb::interface9::internal::partition_type_base<tbb::interface9::internal::auto_partition_type>::execute<tbb::interface9::internal::start_for<tbb::blocked_range<int>, tbb::internal::parallel_for_body<main(int, char**)::<lambda(int)>, int>, const tbb::auto_partitioner>, tbb::blocked_range<int> > (range=<optimized out>, start=warning: RTTI symbol not found for class 'tbb::interface9::internal::start_for<tbb::blocked_range<int>, tbb::internal::parallel_for_body<main::{lambda(int)#1}, int>, tbb::auto_partitioner const>'
..., this=0x7ffe44c4fd68) at /usr/include/tbb/partitioner.h:256
#15 tbb::interface9::internal::start_for<tbb::blocked_range<int>, tbb::internal::parallel_for_body<main(int, char**)::<lambda(int)>, int>, const tbb::auto_partitioner>::execute(void) (this=0x7ffe44c4fd40)
at /usr/include/tbb/parallel_for.h:142
#16 0x00007ffff25c5105 in () at /usr/lib/libtbb.so.2
#17 0x00007ffff25c543c in () at /usr/lib/libtbb.so.2
#18 0x00007ffff25bed97 in () at /usr/lib/libtbb.so.2
#19 0x00007ffff25bd3e1 in () at /usr/lib/libtbb.so.2
#20 0x00007ffff25b981c in () at /usr/lib/libtbb.so.2
#21 0x00007ffff25b9a8a in () at /usr/lib/libtbb.so.2
#22 0x00007ffff6064259 in start_thread () at /usr/lib/libpthread.so.0
#23 0x00007fffe6c585e3 in clone () at /usr/lib/libc.so.6
This is most probably because there is not enough GPU memory to handle all the parallel contexts (shadows do take a lot of GPU memory).
I might want to check GL::Renderer::error() for out-of-memory errors after rendering the shadows, but the driver is not guaranteed to report one every time, so a clean result would not rule out memory exhaustion.
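A rough sketch of what such a check could look like, not something that exists in robot_dart. OpenGL keeps a queue of error flags, so the query has to be repeated until it returns "no error". The names `GlError`, `GlErrorReport` and `drain_gl_errors` are hypothetical. `GlError` is a stand-in for Magnum's `GL::Renderer::Error` so that the snippet compiles without a GL context, and its two values are the standard OpenGL codes. In the real code the callable would presumably wrap `Magnum::GL::Renderer::error()` and be called after `render_shadows()`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for Magnum::GL::Renderer::Error.
// The numeric values are the standard OpenGL error codes.
enum class GlError : std::uint32_t {
    NoError = 0x0000,     // GL_NO_ERROR
    OutOfMemory = 0x0505  // GL_OUT_OF_MEMORY
};

// Result of draining the error queue of the current context.
struct GlErrorReport {
    bool out_of_memory = false;   // true if GL_OUT_OF_MEMORY was seen
    std::vector<GlError> errors;  // every non-NoError code that was read
};

// Reads errors through `next_error` until it returns NoError.
// `max_reads` bounds the loop in case the query never returns NoError
// (an assumption made here to stay safe with a broken or lost context).
template <typename NextError>
GlErrorReport drain_gl_errors(NextError next_error, std::size_t max_reads = 64)
{
    GlErrorReport report;
    for (std::size_t i = 0; i < max_reads; ++i) {
        const GlError e = next_error();
        if (e == GlError::NoError)
            break;
        report.errors.push_back(e);
        if (e == GlError::OutOfMemory)
            report.out_of_memory = true;
    }
    return report;
}

// Assumed usage with Magnum, on the thread that owns the context:
//   auto report = drain_gl_errors([] {
//       return static_cast<GlError>(Magnum::GL::Renderer::error());
//   });
//   if (report.out_of_memory) { /* e.g. disable shadows for this context */ }
```

Even if this reports nothing, the segfault could still be memory-related, since the crash happens inside the driver before any error flag can be read.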