Segfault in simple python pipeline

Open Erotemic opened this issue 8 years ago • 0 comments

I encountered a segfault in a Python pipeline I developed for a stereo vision problem. While debugging, I reduced it to a minimal working example (MWE). The MWE pipeline creates a MakeDOSProcess process that creates an empty DetectedObjectSet and pushes it to a MeasureDOSProcess process, which reads it and does nothing.

Because this was originally a stereo problem, multiple MakeDOSProcess processes originally fed into the MeasureDOSProcess. In the script, the global variable N_SOURCE_NODES defines the "number of cameras", i.e. the number of MakeDOSProcess processes that feed into the MeasureDOSProcess. The segfault occurs even with just one of these processes, but it often occurs sooner when there are several.

The file with the MWE can be found here:

https://github.com/Erotemic/VIAME/blob/dev/camtrawl/plugins/camtrawl/python/segfault_pipeline.py

It uses a helper script to build the actual pipeline file: https://github.com/Erotemic/VIAME/blob/dev/camtrawl/plugins/camtrawl/python/define_pipeline.py (it works without this, but it's a pain to keep changing the pipeline text by hand when I could just generate it dynamically).

For reference, here is the pipeline file for N_SOURCE_NODES=1:

# nodes
#

process node0_make_dos
  :: make_dos

process measure
  :: measure_dos

# ----------------------
# connections
#

connect from node0_make_dos.detected_object_set
        to   measure.detected_object_set0

# ----------------------
# global pipeline config
#

config _scheduler
    :type pythread_per_process
config _pipeline:_edge
    :capacity 1
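
As a rough illustration of generating this text dynamically, the sketch below builds the same pipeline file for any number of source nodes using plain string formatting. It is not the contents of define_pipeline.py; `build_pipeline_text` is a hypothetical name, and the input port names `detected_object_set1`, `detected_object_set2`, ... for additional nodes are an assumption extrapolated from `detected_object_set0` above.

```python
N_SOURCE_NODES = 1  # "number of cameras", same name as the global in the MWE


def build_pipeline_text(n_source_nodes):
    """Return pipeline text with n_source_nodes make_dos processes (sketch)."""
    rule = '# ' + '-' * 22
    lines = ['# nodes', '#', '']
    # One source process per "camera".
    for i in range(n_source_nodes):
        lines += ['process node{}_make_dos'.format(i), '  :: make_dos', '']
    # A single sink process that reads every detected object set.
    lines += ['process measure', '  :: measure_dos', '']
    lines += [rule, '# connections', '#', '']
    for i in range(n_source_nodes):
        lines += [
            'connect from node{}_make_dos.detected_object_set'.format(i),
            # Assumed port naming: one numbered input port per source node.
            '        to   measure.detected_object_set{}'.format(i),
            '',
        ]
    lines += [rule, '# global pipeline config', '#', '']
    lines += [
        'config _scheduler',
        '    :type pythread_per_process',
        'config _pipeline:_edge',
        '    :capacity 1',
    ]
    return '\n'.join(lines)


if __name__ == '__main__':
    print(build_pipeline_text(N_SOURCE_NODES))
```

Keeping the generator as a pure function that returns a string makes it easy to diff the output for different values of N_SOURCE_NODES when narrowing down the crash.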

While debugging, the crash appears to happen inside a Boost call, but it's hard to be sure because I'm debugging with a release build of Python. When I build a debug version of Python (without pymalloc), Boost won't link against it. Switching to pybind11 may fix this issue, so I will debug further once that lands.

It's also possible this is an issue with the vital bindings, but my money is on something in one of the Boost.Python-wrapped sprokit modules.
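
One low-cost way to get more information out of a release interpreter might be Python's `faulthandler` module, which dumps the Python-level traceback of every thread when a fatal signal such as SIGSEGV arrives. The sketch below is an assumption about how it could be wired into the MWE, not something that has been tried on this pipeline; `enable_crash_traceback` and `interpreter_build_flags` are hypothetical helper names. `faulthandler` is in the standard library from Python 3.3 onward (a backport exists on PyPI for Python 2.7), and the build-flag values can be `None` on platforms that do not record them.

```python
import faulthandler
import sysconfig


def enable_crash_traceback(log_path):
    """Dump Python tracebacks of all threads to log_path on a fatal signal."""
    log_file = open(log_path, 'w')
    faulthandler.enable(file=log_file, all_threads=True)
    # The caller must keep this file open for as long as the handler is enabled.
    return log_file


def interpreter_build_flags():
    """Report whether this interpreter is a debug build and uses pymalloc."""
    names = ('Py_DEBUG', 'WITH_PYMALLOC')
    return {name: sysconfig.get_config_var(name) for name in names}
```

Calling `enable_crash_traceback('segfault_traceback.log')` near the top of segfault_pipeline.py, before the pipeline starts, would show which Python frame each thread was in at the time of the crash. It does not show the C++ frames, so it complements a debugger rather than replacing one.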

TODO

  • [x] verify this occurs on multiple machines in multiple branches of kwiver
  • [ ] debug with pybind11
  • [ ] reproduce without using DetectedObjectSet
  • [ ] debug with debug version of python (without pymalloc)

Erotemic, Sep 19 '17 14:09