manticore icon indicating copy to clipboard operation
manticore copied to clipboard

Manticore sometimes hangs when a worker exits unexpectedly

Open bradlarsen opened this issue 5 years ago • 4 comments

Sometimes Manticore hangs indefinitely when one of its workers exits unexpectedly, such as with an out-of-memory error.

For example, I saw this backtrace when running python -mmanticore /home/blarsen/coreutils-install/bin/expand on an Ubuntu 18.04 machine:

2020-02-20 13:09:27,165: [24106] m.c.worker:INFO: [6] Debug Concretize()
[Errno 12] Cannot allocate memory Probably too many cached expressions? visitors._cache...
2020-02-20 13:09:27,172: [24106] m.c.worker:ERROR: Exception in state 6: Z3NotFoundError()
Traceback (most recent call last):
  File "/home/blarsen/manticore/manticore/native/state.py", line 29, in execute
    result = self._platform.execute()
  File "/home/blarsen/manticore/manticore/platforms/linux.py", line 2438, in execute
    self.current.execute()
  File "/home/blarsen/manticore/manticore/native/cpu/abstractcpu.py", line 969, in execute
    raise ConcretizeRegister(self, "PC", policy="ALL")
manticore.native.cpu.abstractcpu.ConcretizeRegister: (<manticore.native.cpu.x86.AMD64Cpu object at 0x7f5587a62640>, 'PC')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/blarsen/manticore/manticore/core/worker.py", line 121, in run
    current_state.execute()
  File "/home/blarsen/manticore/manticore/native/state.py", line 41, in execute
    raise Concretize(str(e), expression=expression, setstate=setstate, policy=e.policy)
manticore.core.state.Concretize

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/blarsen/manticore/manticore/core/smtlib/solver.py", line 215, in _start_proc
    self._proc = Popen(
  File "/usr/lib/python3.8/subprocess.py", line 854, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/lib/python3.8/subprocess.py", line 1637, in _execute_child
    self.pid = _posixsubprocess.fork_exec(
OSError: [Errno 12] Cannot allocate memory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/blarsen/manticore/manticore/core/worker.py", line 142, in run
    m._fork(current_state, exc.expression, exc.policy, exc.setstate)
  File "/home/blarsen/manticore/manticore/core/manticore.py", line 361, in _fork
    solutions = state.concretize(expression, policy)
  File "/home/blarsen/manticore/manticore/core/state.py", line 297, in concretize
    vals = self._solver.get_all_values(
  File "/home/blarsen/manticore/manticore/core/smtlib/solver.py", line 451, in get_all_values
    self._reset(temp_cs.to_string(related_to=var))
  File "/home/blarsen/manticore/manticore/core/smtlib/solver.py", line 294, in _reset
    self._start_proc()
  File "/home/blarsen/manticore/manticore/core/smtlib/solver.py", line 229, in _start_proc
    raise Z3NotFoundError  # TODO(mark) don't catch this exception in two places
manticore.exceptions.Z3NotFoundError

At this point, the top-level Manticore process was forever waiting, until I forcibly killed it. It would be much preferable if Manticore didn't hang.

I have seen this kind of hang other times when a worker has exited uncleanly, not just for out-of-memory reasons. Let me see if I can produce a small, reliable reproducer of this hanging behavior.

bradlarsen avatar Feb 20 '20 21:02 bradlarsen

Note additionally that the Z3NotFoundError is spurious: Z3 is definitely found.

bradlarsen avatar Feb 20 '20 21:02 bradlarsen

It looks like the top-level Manticore process is getting stuck somewhere in this loop:

https://github.com/trailofbits/manticore/blob/32f79f231b807465aff614d805af2917feb7c08e/manticore/core/worker.py#L74-L180

bradlarsen avatar Feb 20 '20 21:02 bradlarsen

There is not heartbeat implemented to check if a worker hangs (when using multiprocessing).

feliam avatar Feb 20 '20 21:02 feliam

@bradlarsen can you check if this helps: https://github.com/trailofbits/manticore/pull/2450 ?

gustavo-grieco avatar Jun 10 '21 16:06 gustavo-grieco