updated timeout to run in thread to fix unsafe exception handling

Open astewart-bah opened this issue 1 year ago • 2 comments

When trying to make a chain for a specific binary, we 100% reliably get this exception:

[angrop] Timeout
Exception ignored in: <function AstRef.__del__ at 0x7ffff41f5090>
Traceback (most recent call last):
  File "/dev/shm/.venv/lib/python3.10/site-packages/z3/z3.py", line 352, in __del__
    Z3_dec_ref(self.ctx.ref(), self.as_ast())
  File "/dev/shm/.venv/lib/python3.10/site-packages/z3/z3core.py", line 1542, in Z3_dec_ref
    def Z3_dec_ref(a0, a1, _elems=Elementaries(_lib.Z3_dec_ref)):
  File "/dev/shm/.venv/lib/python3.10/site-packages/angrop/rop_utils.py", line 312, in handler
    raise RopException("[angrop] Timeout!")
angrop.errors.RopException: [angrop] Timeout!

This is angrop's timeout handler interrupting z3 code in order to raise a RopException.

The specific problem is that the z3 code being interrupted is a __del__ method.

Here's an excerpt from: https://docs.python.org/3/reference/datamodel.html#object.__del__

Warning
Due to the precarious circumstances under which __del__() methods are invoked, exceptions that occur during their execution are ignored, and a warning is printed to sys.stderr instead.

In other words, when an exception is raised inside a __del__ method, it is not propagated to the caller: Python prints a warning to stderr and execution continues.
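This behavior is easy to confirm in isolation. The following is a minimal sketch, assuming CPython (where dropping the last reference runs __del__ immediately) and Python 3.8+ (for sys.unraisablehook); the names Noisy and drop_object are made up for illustration.

```python
import sys


class Noisy:
    """Object whose finalizer always raises."""

    def __del__(self):
        raise RuntimeError("raised inside __del__")


def drop_object():
    """Drop a Noisy instance and return the exception types Python reported."""
    seen = []
    previous_hook = sys.unraisablehook
    # Record "unraisable" exceptions instead of printing them to stderr.
    sys.unraisablehook = lambda info: seen.append(info.exc_type)
    try:
        obj = Noisy()
        del obj  # CPython runs __del__ here; the RuntimeError does not propagate
    finally:
        sys.unraisablehook = previous_hook
    return seen
```

drop_object() returns normally even though __del__ raised. The exception only shows up through the unraisable hook, which by default prints the "Exception ignored in" message seen in the traceback above.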

So in our scenario the timeout never takes effect, and the analysis keeps running until the machine runs out of RAM and the OOM killer kills the Python process.

I've included a script called orig_timeout.py to demonstrate this. The script calls busy_function(), which takes about 3 seconds to run. I copied the angrop timeout decorator into the script and gave busy_function() a 1 second timeout. Running the script shows that busy_function() receives the timeout exception at the 1 second mark, but because it is executing a __del__ method at that moment, the exception is ignored and busy_function() runs to completion.

Running 1/5
Running 2/5
[angrop] Timeout
Exception ignored in: <function FakeZ3.__del__ at 0x7ffff731a5f0>
Traceback (most recent call last):
  File "orig_timeout.py", line 34, in __del__
    time.sleep(0.5)
  File "orig_timeout.py", line 13, in handler
    raise RopException("[angrop] Timeout!")
__main__.RopException: [angrop] Timeout!
Running 3/5
Running 4/5
Running 5/5
The timeout-decorated function took 2.906207323074341 seconds to execute.
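The attached orig_timeout.py is not reproduced here. The following is a minimal sketch of the same failure mode, assuming a SIGALRM-based decorator (Unix only, main thread only). It is not angrop's actual decorator: RopException is a local stand-in, and FakeZ3, busy_function, and the shortened timings are chosen for illustration.

```python
import functools
import signal
import time


class RopException(Exception):
    """Local stand-in for angrop.errors.RopException."""


def timeout(seconds):
    """Signal-based timeout: the SIGALRM handler raises in the main thread."""
    def decorator(func):
        def handler(signum, frame):
            raise RopException("[angrop] Timeout!")

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            old_handler = signal.signal(signal.SIGALRM, handler)
            signal.setitimer(signal.ITIMER_REAL, seconds)
            try:
                return func(*args, **kwargs)
            finally:
                signal.setitimer(signal.ITIMER_REAL, 0)  # cancel the timer
                signal.signal(signal.SIGALRM, old_handler)
        return wrapper
    return decorator


class FakeZ3:
    """Mimics a z3 object that spends time in its finalizer."""

    def __del__(self):
        time.sleep(0.1)


@timeout(0.15)
def busy_function(rounds=5):
    completed = 0
    for _ in range(rounds):
        obj = FakeZ3()
        del obj  # nearly all the time is spent in __del__
        completed += 1
    return completed
```

With these timings the alarm fires during the second __del__ call, Python prints "Exception ignored in" for it, and busy_function() still completes all its rounds, which is the behavior shown in the output above.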

I've also included another script called new_timeout.py, which is a proposed new timeout solution. It runs the decorated function in a thread. When the timeout fires, a RopException is remotely raised in the target thread. If the target thread is executing a __del__ method at that moment, the exception is ignored (there is no way around this behavior). However, the decorator can see that the thread is still running, so it keeps sending RopExceptions to the target thread every 100ms until one takes effect.

Running 1/5
Running 2/5
[angrop] Timeout
Exception ignored in: <function FakeZ3.__del__ at 0x7ffff7341ea0>
Traceback (most recent call last):
  File "new_timeout.py", line 51, in __del__
    time.sleep(0.5)
__main__.RopException:
The timeout-decorated function took 1.2032785415649414 seconds to execute.
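The attached new_timeout.py is not reproduced here. The following is a minimal sketch of the approach described above, assuming the remote raise is done with CPython's PyThreadState_SetAsyncExc through ctypes (the issue does not say which mechanism the script uses). RopException is a local stand-in, and the decorator, FakeZ3, busy_function, and the timings are illustrative.

```python
import ctypes
import functools
import threading
import time


class RopException(Exception):
    """Local stand-in for angrop.errors.RopException."""


def _async_raise(thread, exc_type):
    """Ask CPython to raise exc_type in `thread` at its next bytecode check."""
    ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_ulong(thread.ident), ctypes.py_object(exc_type)
    )


def timeout(seconds, retry_interval=0.1):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            outcome = {}

            def runner():
                try:
                    outcome["value"] = func(*args, **kwargs)
                except BaseException as exc:
                    outcome["error"] = exc

            worker = threading.Thread(target=runner, daemon=True)
            worker.start()
            worker.join(seconds)
            if worker.is_alive():
                # An exception that lands inside __del__ is swallowed, so keep
                # re-sending until the worker thread has actually stopped.
                while worker.is_alive():
                    _async_raise(worker, RopException)
                    worker.join(retry_interval)
                raise RopException("[angrop] Timeout!")
            if "error" in outcome:
                raise outcome["error"]
            return outcome["value"]
        return wrapper
    return decorator


class FakeZ3:
    def __del__(self):
        time.sleep(0.05)


@timeout(0.2, retry_interval=0.02)
def busy_function(rounds=100):
    for _ in range(rounds):
        obj = FakeZ3()
        del obj
        time.sleep(0.05)  # time outside __del__, where the exception propagates
```

Without the timeout, busy_function() would run for roughly 10 seconds. Here it stops shortly after the 0.2 second deadline, even if the first remote exception happens to be swallowed by __del__. The asynchronous exception is only delivered when the target thread executes Python bytecode, so a thread blocked in a long C call will not stop until that call returns.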

py_files.zip

astewart-bah, May 05 '24 01:05

Thank you so much for providing the reproducer. I was able to reproduce the bug and have created a potential fix. Can you please check whether https://github.com/angr/angrop/pull/109 fixes your test case?

Kyle-Kyle, May 10 '24 20:05

> Thank you so much for providing the reproducer. I was able to reproduce the bug and have created a potential fix. Can you please check whether #109 fixes your test case?

That is a way cleaner approach. Thank you for taking a look at the issue.

That did fix some of the issues we were seeing. However, we are still seeing the crashing behavior. It seems that the Python mechanism that causes exceptions to be ignored in __del__ also applies in other places, including weakref objects. We are currently working on a reproducible example.
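This is not the reproducer mentioned above, only a small illustration of the documented Python behavior: an exception raised in a weakref callback is handled the same way as one raised in __del__. It is a sketch assuming CPython's immediate reference counting, and the names Target, on_collect, and drop_with_callback are made up for illustration.

```python
import weakref


class Target:
    """Plain object that can be weakly referenced."""


def on_collect(ref):
    # Runs when the referent is about to be finalized.
    raise RuntimeError("raised inside weakref callback")


def drop_with_callback():
    obj = Target()
    ref = weakref.ref(obj, on_collect)
    del obj  # last strong reference gone; the callback runs and raises
    # The RuntimeError was reported on stderr, not propagated to this frame.
    return ref() is None
```

drop_with_callback() returns True and Python prints an "Exception ignored in" message for the callback, so a timeout exception that lands inside such a callback would be lost in the same way.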

astewart-bah, May 11 '24 01:05