pymunk

Windows fatal exception: access violation - shapes.py line 65

Open Day-Go opened this issue 2 years ago • 13 comments

Experiencing irregular Python interpreter crashes. The crash happens roughly 20-30% of the time the offending line is executed. The code uses a multiprocessing pool; I don't know if that is relevant, but thought it worth mentioning.

The error was found using Python's built-in faulthandler module. Full traceback below:

Windows fatal exception: access violation

Current thread 0x00000a28 (most recent call first):
  File "C:\Users\OolongJunSun\AppData\Local\Programs\Python\Python310\lib\site-packages\pymunk\shapes.py", line 65 in shapefree
  File "D:\02_Projects\03_Active\Evolution\Crawl-Eat-Die-Repeat-broken\population.py", line 42 in generate_individuals
  File "D:\02_Projects\03_Active\Evolution\Crawl-Eat-Die-Repeat-broken\main.py", line 113 in <module>
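
For readers who want to capture a similar traceback, here is a minimal sketch of enabling faulthandler from code. The log file name `faulthandler.log` is an assumption chosen for illustration; the thread does not say how the module was enabled.

```python
import faulthandler

# Keep the file object open for the life of the process: faulthandler writes
# to its file descriptor when a fatal error (e.g. an access violation) occurs.
# "faulthandler.log" is a hypothetical file name.
fault_log = open("faulthandler.log", "w")

# Dump the Python traceback of every thread on a fatal error.
faulthandler.enable(file=fault_log, all_threads=True)
```

The same handler can be switched on without code changes by starting the interpreter with `python -X faulthandler` or by setting the `PYTHONFAULTHANDLER` environment variable. When a multiprocessing pool is involved, each worker process may need to enable it separately.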

Line 42 of my population.py script resets a dictionary containing 500 class instances whose attributes are made up of pymunk objects.

OS: Windows 10
Python: 3.10.4

Day-Go • May 25 '22 16:05

While running the code today I got a similar error with a more detailed stack trace. It looks like the problem is related to multiprocessing.

Windows fatal exception: access violation

Thread 0x0000401c (most recent call first):
  File "C:\Users\JLong\AppData\Local\Programs\Python\Python39\lib\multiprocessing\connection.py", line 310 in _recv_bytes
  File "C:\Users\JLong\AppData\Local\Programs\Python\Python39\lib\multiprocessing\connection.py", line 255 in recv
  File "C:\Users\JLong\AppData\Local\Programs\Python\Python39\lib\multiprocessing\pool.py", line 576 in _handle_results
  File "C:\Users\JLong\AppData\Local\Programs\Python\Python39\lib\threading.py", line 917 in run
  File "C:\Users\JLong\AppData\Local\Programs\Python\Python39\lib\threading.py", line 980 in _bootstrap_inner
  File "C:\Users\JLong\AppData\Local\Programs\Python\Python39\lib\threading.py", line 937 in _bootstrap

Thread 0x0000506c (most recent call first):
  File "C:\Users\JLong\AppData\Local\Programs\Python\Python39\lib\multiprocessing\pool.py", line 528 in _handle_tasks
  File "C:\Users\JLong\AppData\Local\Programs\Python\Python39\lib\threading.py", line 917 in run
  File "C:\Users\JLong\AppData\Local\Programs\Python\Python39\lib\threading.py", line 980 in _bootstrap_inner
  File "C:\Users\JLong\AppData\Local\Programs\Python\Python39\lib\threading.py", line 937 in _bootstrap

Thread 0x00005464 (most recent call first):
  File "C:\Users\JLong\AppData\Local\Programs\Python\Python39\lib\multiprocessing\connection.py", line 816 in _exhaustive_wait
  File "C:\Users\JLong\AppData\Local\Programs\Python\Python39\lib\multiprocessing\connection.py", line 884 in wait
  File "C:\Users\JLong\AppData\Local\Programs\Python\Python39\lib\multiprocessing\pool.py", line 499 in _wait_for_updates
  File "C:\Users\JLong\AppData\Local\Programs\Python\Python39\lib\multiprocessing\pool.py", line 519 in _handle_workers
  File "C:\Users\JLong\AppData\Local\Programs\Python\Python39\lib\threading.py", line 917 in run
  File "C:\Users\JLong\AppData\Local\Programs\Python\Python39\lib\threading.py", line 980 in _bootstrap_inner
  File "C:\Users\JLong\AppData\Local\Programs\Python\Python39\lib\threading.py", line 937 in _bootstrap

Current thread 0x000054a8 (most recent call first):
  File "C:\Users\JLong\AppData\Local\Programs\Python\Python39\lib\site-packages\pymunk\shapes.py", line 65 in shapefree
  File "D:\02_Projects\03_Active\Evolution\Crawl-Eat-Die-Repeat-broken\population.py", line 42 in generate_individuals
  File "D:\02_Projects\03_Active\Evolution\Crawl-Eat-Die-Repeat-broken\main.py", line 113 in <module>

Day-Go • May 26 '22 19:05

Do you call into Pymunk from multiple threads? Pymunk is not thread-safe, so calling its methods from multiple threads might break it.

viblo • May 27 '22 06:05

Thanks for the reply and for developing this great library.

I'm not sure what you mean by "call into Pymunk", but here is a brief step-by-step summary of what the code does.

  1. Initialize a multiprocessing pool and pass in a large dictionary containing class instances made up of Pymunk objects.
  2. Create a pymunk.Space() in each of the processes (there are 6 processes in total).
  3. Populate each space with objects from the dictionary.
  4. Enter a loop which steps through the simulation with space.step(dt). Each process has its own independent space.
  5. Return an integer from each process.

At no point do two threads interact with a single space. The processes are independent from each other.
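
The steps above can be sketched roughly as follows. This is not the project's code: `evaluate_individual` is borrowed from a later comment as a name only, its body, the masses, positions and step counts are all made up, and the sketch passes plain integers to the workers instead of a dictionary of Pymunk objects to keep it short.

```python
import multiprocessing as mp

import pymunk


def evaluate_individual(n_limbs: int) -> int:
    """Build a private space, simulate it, and return an integer score."""
    space = pymunk.Space()  # one independent space per call
    space.gravity = (0.0, -900.0)
    for i in range(n_limbs):
        body = pymunk.Body(1.0, 10.0)  # hypothetical mass and moment
        body.position = (20.0 * i, 100.0)
        shape = pymunk.Segment(body, (0.0, 0.0), (10.0, 0.0), 1.0)
        space.add(body, shape)
    for _ in range(100):
        space.step(1.0 / 60.0)
    # Hypothetical score: number of bodies that ended up below y = 0.
    return sum(1 for body in space.bodies if body.position.y < 0.0)


if __name__ == "__main__":
    # Six worker processes, as in the description above.
    with mp.Pool(processes=6) as pool:
        print(pool.map(evaluate_individual, [9] * 12))
```

The `if __name__ == "__main__":` guard matters on Windows, where worker processes re-import the main module.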

The code is on GitHub - https://github.com/OolongJunSun/Crawl-Eat-Die-Repeat

Day-Go • May 27 '22 07:05

I tried your code and managed to reproduce the error. To check the threading theory I replaced the pool with a plain for loop, but the crash can still happen, though maybe less often. More investigation is needed.

viblo • May 27 '22 20:05

Really appreciate you going to the effort.

I've found that reducing the number of pymunk objects makes the crashes less frequent. When I first reported the problem I had self.n_genes in population.py set to 13. Now I have it set to 9 and the crashes are much less frequent (every 20-50 generations, sometimes 200 generations without crashing).

Other than that I still can't nail down anything that causes the crash to be more or less likely to happen.

Day-Go • May 27 '22 20:05

What does n_genes control? I mean, what happens to the physics objects?

viblo • May 27 '22 20:05

n_genes corresponds to the number of limbs that the randomly generated creature has. Increasing n_genes by 1 adds one more pymunk.Body, one pymunk.Segment and between 1 and 3 constraints from pymunk.constraints. A PivotJoint is always added, and a SimpleMotor and/or a DampedRotarySpring may be added as well.
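
As a rough illustration of what one extra gene adds, here is a sketch under assumptions. The helper `add_limb`, the 50% probabilities and every numeric parameter are hypothetical and are not taken from the project.

```python
import random

import pymunk


def add_limb(space: pymunk.Space, parent: pymunk.Body, rng: random.Random) -> list:
    """Attach one limb to `parent` and return every object that was added."""
    limb = pymunk.Body(1.0, 100.0)  # hypothetical mass and moment
    limb.position = parent.position
    segment = pymunk.Segment(limb, (0.0, 0.0), (30.0, 0.0), 2.0)
    # The pivot joint is always present (anchors given in local coordinates).
    added = [limb, segment, pymunk.PivotJoint(parent, limb, (0.0, 0.0), (0.0, 0.0))]
    # The motor and the spring are each optional, so 1 to 3 constraints per limb.
    if rng.random() < 0.5:
        added.append(pymunk.SimpleMotor(parent, limb, 1.0))
    if rng.random() < 0.5:
        added.append(pymunk.DampedRotarySpring(parent, limb, 0.0, 1000.0, 10.0))
    space.add(*added)
    return added
```

Each call therefore creates between 3 and 5 Pymunk objects that later have to be freed.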

It's not much per creature, but since we generate 500 class instances at once, each extra gene can add up to 2500 more pymunk objects to the dictionary whose reset triggers the error (line 42 of population.py).

I'm hesitant to attribute the error to the size of the dictionary, since it can still occur when we process smaller dictionaries, albeit less frequently. The size of the dictionary can be controlled by changing the n_individuals variable in main.py.
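
The 2500 figure follows from the per-limb counts given earlier (1 body, 1 segment, 1 to 3 constraints) multiplied by 500 individuals. A small sketch of that arithmetic, with helper names that are only illustrative:

```python
N_INDIVIDUALS = 500
BODIES_PER_LIMB = 1
SHAPES_PER_LIMB = 1
MIN_CONSTRAINTS, MAX_CONSTRAINTS = 1, 3


def object_count_range(n_genes: int, n_individuals: int = N_INDIVIDUALS) -> tuple:
    """Return the (lowest, highest) number of pymunk objects in the population."""
    low = n_individuals * n_genes * (BODIES_PER_LIMB + SHAPES_PER_LIMB + MIN_CONSTRAINTS)
    high = n_individuals * n_genes * (BODIES_PER_LIMB + SHAPES_PER_LIMB + MAX_CONSTRAINTS)
    return low, high


print(object_count_range(1))   # one extra gene: (1500, 2500)
print(object_count_range(9))   # n_genes = 9:    (13500, 22500)
print(object_count_range(13))  # n_genes = 13:   (19500, 32500)
```

This counts limb objects only; anything else a creature owns is not included.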

Day-Go • May 27 '22 21:05

Ah, it became a bit easier to validate different theories around this error when I could trigger it more often with a higher n_genes. It seems like the error becomes less frequent (or goes away completely) when I manually remove the shapes and reset their bodies to None in evaluate_individual, just after the while loop, like this:

for shape in env.space.shapes:
    env.space.remove(shape)
    shape.body = None
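
A slightly broader variant of the same idea, offered only as a sketch: the helper name `teardown_space` is hypothetical, and removing constraints and bodies as well goes beyond what the snippet above does and was not confirmed in this thread as necessary.

```python
import pymunk


def teardown_space(space: pymunk.Space) -> None:
    """Detach everything from `space` before the last reference to it is dropped."""
    # list(...) takes a snapshot so that removal cannot disturb the iteration.
    for constraint in list(space.constraints):
        space.remove(constraint)
    for shape in list(space.shapes):
        space.remove(shape)
        shape.body = None  # same reset as in the snippet above
    for body in list(space.bodies):
        space.remove(body)
```

Calling it right after the simulation loop, while the space is still alive, makes the order in which objects are detached explicit instead of leaving it to the garbage collector.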

viblo • May 27 '22 21:05

Fantastic, thank you so much for the help. I'll add your modification and see whether the error still happens.

Will keep the issue open for another day but hopefully that fixes it.

Day-Go • May 27 '22 21:05

Error still occurring unfortunately.

Day-Go • May 27 '22 23:05

Following your example, I've added some code to manually set all the pymunk.Body and pymunk.Segment objects in the error-causing dictionary to None.

for organism in population.cohort.values():
    for limb in organism["instance"].body.structure.values():
        limb["obj"].matter = None
        limb["obj"].shape = None
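
To make the nesting in that loop easier to follow, here is a self-contained sketch that uses plain stand-in objects instead of Pymunk ones. The helpers `make_cohort` and `release_physics_objects` are hypothetical; only the attribute and key names (`cohort`, `instance`, `body`, `structure`, `obj`, `matter`, `shape`) come from the snippet above.

```python
from types import SimpleNamespace


def make_cohort(n_individuals: int, n_limbs: int) -> dict:
    """Build a stand-in for population.cohort with the same nesting."""
    cohort = {}
    for i in range(n_individuals):
        structure = {
            j: {"obj": SimpleNamespace(matter=object(), shape=object())}
            for j in range(n_limbs)
        }
        instance = SimpleNamespace(body=SimpleNamespace(structure=structure))
        cohort[i] = {"instance": instance}
    return cohort


def release_physics_objects(cohort: dict) -> int:
    """Drop the per-limb references one by one; return how many were cleared."""
    released = 0
    for organism in cohort.values():
        for limb in organism["instance"].body.structure.values():
            limb["obj"].matter = None  # stand-in for the pymunk.Body
            limb["obj"].shape = None   # stand-in for the pymunk.Segment
            released += 2
    return released
```

Running this before the dictionary itself is reset means the references are released one at a time in a known order, rather than all at once when the dictionary is replaced.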

Would you recommend also setting all of the constraint objects to None?

Day-Go • May 27 '22 23:05

The error hasn't occurred since I started manually resetting the objects in the dictionary as mentioned above. It looks like the issue was caused by freeing a large number of pymunk shapes from memory simultaneously.

Day-Go • May 28 '22 14:05

OK, great that you found a workaround! I will still keep this issue open for a while so I can research it a bit more.

viblo • May 28 '22 19:05