
Problems and possible patches with `example_parallel.py`

Open c-joly opened this issue 1 year ago • 0 comments

Hi @costashatz, here is an issue linked to the JOSS review https://github.com/openjournals/joss-reviews/issues/6771.

I experienced several problems when running the example example_parallel.py on macOS. They are listed here with some possible patches (I tried Python 3.10 and 3.12, and all of these problems occur with both).

Problem with pickle

When running the example, the first issue was the following error, raised by the call runInParallel(N): TypeError: cannot pickle 'PyCapsule' object

It looks like pickle cannot serialize rd.gui.run_with_gl_context (I read that it may try to serialize the whole module and not only the function). Inspired by https://stackoverflow.com/questions/72766345/attributeerror-cant-pickle-local-object-in-multiprocessing, I created a global (module-level) function work that makes the call to rd.gui.run_with_gl_context:

def work(x, y):
    # Module-level wrapper around the call that pickle could not serialize
    rd.gui.run_with_gl_context(x, y)

and then I defined the process with:

p = Process(target=work, args=(test, 20))

This solves the pickle problem, but a new one appeared...
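To illustrate why the wrapper helps, here is a minimal sketch using only the standard library. It assumes the failure comes from pickle having to serialize the Process target, which is what happens when child processes are started with the spawn method (the default on macOS since Python 3.8). A lambda stands in for rd.gui.run_with_gl_context purely because pickle rejects it too; the names unpicklable_call and can_pickle are hypothetical and not part of robot_dart.

```python
import pickle

# Hypothetical stand-in for a callable that pickle rejects (the real one in
# the example is rd.gui.run_with_gl_context; a lambda is used here only
# because pickle cannot serialize it either).
unpicklable_call = lambda x, y: (x, y)


def work(x, y):
    # Module-level function: pickle stores only a reference to its name,
    # so whatever it calls internally is never serialized.
    return unpicklable_call(x, y)


def can_pickle(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


print("lambda picklable: ", can_pickle(unpicklable_call))
print("wrapper picklable:", can_pickle(work))
```

When this is run as a script, the first check should fail and the second should succeed, which matches what I observed with the work wrapper. It does not prove that the same mechanism is at play inside robot_dart.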

Error linked to main

After solving the pickle issue, I received the following error message:

RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

It looks like the interpreter does not like the processes being started directly in lines that are executed when the module is imported. As suggested in the error message, I added the classical test if __name__ == '__main__' before the lines that have to be executed, so they won't be executed if the module is just imported. This leads to these changes at the end of the file:

if __name__=='__main__':
  print('Running parallel evaluations')
  N = 15
  start = timer()
  runInParallel(N)
  end = timer()
  print('Time:', end-start)
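Putting both patches together, the overall structure could look like the sketch below. It is not the actual example_parallel.py: simulate and worker are hypothetical placeholders that only compute a number, and results are collected through a multiprocessing.Queue so the sketch runs without robot_dart. The point is the layout: workers defined at module level, and all process creation behind the __main__ guard so that it is safe under the spawn start method.

```python
import multiprocessing as mp
from timeit import default_timer as timer


def simulate(index):
    # Hypothetical placeholder for the real per-process simulation.
    return index * index


def worker(index, results):
    # Module-level target, so it can be pickled by reference under "spawn".
    results.put((index, simulate(index)))


def run_in_parallel(n):
    results = mp.Queue()
    procs = [mp.Process(target=worker, args=(i, results)) for i in range(n)]
    for p in procs:
        p.start()
    # Drain the queue before joining so no child blocks on a full pipe.
    collected = [results.get() for _ in procs]
    for p in procs:
        p.join()
    return sorted(collected)


if __name__ == "__main__":
    # With "spawn", each child re-imports this module; the guard keeps
    # the children from starting processes of their own.
    print("Start method:", mp.get_start_method())
    start = timer()
    print(run_in_parallel(4))
    end = timer()
    print("Time:", end - start)
```

Printing the start method on both macOS and Linux may help confirm whether the OS-dependent behavior comes from spawn versus fork.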

Windowless GLContext unsupported in Mac!

I then received a runtime error at the call to rd.gui.run_with_gl_context:

Traceback (most recent call last):
  File "/opt/homebrew/Cellar/python@3.10/3.10.14/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/opt/homebrew/Cellar/python@3.10/3.10.14/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/cyjoly/ownCloud/Recherche/Reviews/JOSS2024/tests/python/example_parallel.py", line 54, in work
    rd.gui.run_with_gl_context(x,y)
RuntimeError: robot_dart assertion failed: 'Windowless GLContext unsupported in Mac!'

This error appears directly when calling rd.gui.run_with_gl_context. I tried to patch the function test to force a Windowed version:

configuration = rd.gui.GraphicsConfiguration()
configuration.shadowed = False # To prevent the segfault issue with shadows
graphics = rd.gui.Graphics(configuration)
#graphics = rd.gui.WindowlessGraphics()

But it did not solve the problem. Actually, I figured out that the assertion fails before test is even called (I added a print at the first line of the function test and it never appeared).

I finally succeeded in making it work by passing the test function directly in the Process definition:

p = Process(target=test) # No args anymore in that case

This works: many windows eventually open and many PNG images are generated, so I guess this is the expected result. At this point, I have the following questions:

  • The pickle problem does not seem to be present on Linux (no problem on my virtual machine, but other problems arise there due to the poor virtualization of OpenGL, so I did not dig further). @bstanciulescu, can you confirm the Linux behavior? As I am not an expert in the multiprocessing library, I cannot explain this OS-dependent behavior of the serialization process.
  • Is my last patch really acceptable for doing multiprocessing with robot_dart? What are the consequences of bypassing the call to rd.gui.run_with_gl_context? I did not find the purpose of this function in the documentation, so I am not sure whether I am doing the right thing.
  • Multiprocessing in windowed mode is not very efficient (the speed of the simulator looks bounded to 1x). Actually, I figured out that it works in windowless mode (see next section)! Thus, the assertion that checks the platform might be removed...
  • In the test function, you define ii = pid%15. It looks like you assume that the 15 processes will have distinct values after taking the PID modulo 15. This works if all the PIDs are consecutive numbers. Since the ID of a process is decided by the OS, is this a guaranteed behavior? (I checked on my computer with ps aux |grep python, and the PIDs were indeed consecutive numbers.)
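On the last question: as far as I know, operating systems do not guarantee consecutive PIDs (other processes can be created in between, and PID counters wrap around), so pid % 15 could in principle map two processes to the same value. The sketch below shows such a collision with made-up PIDs, together with one possible alternative: passing an explicit index to each process. The function render_view, its parameters and the angle formula are hypothetical; the real test function in example_parallel.py takes no such argument.

```python
from multiprocessing import Process


def slots_from_pids(pids, n):
    # Slot each process would pick with the pid % n scheme.
    return [pid % n for pid in pids]


def has_collision(slots):
    # True if at least two processes ended up with the same slot.
    return len(set(slots)) != len(slots)


def render_view(index, total):
    # Hypothetical worker: the view angle depends on the index it was
    # given, not on its PID, so every process gets a distinct slot.
    angle = 360.0 * index / total
    return angle


if __name__ == "__main__":
    # Made-up PIDs: consecutive ones are fine, a gap of 15 collides.
    print(has_collision(slots_from_pids([1000, 1001, 1002], 15)))
    print(has_collision(slots_from_pids([1000, 1001, 1015], 15)))

    total = 3
    procs = [Process(target=render_view, args=(i, total)) for i in range(total)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

With an explicit index, the output no longer depends on how the OS happens to assign process IDs.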

Windowless GLContext finally works!

I finally tried to switch back to the windowless mode, and it actually works well! In the end, I have 15 images with different points of view, which make a "continuous" rotation of the camera if we display them in the right order. I think this is what we would like to have with this example. Surprisingly, shadows work very well in that case!

However, in windowed mode, I still need to deactivate shadows to prevent the segfault issue (https://github.com/NOSALRO/robot_dart/issues/204). But I noticed a much worse problem: the generated images are corrupted and cannot be read! (I will open a separate issue for that.)

c-joly, Jul 02 '24 14:07