Starfish
Pickling in parallel.py
I'm not sure what the history of `parallel.py` is, but when I tried running it with some data files, nothing worked as is because of pickling.
I understand the problem is that, for instance, in the `initialize` function we start a process with target `model.brain`, but Python cannot pickle bound methods. By bound methods I mean methods that belong to a class and aren't classmethods; in other words, any method that requires `self` as one of its arguments.
I was easily able to edit the code from `scripts/star.py` to avoid this issue, but I'm curious if/how this has worked before, since this pickle problem has existed for as long as I've used Python.
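For reference, here is a minimal sketch of the failure mode being described. The `Model` class and both `brain` definitions are hypothetical stand-ins, not the actual Starfish code. On Python 3, pickling a bound method means pickling the instance it is bound to, so it fails as soon as that instance holds something unpicklable, such as a lock. A module-level function that takes only plain data avoids the problem.

```python
import pickle
import threading


class Model:
    """Hypothetical stand-in for the object whose method is the process target."""

    def __init__(self):
        # A lock is a typical attribute that pickle refuses to serialize.
        self.lock = threading.RLock()
        self.order = 22

    def brain(self):
        return self.order


def brain(order):
    """Hypothetical module-level replacement: receives only plain, picklable data."""
    return order


def is_picklable(obj):
    """Return True if pickle.dumps(obj) succeeds, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, AttributeError, pickle.PicklingError):
        return False


model = Model()
# Pickling the bound method drags in `model`, including its RLock.
bound_ok = is_picklable(model.brain)
# A module-level function is pickled by reference to its qualified name.
function_ok = is_picklable(brain)
```

Here `bound_ok` ends up `False` because of the lock held by the instance, while `function_ok` should be `True` when the code is run as an ordinary script or imported module.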
Hrm, that's strange. The reason why `parallel.py` and `star.py` looked the way they did was to avoid the pickling issue. I looked through the commit history and saw nothing obvious that would have caused it to stop working. It sounds like you have it working now, but if you post some more error messages, maybe I can think of something else that might be causing it.
Granted, the confusing nature of `parallel.py` was something I wasn't very happy about in the long term, since, as we are finding out, it's pretty brittle code. The main reason we had it was to enable the nested Gibbs sampling with multiple echelle orders running simultaneously.
I will try a minimal example using a VM to see if I can recreate my error.
I cannot recreate on an Ubuntu VM (using Docker). I will investigate further.
Ok that's pretty strange. I've done all my developing and testing on Arch Linux, if that helps as a point of reference.
So on Windows, if I call

    star.py --optimize=Theta

I get

    TypeError: can't pickle _thread.RLock objects
    .
    .
    .
    RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

when calling `Process.start()` from within `parallel.initialize()`.
When I take the exact same code by mounting it onto an Ubuntu VM, I have no problems.
Edit: I've opened a question on Stack Overflow since this seems like a platform issue. As we rewrite `parallel.py`, we will definitely have to discuss multiprocessing. For our base use case, though, we can defer the issue.
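As a starting point for that discussion, here is a rough sketch of the idiom the `RuntimeError` message asks for. It is not the actual `star.py` or `parallel.py` code: `brain`, `main`, and the order numbers are hypothetical. Windows has no `fork`, so `multiprocessing` uses the `spawn` start method there. That means the child re-imports the main module and the process target and arguments must be picklable. Requesting `spawn` explicitly should make Linux behave the same way, which may help reproduce the problem on the Ubuntu VM.

```python
import multiprocessing as mp


def brain(order):
    """Hypothetical module-level worker; stands in for the real process target."""
    print(f"worker started for order {order}")


def main():
    # "spawn" is the start method Windows uses; asking for it explicitly
    # gives the same semantics on Linux, where "fork" is otherwise available.
    ctx = mp.get_context("spawn")
    procs = [ctx.Process(target=brain, args=(order,)) for order in (22, 23)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    # Exit code 0 means the child imported the module and ran the target.
    return [p.exitcode for p in procs]


if __name__ == "__main__":
    # Under spawn, each child re-imports this file. Without the guard, the
    # import would try to start more children and raise the RuntimeError
    # about the bootstrapping phase.
    main()
```

Keeping the target a plain function with plain arguments also sidesteps the `_thread.RLock` pickling error, since nothing holding a lock has to be sent to the child.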