pmda
pmda copied to clipboard
Parallelize select_atoms by coordinates
Hello,
To simplify my problem, I want to select atoms based on coordinates through all frames. I tried to do this with pmda but it doesn't work with n_jobs > 1 and it gives results for n_jobs = 1 but atoms positions aren't the expected ones.
Expected behaviour
The expected behaviour is this one, provided by the serial implementation.
import MDAnalysis as mda
def get_atoms_below10(atomgroup):
return atomgroup.select_atoms(f"prop x >= 0 and prop x < 10")
u = mda.Universe(top, trj)
membrane_atoms = u.select_atoms("resname DOPC")
atom_groups_through_frames = [get_atoms_below10(membrane_atoms) for trj in u.trajectory[:100]]
print(atom_groups_through_frames[0][0].position)
print(atom_groups_through_frames[55][8].position)
Gives me this :
[ 4.1 172.29999 180.29999]
[ 3.9999998 86.6 195.8 ]
Actual behaviour
Here is the code I tested with pmda
python
import MDAnalysis as mda
from pmda.custom import AnalysisFromFunction
def get_atoms_below10(atomgroup):
return atomgroup.select_atoms(f"prop x >= 0 and prop x < 10")
parallel_groups = AnalysisFromFunction(get_atoms_below10, u, membrane_atoms).run(n_jobs = 4, start = 0, stop = 100)
atom_groups_through_frames = parallel_groups.results
print(atom_groups_through_frames[0][0].position)
print(atom_groups_through_frames[55][8].position)
I got the following error
RuntimeError: Couldn't find a suitable Universe to unpickle AtomGroup onto with Universe hash '1bea900d-48f2-417c-a727-e8801ff0611b'. Available hashes: 52f98645-a531-4513-a596-072ef9b8a50a
If I try with n_jobs = 1 :
parallel_groups = AnalysisFromFunction(get_atoms_below10, u, membrane_atoms).run(n_jobs = 1, start = 0, stop = 100)
atom_groups_through_frames = parallel_groups.results
print(atom_groups_through_frames[0][0].position)
print(atom_groups_through_frames[55][8].position)
I got results but not as expected :
[ 43.800003 170.8 223.4 ]
[ 7.6 216.6 207.2]
I noticed that if we change the number of blocks, results change too. I think all atoms at position i in the same block end with the same coordinates during _reduce step, or something like that.
Finally, I managed to do my parallel computation without pmda with python multiprocessing.Pool, by slicing the trajectories into n chunks, with one copy of the universe per chunk.
Currently version of MDAnalysis:
mda version : 1.0.0 pmda version : 0.3.0+17.g13fa3b5 dask version : 2.21.0