pmda icon indicating copy to clipboard operation
pmda copied to clipboard

Parallelize select_atoms by coordinates

Open cecilpert opened this issue 3 years ago • 7 comments

Hello,

To simplify my problem, I want to select atoms based on coordinates through all frames. I tried to do this with pmda but it doesn't work with n_jobs > 1 and it gives results for n_jobs = 1 but atoms positions aren't the expected ones.

Expected behaviour

The expected behaviour is this one, provided by the serial implementation.

import MDAnalysis as mda

def get_atoms_below10(atomgroup):
    return atomgroup.select_atoms(f"prop x >= 0 and prop x < 10")

u = mda.Universe(top, trj)
membrane_atoms = u.select_atoms("resname DOPC") 
atom_groups_through_frames = [get_atoms_below10(membrane_atoms) for trj in u.trajectory[:100]]
print(atom_groups_through_frames[0][0].position)
print(atom_groups_through_frames[55][8].position)

Gives me this :

[  4.1     172.29999 180.29999]
[  3.9999998  86.6       195.8      ]

Actual behaviour

Here is the code I tested with pmda

python
import MDAnalysis as mda
from pmda.custom import AnalysisFromFunction

def get_atoms_below10(atomgroup):
    return atomgroup.select_atoms(f"prop x >= 0 and prop x < 10")

parallel_groups = AnalysisFromFunction(get_atoms_below10, u, membrane_atoms).run(n_jobs = 4, start = 0, stop = 100)
atom_groups_through_frames = parallel_groups.results
print(atom_groups_through_frames[0][0].position)
print(atom_groups_through_frames[55][8].position)

I got the following error

RuntimeError: Couldn't find a suitable Universe to unpickle AtomGroup onto with Universe hash '1bea900d-48f2-417c-a727-e8801ff0611b'.  Available hashes: 52f98645-a531-4513-a596-072ef9b8a50a

If I try with n_jobs = 1 :

parallel_groups = AnalysisFromFunction(get_atoms_below10, u, membrane_atoms).run(n_jobs = 1, start = 0, stop = 100)
atom_groups_through_frames = parallel_groups.results
print(atom_groups_through_frames[0][0].position)
print(atom_groups_through_frames[55][8].position)

I got results but not as expected :

[ 43.800003 170.8      223.4     ]
[  7.6 216.6 207.2]

I noticed that if we change the number of blocks, results change too. I think all atoms at position i in the same block end with the same coordinates during _reduce step, or something like that.

Finally, I managed to do my parallel computation without pmda with python multiprocessing.Pool, by slicing the trajectories into n chunks, with one copy of the universe per chunk.

Currently version of MDAnalysis:

mda version : 1.0.0 pmda version : 0.3.0+17.g13fa3b5 dask version : 2.21.0

cecilpert avatar Aug 04 '20 15:08 cecilpert