pmda
Error when running `AnalysisFromFunction()` on more processes than frames
Expected behaviour
Successfully running `AnalysisFromFunction()` on all available CPUs by setting `n_jobs=-1`, even for very small trajectories.
Actual behaviour
A Warning is raised:
/srv/home/lponzoni/anaconda3/envs/ifpe/lib/python3.7/site-packages/pmda/parallel.py:360: UserWarning: run() uses more blocks than frames: decrease n_blocks
warnings.warn("run() uses more blocks than frames: "
/srv/home/lponzoni/anaconda3/envs/ifpe/lib/python3.7/site-packages/numpy/core/_asarray.py:83: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
return array(a, dtype, copy=False, order=order)
but the code runs anyway until an error is thrown:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<omissis>
/srv/home/lponzoni/anaconda3/envs/ifpe/lib/python3.7/site-packages/pmda/parallel.py in run(self, start, stop, step, n_jobs, n_blocks)
398 # save the frame numbers for all blocks
399 self._blocks = _blocks
--> 400 self._conclude()
401 # put all time information into the timing object
402 self.timing = Timing(
/srv/home/lponzoni/anaconda3/envs/ifpe/lib/python3.7/site-packages/pmda/custom.py in _conclude(self)
101
102 def _conclude(self):
--> 103 self.results = np.concatenate(self._results)
104
105
<__array_function__ internals> in concatenate(*args, **kwargs)
ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 10 has 1 dimension(s)
Code to reproduce the behaviour
I could not find MDs to run an example on (I had problems installing MDAnalysisTests, see issue #3084), but it basically happens when `AnalysisFromFunction()` is run on a trajectory with `n` frames and `n_jobs` is set to a value greater than `n`, or to `n_jobs = -1`.
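The failure in `_conclude()` can be illustrated with NumPy alone, without a trajectory. This is a minimal sketch, assuming that a block which receives no frames yields an empty 1-D array while non-empty blocks yield 2-D arrays. That assumption is consistent with the traceback above but has not been checked against the pmda source; `block_results` and `try_concatenate` are hypothetical names used only for illustration.

```python
import numpy as np

# Hypothetical per-block results: two blocks with one frame each,
# plus one block that received no frames.
block_results = [
    np.array([[1.0, 2.0, 3.0]]),  # shape (1, 3)
    np.array([[4.0, 5.0, 6.0]]),  # shape (1, 3)
    np.array([]),                 # shape (0,), from the empty block
]


def try_concatenate(results):
    """Return the concatenated array, or the error message if it fails."""
    try:
        return np.concatenate(results)
    except ValueError as err:
        return str(err)


# Mixing 2-D and 1-D arrays makes np.concatenate raise ValueError.
outcome = try_concatenate(block_results)
print(outcome)

# Dropping the empty blocks first lets the concatenation succeed.
non_empty = [r for r in block_results if r.size > 0]
combined = np.concatenate(non_empty)
print(combined.shape)
```

If the assumption holds, this would also explain why the error only appears once there are more blocks than frames.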
This is not a big deal, but it was hard to debug and I wanted to report it.
Current version of MDAnalysis: 1.0.0
pmda version: 0.3.0
A quick fix would be to add the following check:
import os
import MDAnalysis as mda

# import trajectory
u = mda.Universe(pdb_file, traj_file)
# set number of parallel processes
if n_jobs == -1:
    n_jobs = len(os.sched_getaffinity(0))
# make sure that n_jobs is not greater than the actual number of frames
n_total_frames = len(u.trajectory)
n_actual_frames = len(range(
    start if start else 0,
    min(n_total_frames, stop) if stop else n_total_frames,
    step if step else 1))
n_jobs = min(n_jobs, n_actual_frames)
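The same check can be wrapped in a small helper so it can be tested without loading a trajectory. This is only a sketch: `effective_n_jobs` is a hypothetical name, not part of pmda or MDAnalysis. It differs from the check above in three ways that are my own assumptions: it counts frames by slicing a `range` (so negative `start`/`stop` behave as in ordinary Python slicing), it falls back to `os.cpu_count()` on platforms where `os.sched_getaffinity` is unavailable, and it never returns fewer than one job.

```python
import os


def effective_n_jobs(n_jobs, n_total_frames, start=None, stop=None, step=None):
    """Clamp n_jobs to the number of frames selected by start/stop/step."""
    if n_jobs == -1:
        try:
            # CPUs available to this process (Linux only)
            n_jobs = len(os.sched_getaffinity(0))
        except AttributeError:
            # fallback for platforms without sched_getaffinity
            n_jobs = os.cpu_count() or 1
    # slicing a range applies the usual Python slice semantics
    n_selected = len(range(n_total_frames)[start:stop:step])
    # never ask for more jobs than frames, and always keep at least one
    return max(1, min(n_jobs, n_selected))


# 10 frames, 16 requested jobs: only 10 jobs are useful
print(effective_n_jobs(16, 10))
# frames 2, 4, 6 are selected, so at most 3 jobs
print(effective_n_jobs(8, 10, start=2, stop=8, step=2))
```

With a real trajectory, `n_total_frames` would be `len(u.trajectory)`, as in the check above.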
Thank you.
@luponzo86 you could create a pull request with your check. We would review, guide you in adding tests, and you'd become an author of PMDA.
Development on PMDA is currently pretty slow because everybody is doing many other things (and in particular, there's a lot of work on MDAnalysis itself). Any help is greatly appreciated.