
T4 lysozyme example with implicit solvent runs out of memory when lots of memory appears to be available

Open therealchrisneale opened this issue 2 years ago • 0 comments

8 processes works OK. While running mpiexec.hydra -np 8 yank script --yaml=p-xylene-implicit.yaml:

    bash-4.2$ free
                  total        used        free      shared  buff/cache   available
    Mem:      131934588     5161320   114950568     1014712    11822700   124444344
    Swap:             0           0           0

20 processes gives an error. While running mpiexec.hydra -np 20 yank script --yaml=p-xylene-implicit.yaml, just before failure:

    bash-4.2$ free
                  total        used        free      shared  buff/cache   available
    Mem:      131934588     6578724   113531156     1019564    11824708   123022088
    Swap:             0           0           0
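As a rough check on the two `free` snapshots above, the following sketch compares them numerically. It assumes the default `free` unit of KiB, and it assumes that the increase in used memory is attributable to the 12 additional MPI processes, which the snapshots alone cannot confirm.

```python
# Values copied from the two `free` snapshots above (assumed to be in KiB,
# the default unit of `free`).
KIB_PER_MIB = 1024
KIB_PER_GIB = 1024 ** 2

used_np8 = 5161320          # "used" column with -np 8
used_np20 = 6578724         # "used" column with -np 20, just before failure
available_np20 = 123022088  # "available" column with -np 20

# Extra memory in use with 20 processes compared to 8.
extra_used = used_np20 - used_np8

# Assumption: the increase is spread evenly over the 12 additional processes.
per_extra_process = extra_used / (20 - 8)

print(f"extra used:        {extra_used / KIB_PER_GIB:.2f} GiB")
print(f"per extra process: {per_extra_process / KIB_PER_MIB:.0f} MiB")
print(f"still available:   {available_np20 / KIB_PER_GIB:.1f} GiB")
```

Under those assumptions, the node still reports well over 100 GiB available at the point where the allocation fails, which is what makes the `Cannot allocate memory` error surprising.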

The first error message and surrounding text were:

    <…snip…>
    2022-05-20 13:23:05,043: WARNING - openmmtools.multistate.multistatesampler - Warning: The openmmtools.multistate API is experimental and may change in future releases
    Traceback (most recent call last):
      File "/usr/projects/mrmdesign/MCMD/CONDA_ENVS/yank-badger/lib/python3.6/site-packages/yank/schema/validator.py", line 411, in call_constructor
        obj = subcls(**constructor_kwargs)
      File "/usr/projects/mrmdesign/MCMD/CONDA_ENVS/yank-badger/lib/python3.6/site-packages/openmmtools/multistate/replicaexchange.py", line 217, in __init__
        super(ReplicaExchangeSampler, self).__init__(**kwargs)
      File "/usr/projects/mrmdesign/MCMD/CONDA_ENVS/yank-badger/lib/python3.6/site-packages/openmmtools/multistate/multistatesampler.py", line 203, in __init__
        self._display_cuda_devices()
      File "/usr/projects/mrmdesign/MCMD/CONDA_ENVS/yank-badger/lib/python3.6/site-packages/openmmtools/multistate/multistatesampler.py", line 1772, in _display_cuda_devices
        cuda_query_output = os.popen("nvidia-smi --query-gpu=index,gpu_name --format=csv,noheader").read().strip()
      File "/usr/projects/mrmdesign/MCMD/CONDA_ENVS/yank-badger/lib/python3.6/os.py", line 980, in popen
        bufsize=buffering)
      File "/usr/projects/mrmdesign/MCMD/CONDA_ENVS/yank-badger/lib/python3.6/subprocess.py", line 729, in __init__
        restore_signals, start_new_session)
      File "/usr/projects/mrmdesign/MCMD/CONDA_ENVS/yank-badger/lib/python3.6/subprocess.py", line 1295, in _execute_child
        restore_signals, start_new_session, preexec_fn)
    OSError: [Errno 12] Cannot allocate memory

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/usr/projects/mrmdesign/MCMD/CONDA_ENVS/yank-badger/bin/yank", line 10, in <module>
        sys.exit(main())
      File "/usr/projects/mrmdesign/MCMD/CONDA_ENVS/yank-badger/lib/python3.6/site-packages/yank/cli.py", line 73, in main
        dispatched = getattr(commands, command).dispatch(command_args)
      File "/usr/projects/mrmdesign/MCMD/CONDA_ENVS/yank-badger/lib/python3.6/site-packages/yank/commands/script.py", line 155, in dispatch
        yaml_builder.run_experiments(write_status=write_status)
      File "/usr/projects/mrmdesign/MCMD/CONDA_ENVS/yank-badger/lib/python3.6/site-packages/yank/experiment.py", line 747, in run_experiments
        group_size = self._get_experiment_mpi_group_size(all_experiments)
      File "/usr/projects/mrmdesign/MCMD/CONDA_ENVS/yank-badger/lib/python3.6/site-packages/yank/experiment.py", line 2862, in _get_experiment_mpi_group_size
        sampler_names = {self._create_experiment_sampler(exp[1], []).__class__.__name__ for exp in experiments}
      File "/usr/projects/mrmdesign/MCMD/CONDA_ENVS/yank-badger/lib/python3.6/site-packages/yank/experiment.py", line 2862, in <setcomp>
        sampler_names = {self._create_experiment_sampler(exp[1], []).__class__.__name__ for exp in experiments}
      File "/usr/projects/mrmdesign/MCMD/CONDA_ENVS/yank-badger/lib/python3.6/site-packages/yank/experiment.py", line 2990, in _create_experiment_sampler
        return schema.call_sampler_constructor(constructor_description)
      File "/usr/projects/mrmdesign/MCMD/CONDA_ENVS/yank-badger/lib/python3.6/site-packages/yank/schema/validator.py", line 470, in call_sampler_constructor
        special_conversions=special_conversions)
      File "/usr/projects/mrmdesign/MCMD/CONDA_ENVS/yank-badger/lib/python3.6/site-packages/yank/schema/validator.py", line 413, in call_constructor
        raise RuntimeError('Attempt to initialize failed with: {}'.format(str(e)))
    RuntimeError: Attempt to initialize failed with: [Errno 12] Cannot allocate memory
    2022-05-20 13:23:05,054: CRITICAL - mpiplus.mpiplus - MPI node 1/20 raised an exception and called Abort()! The exception traceback follows
    <…snip…>
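The traceback shows the failure originating in the `os.popen("nvidia-smi ...")` call inside `_display_cuda_devices`, which appears to be an informational query only. A minimal sketch of how such a query could be guarded so that a failed process spawn does not abort sampler construction is given below. The function name `query_cuda_devices` and its parameters are hypothetical and are not part of openmmtools; this does not address why the spawn fails with `Errno 12` in the first place.

```python
import subprocess

# The same nvidia-smi query that appears in the traceback, split into arguments.
NVIDIA_SMI_CMD = ["nvidia-smi", "--query-gpu=index,gpu_name", "--format=csv,noheader"]


def query_cuda_devices(cmd=NVIDIA_SMI_CMD, timeout=10):
    """Return the stripped output of `cmd`, or None if it cannot be run.

    Hypothetical helper, not an openmmtools function.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
            check=True,
            universal_newlines=True,  # Python 3.6 compatible spelling of text=True
        )
    except (OSError, subprocess.SubprocessError):
        # OSError covers a failed spawn (e.g. Errno 12) and a missing binary;
        # SubprocessError covers a non-zero exit status and a timeout.
        return None
    return result.stdout.strip()
```

A caller could then log the device list when a string is returned and skip the message when the result is `None`, instead of letting the `OSError` propagate.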

For what it's worth, I get an entirely different error with -np 25 (so perhaps I am just running things incorrectly since I count 25 lambda values for the complex system):

    <...snip...>
    Warning: importing 'simtk.openmm' is deprecated. Import 'openmm' instead.

    ===================================================================================
    =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
    =   PID 6939 RUNNING AT ba173
    =   EXIT CODE: 11
    =   CLEANING UP REMAINING PROCESSES
    =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES

    YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
    This typically refers to a problem with your application.
    Please see the FAQ page for debugging suggestions

therealchrisneale, May 20 '22 19:05