openmmtools icon indicating copy to clipboard operation
openmmtools copied to clipboard

Deserializing nc file generated with openmmtools < 0.21.5 yields key error when loading with openmmtools 0.21.5 because of missing constraint tolerance

Open zhang-ivy opened this issue 1 year ago • 4 comments

from openmmtools.multistate import MultiStateReporter

main_dir = 45
replicate = 0
sub_dir = 3
phase = 'complex'
path = f"/data/chodera/zhangi/perses_benchmark/repex/perses-bnbs-paper-first-attempt/{main_dir}/{sub_dir}/replicate_{replicate}/perses-paper2/{sub_dir}/{sub_dir}_{phase}.nc"
reporter = MultiStateReporter(path, open_mode='r')
reporter.read_mcmc_moves()
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Input In [24], in <cell line: 1>()
----> 1 reporter.read_mcmc_moves()

File ~/miniconda3/envs/perses-paper2/lib/python3.9/site-packages/openmmtools-0.21.5-py3.9.egg/openmmtools/multistate/multistatereporter.py:773, in MultiStateReporter.read_mcmc_moves(self)
    771 for move_id in range(n_moves):
    772     serialized_move = self.read_dict('mcmc_moves/move{}'.format(move_id))
--> 773     mcmc_moves.append(deserialize(serialized_move))
    774 return mcmc_moves

File ~/miniconda3/envs/perses-paper2/lib/python3.9/site-packages/openmmtools-0.21.5-py3.9.egg/openmmtools/utils.py:678, in deserialize(serialization)
    676 instance = cls.__new__(cls)
    677 try:
--> 678     instance.__setstate__(serialization)
    679 except AttributeError:
    680     raise ValueError('Cannot deserialize class {} without a __setstate__ method'.format(class_name))

File ~/miniconda3/envs/perses-paper2/lib/python3.9/site-packages/openmmtools-0.21.5-py3.9.egg/openmmtools/mcmc.py:1160, in LangevinDynamicsMove.__setstate__(self, serialization)
   1158 self.timestep = serialization['timestep']
   1159 self.collision_rate = serialization['collision_rate']
-> 1160 self.constraint_tolerance = serialization['constraint_tolerance']

KeyError: 'constraint_tolerance'

Note that Mike was worried about this in this PR: https://github.com/choderalab/openmmtools/pull/611

zhang-ivy avatar Aug 03 '22 21:08 zhang-ivy

Good catch! This happens when trying to directly read the mcmc moves using the public API point MultiStateReporter.read_mcmc_moves. Just to realize that when resuming the simulations the moves are instantiated through the sampler from_storage method. As per https://github.com/choderalab/openmmtools/pull/611#issuecomment-1201979984

My guess is that the impact of this is not big, but this is a bug, since we care for backward compatibility, and also since read_mcmc_moves is public and should be expected to be directly run by the users. I'll be adding an example nc file that can be used to replicate the issue.

As a possible workaround, one could use MultiStateSampler.from_storage(reporter) and extract the mcmc_moves from the sampler.

ijpulidos avatar Aug 03 '22 21:08 ijpulidos

To reproduce the problem you can use the nc files attached here and run the following script using them

#!/bin/env python

# Directly read the mcmc_moves from reporter

from openmmtools.multistate import MultiStateReporter

reporter_path = "alanine_dipeptide_legacy.nc"
reporter = MultiStateReporter(reporter_path, checkpoint_interval=1, open_mode='r')

mcmc_moves = reporter.read_mcmc_moves()

alanine_dipeptide_legacy.tar.gz

ijpulidos avatar Aug 03 '22 22:08 ijpulidos

I thought we had this tested?

mikemhenry avatar Aug 03 '22 23:08 mikemhenry

I thought we had this tested?

So, it was tested by resuming simulation, which use the MultiStateSampler.from_storage method, but we didn't really test using the reporter.read_mcmc_moves public API point directly (my bad!).

ijpulidos avatar Aug 09 '22 16:08 ijpulidos