openmmtools
openmmtools copied to clipboard
MultiStateReporter variable pos/vel save frequency
Description
allows MultiStateReporter to write positions and velocities and different frequency to energies data.
Todos
- [x] Implement feature / fix bug
- [x] Add tests
- [x] Update documentation as needed
- [x] Update changelog to summarize changes in behavior, enhancements, and bugfixes implemented in this PR
Status
- [ ] Ready to go
Changelog message
MultiStateReporter now takes optional `position_interval` and `velocity_interval` keyword args to control the frequency these are saved at
Will we run into key errors if you try and use this with an .nc
file from an older version?
Codecov Report
Merging #712 (d8f80b9) into main (cbff4c8) will increase coverage by
0.00%
. The diff coverage is92.85%
.
:exclamation: Current head d8f80b9 differs from pull request most recent head 981fe1b. Consider uploading reports for the commit 981fe1b to get more accurate results
Additional details and impacted files
@mikemhenry it shouldn't affect reading old files, I've only played with the writing code.
In terms of old code reading newer nc files, unless you've changed the defaults from writing positions/velocities on every energy write you should get the same. If you read files with missing data you now get:
>>> ds.variables['positions'][1, :, :]
masked_array(
data=[[--, --, --],
[--, --, --],
[--, --, --],
[--, --, --],
[--, --, --],
[--, --, --],
[--, --, --],
so it looks like netcdf isn't erroring but instead giving you the lack of data.
I have added two attributes to the DataSet, so if you write code that expects these you're going to have a bad time.
Is this what ncfile.ConventionVersion = '0.2'
is made to handle? Should I be bumping that?
@ijpulidos the filesizes for MultiStateReporter can get pretty large, since you're saving ~11 trajectories. You typically want to save the energies at a high frequency (since you're very interesting in seeing correlation times) but positions you can have these at a lower frequency, velocities you might not even care about.
Can you point me towards the existing tests so I can make sure the tests I write fit in?
I see, yeah I had the feeling it was a storage issue. We probably need to clean things up in general but I think this will help. Thanks for clarifying the motivation.
As per the tests, I think having them as part of the TestReporter
class should be the place, we might consider moving this class to its own test module, but we don't need that right now. Thanks!
In addition to adding tests, we also want to add
- energy frequency
And double check what gets saved when
We want to figure out what should be saved in the checkpoint and what should be saved in simulation.nc
@mikemhenry Yes, I agree we probably need to discuss first what we want to store and when. And which of these are needed when resuming simulations and similar situations.
If we want to have all of these independently we probably want to sync them with the checkpointing. Such that the required values are up to date when resuming or extending simulations.
Yah I think we need to also make sure we figure out what our analysis tools expect. My hunch is that we need to save a lot of stuff to resume a simulation, but that should be a checkpoint, the time series data we save for analysis should be user configurable.
the time series data we save for analysis should be user configurable.
Yes, decoupling the checkpointing with the user-configurable time series data would probably work better here. I'd vote for that route if it makes sense.
As far as I can tell these are coupled. We would need to think a little bit further about this, since I'm not sure how big of a refactor this could be. There's a lot of moving back and forth between the MultiStateReporter
and the MultiStateSampler
.
@ijpulidos @mikemhenry this is much closer to ready, I've added some tests so might be a good time for you both to read through
@richardjgowers getting some errors in CI
Traceback (most recent call last):
File "/home/runner/micromamba-root/envs/openmmtools-test/lib/python3.9/site-packages/nose/case.py", line 197, in runTest
self.test(*self.arg)
File "/home/runner/work/openmmtools/openmmtools/openmmtools/tests/test_sampling.py", line 523, in test_write_sampler_states_no_pos
assert restored_state.positions._value.mask.all()
AttributeError: 'numpy.ndarray' object has no attribute 'mask'
See the CI log here: https://github.com/choderalab/openmmtools/actions/runs/8052365133
Honestly if you make a commit to bump this PR, you should get the same results, I was having an issue re-starting CI
I wonder if this is what's biting us here, though I don't know why are we expecting these arrays to be masked? https://github.com/choderalab/openmmtools/pull/701
Now, to be fair, I don't think the changes in that linked PR have anything to do with the serialized/deserialized positions or velocities. So I doubt it.