forcebalance
[WIP] Revision to Thermo target
Progress and notes:
- Parser for new data format (100% done)
  - Multiple files will be read into a single `DataFrame`.
  - The "system index" specifies an experimental data set and corresponding simulations (topology, initial conditions, simulation settings and thermodynamic ensemble).
- Create `Observable` and `Simulation` objects from user input (100% done)
  - A single system index may correspond to multiple simulations to be executed.
  - For example, simulating the heat of vaporization would require gas and liquid simulations.
  - Simulating the density would also require running the liquid simulation.
  - Parallelize across system indices and independent initial conditions.
  - Can also parallelize across multiple simulations within a system index if desired (currently performed as a chain).
  - Some observables are not uniquely mapped to simulations (e.g. density can come from a liquid or a solid).
  - Furthermore, the required simulations are not determined automatically from the user-specified observables, because the method for calculating the observable depends on the type of simulation (e.g. compressibility may be calculated for liquids, solids, and bilayers).
  - Thus, the input file must specify both the observables to be calculated and the simulations to be run.
  - Restriction: An error will be thrown if more than one simulation name is provided that can calculate a specified observable. Thus, if the density is specified as an observable, either the liquid or solid simulation must be specified but not both.
  - How to make this more flexible in the future? Perhaps the column heading can contain the system name, such as `solid_density` or `liquid_density`.
  - In order to calculate some timeseries (e.g. deuterium order parameter), the `Observable` class needs to pass some information to the `Simulation`. Need to figure out how to do this right.
- Specify all simulation options in input file parser (50% done)
  - Default settings may apply to all simulations (e.g. `eq_steps`, `md_steps`, `timestep`).
  - If initial conditions are specified in the input file, they should override the default search path for initial coordinate files.
- Time series class; Split `get_timeseries()` from `molecular_dynamics()` (60% done)
  - Represents a time series of instantaneous observables; possibly subclass `DataFrame`.
  - OpenMM saves observables to memory as the simulation is run, so the names of needed timeseries must be saved as Engine attributes.
  - On the other hand, GROMACS generates all observables in a post-processing step, so the names of needed timeseries don't need to be stored.
  - New observables may require new timeseries to be implemented here.
  - Certain timeseries may only be available for some engines (e.g. quantum kinetic energy estimator from OpenMM).
- Run simulations and save time series to disk. This can be done using `md_chain.py` (i.e. a chain of simulations for a particular index), or `md_one.py` (i.e. independent simulations) (50% done)
  - Replacement for `npt.py` and `npt_lipid.py`
  - Should `md_chain.py` and `md_one.py` use the same file and directory structure? Need to make sure output from `md_one.py` is properly named, or put results from `md_one.py` into different folders.
  - Energy / dipole derivatives are calculated here, also as a time series.
- Apply MBAR estimator for grouped system indices
  - Applying MBAR estimator across system indices with different molecules makes no sense.
- Calculate observables from time series. (25% done)
  - Store a dictionary of time series, keyed by the system index and the simulation name.
  - Formulas for calculating observables and their derivatives from time series are implemented here.
  - Observables may require time series from multiple simulations (e.g. heat of vaporization).
  - Observables will still be calculated if experimental data is missing (because it's nice to have a full table of predicted values), but they won't go into the objective function.
  - If experimental data is very sparse then we shouldn't put it in the same `Target` anyway.
- Multiple independent initial conditions (50% done)
  - How to organize? I propose `targets/target_name/system_index/simulation_name_#.[gro|pdb|xyz]`, numbered from 1. Multiple files are best because PDB format often doesn't update the periodic box across different structures.
  - If only one initial condition, then `_#` is not needed.
- The remote scripts `md_one.py` and `md_chain.py` should have ways to calculate all observables that they are able to calculate (as an additional way to check consistency)
- Map abbreviated units to full units
- XML format parser
- Added unit tests
  - Read multiple ways of specifying lipid data and check that the data tables are the same.
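
The restriction noted above (an error if more than one specified simulation can calculate a given observable) could be checked roughly as follows. This is only a sketch: `CAN_CALCULATE` and `assign_simulations` are hypothetical names, not part of the existing code, and the table contents simply echo the density and compressibility examples from the notes. Raising an error when no specified simulation matches is an extra assumption.

```python
# Hypothetical table: which simulation types can calculate each observable.
# The names are illustrative and not taken from the ForceBalance code base.
CAN_CALCULATE = {
    "density": {"liquid", "solid"},
    "compressibility": {"liquid", "solid", "bilayer"},
}


def assign_simulations(observables, simulations):
    """Map each observable to the single specified simulation that can calculate it."""
    assignment = {}
    for obs in observables:
        # Specified simulations that are able to calculate this observable.
        candidates = sorted(CAN_CALCULATE.get(obs, set()) & set(simulations))
        if len(candidates) > 1:
            # The restriction from the notes: an ambiguous choice is an error.
            raise ValueError(
                f"Observable '{obs}' can be calculated by more than one "
                f"specified simulation: {candidates}"
            )
        if not candidates:
            # Assumption: a missing simulation is also treated as an error.
            raise ValueError(f"No specified simulation can calculate '{obs}'")
        assignment[obs] = candidates[0]
    return assignment
```

With this sketch, `assign_simulations(["density"], ["liquid"])` maps the density to the liquid simulation, while specifying both `"liquid"` and `"solid"` raises a `ValueError`.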
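
The proposed layout for multiple initial conditions (`targets/target_name/system_index/simulation_name_#.[gro|pdb|xyz]`, numbered from 1, with the `_#` suffix dropped when there is only one) might be generated as in the sketch below. The helper name `initial_condition_paths` and its arguments are hypothetical and only illustrate the naming rule.

```python
from pathlib import PurePosixPath

# File extensions listed in the proposed layout.
ALLOWED_EXTENSIONS = ("gro", "pdb", "xyz")


def initial_condition_paths(target_name, system_index, simulation_name,
                            n_conditions, ext="gro"):
    """Return the expected coordinate file paths for one simulation (hypothetical helper)."""
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported extension: {ext}")
    if n_conditions < 1:
        raise ValueError("At least one initial condition is required")
    base = PurePosixPath("targets") / target_name / str(system_index)
    if n_conditions == 1:
        # A single initial condition does not need the numeric suffix.
        return [str(base / f"{simulation_name}.{ext}")]
    # Multiple initial conditions are numbered from 1.
    return [str(base / f"{simulation_name}_{i}.{ext}")
            for i in range(1, n_conditions + 1)]
```

Keeping one file per initial condition follows the note above that PDB files often do not update the periodic box across different structures.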
Hi Lee-Ping,
This looks very promising! I will be travelling for a week, but I will take a close look when I am back at work.
Best, Erik