forcebalance [WIP] Revision to Thermo target

Open leeping opened this issue 10 years ago • 1 comments

Progress and notes:

Parser for new data format (100% done)
- Multiple files will be read into a single DataFrame.
- The "system index" specifies an experimental data set and corresponding simulations (topology, initial conditions, simulation settings and thermodynamic ensemble).
Create Observable and Simulation objects from user input (100% done)
- A single system index may correspond to multiple simulations to be executed.
- For example, simulating the heat of vaporization would require gas and liquid simulations.
- Simulating the density would also require running the liquid simulation.
- Parallelize across system indices and independent initial conditions.
- Can also parallelize across multiple simulations within a system index if desired (currently performed as a chain).
- Some observables are not uniquely mapped to simulations (e.g. density can come from a liquid or a solid).
- Furthermore, the required simulations are not determined automatically from the user-specified observables, because the method for calculating the observable depends on the type of simulation (e.g. compressibility may be calculated for liquids, solids, and bilayers).
- Thus, the input file must specify both the observables to be calculated and the simulations to be run.
- Restriction: An error will be thrown if more than one simulation name is provided that can calculate a specified observable. Thus, if the density is specified as an observable, either the liquid or solid simulation must be specified but not both.
- How to make this more flexible in the future? Perhaps the column heading can contain the system name, such as solid_density or liquid_density.
- In order to calculate some timeseries (e.g. deuterium order parameter), the Observable class needs to pass some information to the Simulation. Need to figure out how to do this right.
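The restriction on observables and simulations could be checked roughly as follows. This is a minimal sketch, not the ForceBalance implementation: the `RECIPES` table, the observable and simulation names in it, and the function `assign_simulations` are all hypothetical.

```python
# Hypothetical table: for each observable, the alternative sets of
# simulations that can be used to calculate it.
RECIPES = {
    "density": [("liquid",), ("solid",)],
    "hvap": [("gas", "liquid")],  # heat of vaporization needs both
    "kappa": [("liquid",), ("solid",), ("bilayer",)],  # compressibility
}


def assign_simulations(observables, simulations):
    """Map each observable to the one set of requested simulations that
    can calculate it; raise if there is none or more than one."""
    requested = set(simulations)
    assignment = {}
    for obs in observables:
        # Keep only recipes whose simulations were all requested.
        matches = [r for r in RECIPES[obs] if requested.issuperset(r)]
        if not matches:
            raise ValueError(f"no requested simulation can calculate {obs!r}")
        if len(matches) > 1:
            raise ValueError(f"{obs!r} is ambiguous: {matches}")
        assignment[obs] = matches[0]
    return assignment
```

With this sketch, requesting `density` together with both `liquid` and `solid` raises an error, while requesting `density` and `hvap` with `gas` and `liquid` resolves both observables without ambiguity.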
Specify all simulation options in input file parser (50% done)
- Default settings may apply to all simulations (e.g. eq_steps, md_steps, timestep).
- If initial conditions are specified in the input file, they should override the default search path for initial coordinate files.
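One way to layer these settings is sketched below. The option key `initial`, the numeric defaults, and the function `resolve_options` are assumptions made for illustration; only `eq_steps`, `md_steps` and `timestep` are named in the notes above.

```python
# Illustrative defaults shared by all simulations (values are made up).
DEFAULTS = {"eq_steps": 20000, "md_steps": 100000, "timestep": 1.0}


def resolve_options(defaults, overrides, default_coords):
    """Combine default settings with per-simulation settings.

    Settings given for a specific simulation win over the defaults. The
    (hypothetical) "initial" key falls back to the coordinate file found
    on the default search path only if the input file did not set it.
    """
    options = dict(defaults)
    options.update(overrides)
    options.setdefault("initial", default_coords)
    return options


liquid = resolve_options(DEFAULTS, {"md_steps": 5000}, "liquid.gro")
gas = resolve_options(DEFAULTS, {"initial": "custom.pdb"}, "gas.gro")
```

Here `liquid` keeps the default `eq_steps` and the default coordinate file, while `gas` uses the coordinate file named in the input.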
Time series class; Split get_timeseries() from molecular_dynamics() (60% done)
- Represents a time series of instantaneous observables; possibly subclass DataFrame.
- OpenMM saves observables to memory as the simulation is run, so the names of needed timeseries must be saved as Engine attributes.
- On the other hand, GROMACS generates all observables in a post-processing step, so the names of needed timeseries don't need to be stored.
- New observables may require new timeseries to be implemented here.
- Certain timeseries may only be available for some engines (e.g. quantum kinetic energy estimator from OpenMM).
Run simulations and save time series to disk. This can be done using md_chain.py (i.e. a chain of simulations for a particular index), or md_one.py (i.e. independent simulations) (50% done)
- Replacement for npt.py and npt_lipid.py
- Should md_chain.py and md_one.py use the same file and directory structure? Need to make sure output from md_one.py is properly named - or put results from md_one.py into different folders.
- Energy / dipole derivatives are calculated here, also as a time series.
Apply MBAR estimator for grouped system indices
- Applying MBAR estimator across system indices with different molecules makes no sense.
Calculate observables from time series. (25% done)
- Store a dictionary of time series, keyed by the system index and the simulation name.
- Formulas for calculating observables and their derivatives from time series are implemented here.
- Observables may require time series from multiple simulations (e.g. heat of vaporization).
- Observables will still be calculated if experimental data is missing (because it's nice to have a full table of predicted values), but they won't go into the objective function.
- If the experimental data are very sparse, they shouldn't be put in the same Target anyway.
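A rough sketch of this step, assuming time series are stored as plain lists in a dictionary keyed by `(system_index, simulation_name)`. The key `"potential"`, the function names, and the unweighted squared residual are illustrative assumptions. The heat of vaporization uses the common approximation ΔH_vap ≈ ⟨E_gas⟩ − ⟨E_liquid⟩/N + RT, which neglects the liquid's pV term; derivatives are not shown.

```python
from statistics import fmean

R = 0.008314462618  # gas constant in kJ/(mol K)


def heat_of_vaporization(timeseries, index, n_mol, temperature):
    """Approximate Hvap (kJ/mol) from gas and liquid potential energies."""
    e_gas = fmean(timeseries[(index, "gas")]["potential"])
    e_liq = fmean(timeseries[(index, "liquid")]["potential"])
    return e_gas - e_liq / n_mol + R * temperature


def objective_terms(calculated, experimental):
    """Squared residuals, skipping observables with no experimental value.

    Every calculated value stays available for the results table; only
    those with reference data contribute here. A real objective function
    would also apply weights and scaling, which are omitted.
    """
    return {
        key: (value - experimental[key]) ** 2
        for key, value in calculated.items()
        if experimental.get(key) is not None
    }


# Made-up numbers for one system index with 100 liquid molecules.
timeseries = {
    (0, "gas"): {"potential": [10.0, 12.0]},
    (0, "liquid"): {"potential": [-4000.0, -4200.0]},
}
hvap = heat_of_vaporization(timeseries, 0, n_mol=100, temperature=300.0)
```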
Multiple independent initial conditions (50% done)
- How to organize? I propose targets/target_name/system_index/simulation_name_#.[gro|pdb|xyz] numbered from 1. Multiple files are best because PDB format often doesn't update the periodic box across different structures.
- If there is only one initial condition, the _# suffix is not needed.
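The proposed layout could be generated like this. The function name and its arguments are hypothetical, and `thermo_target` is a placeholder target name.

```python
from pathlib import PurePosixPath


def initial_condition_paths(target_name, system_index, simulation_name,
                            n_initial, ext="gro"):
    """Build targets/target_name/system_index/simulation_name_#.ext paths.

    Files are numbered from 1; with a single initial condition the
    numeric suffix is dropped.
    """
    base = PurePosixPath("targets") / target_name / str(system_index)
    if n_initial == 1:
        return [base / f"{simulation_name}.{ext}"]
    return [base / f"{simulation_name}_{i}.{ext}"
            for i in range(1, n_initial + 1)]


paths = initial_condition_paths("thermo_target", 0, "liquid", 2)
single = initial_condition_paths("thermo_target", 0, "gas", 1, ext="pdb")
```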
The remote scripts md_one.py and md_chain.py should be able to calculate every observable their simulations support (as an additional consistency check)
Map abbreviated units to full units
XML format parser
Added unit tests
- Read multiple ways of specifying lipid data and check that the data tables are the same.

Apr 04 '14 08:04 leeping

Hi Lee-Ping,

This looks very promising! I will be travelling for a week, but I will take a close look when I am back at work.

Best, Erik

Apr 05 '14 11:04 ebran
