
Add restart files

Open Luthaf opened this issue 9 years ago • 5 comments

A restart file is needed to recover when a simulation crashes, or to continue a simulation for more steps. It should be some private binary format, but one that stays compatible across multiple versions of the code, so a raw binary dump of memory is not an option.

Using chemfiles with a file format that contains all the data (NetCDF? TNG? Another one?) could be nice, but chemfiles only provides single-precision float values for now (see chemfiles/chemfiles#34), and double precision would be better here.
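As a rough illustration of the difference between a versioned format and a memory dump, here is a minimal sketch using only the Rust standard library. The layout (a `LRST` magic tag, a `u32` version, a particle count, then little-endian `f64` coordinates) and the names `write_restart` and `read_restart` are assumptions made up for this example; they are not an existing lumol or chemfiles format.

```rust
use std::io::{self, Read, Write};

// Hypothetical layout: 4-byte magic, u32 format version, u64 particle count,
// then three little-endian f64 coordinates per particle.
const MAGIC: &[u8; 4] = b"LRST";
const VERSION: u32 = 1;

fn write_restart<W: Write>(mut out: W, positions: &[[f64; 3]]) -> io::Result<()> {
    out.write_all(MAGIC)?;
    out.write_all(&VERSION.to_le_bytes())?;
    out.write_all(&(positions.len() as u64).to_le_bytes())?;
    for position in positions {
        for coordinate in position {
            // Explicit byte order, so the file does not depend on the host.
            out.write_all(&coordinate.to_le_bytes())?;
        }
    }
    Ok(())
}

fn read_restart<R: Read>(mut input: R) -> io::Result<Vec<[f64; 3]>> {
    let mut magic = [0u8; 4];
    input.read_exact(&mut magic)?;
    if &magic != MAGIC {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "not a restart file"));
    }

    let mut word = [0u8; 4];
    input.read_exact(&mut word)?;
    let version = u32::from_le_bytes(word);
    // A later release could dispatch on the version here instead of failing.
    if version != VERSION {
        let message = format!("unsupported restart version {}", version);
        return Err(io::Error::new(io::ErrorKind::InvalidData, message));
    }

    let mut long = [0u8; 8];
    input.read_exact(&mut long)?;
    let count = u64::from_le_bytes(long);

    let mut positions = Vec::new();
    for _ in 0..count {
        let mut position = [0.0f64; 3];
        for coordinate in position.iter_mut() {
            input.read_exact(&mut long)?;
            *coordinate = f64::from_le_bytes(long);
        }
        positions.push(position);
    }
    Ok(positions)
}
```

Because every value is written field by field with a fixed width and byte order, the values round-trip at full double precision, and the version tag gives newer code a place to branch when the layout changes.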

Luthaf avatar Jun 30 '16 22:06 Luthaf

One solution would be to use HDF5 to store it. The latest version of NetCDF is based on HDF5 but with some limitations.

https://support.hdfgroup.org/HDF5/whatishdf5.html

mgxm avatar Oct 03 '17 13:10 mgxm

Yeah, HDF5 could be nice too here! There is a standard format for storing MD data in HDF5 called H5MD; we could implement that too.

The latest version of NetCDF is based on HDF5 but with some limitations.

I was referring to the Amber NetCDF convention (http://ambermd.org/netcdf/nctraj.xhtml), which specifically uses the NetCDF 3 encoding, and not HDF5.

Luthaf avatar Oct 03 '17 14:10 Luthaf

Thank you! I didn't know about H5MD; I will take a look at the specification. 😄

mgxm avatar Oct 03 '17 15:10 mgxm

I like how the H5MD specification handles the data. For now, though, we don't have a complete crate for handling HDF5 files. There is one[1], but it lacks some important features; it seems the author is implementing them in other branches[2][3][4]. Alternatively, there is a crate for NetCDF[5], but it lacks some features too.

  Not (yet) supported:

  appending to existing files (using unlimited dimensions),
  user defined types, string variables, multi-valued
  attributes,

  All variable data is read into a 1-dimensional Vec with the
  last variable dimension varying fastest, or as a ndarray.

What do you think is the best approach here? We'll need to put some effort into both crates to make them usable.


[1] HDF5 for Rust

[2] feature/types

[3] feature/types-WIP

[4] feature/typesystem

[5] High-level NetCDF bindings for Rust

mgxm avatar Oct 06 '17 16:10 mgxm

What do you think is the best approach to do that?

For NetCDF, chemfiles already supports the trajectory convention. Adding support for the Restart convention should not be too hard. There are Rust bindings to the library, which are already used in lumol for input and trajectory output. I think it would make sense to implement the Restart convention in chemfiles and then use that.

The main issue with this approach is that we don't have any simple way to save the interaction parameters in the restart file. We could dump the whole potential table from the initial input into a string and parse it again, but this would mean losing any pre-computed values, for example in TableComputation, in Ewald, or in any energy cache. I think that this is OK for a first version: we can save restart state only for the Configuration object, and forget about the rest of the system.
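To make the trade-off concrete, here is a small sketch of the "store the input text and parse it again" idea. Everything in it is hypothetical: `Restart`, `TabulatedLj`, and the one-line `lj <epsilon> <sigma>` input syntax are invented for illustration and are not lumol's input format or its TableComputation type.

```rust
// Number of tabulated points and cutoff distance (arbitrary example values).
const POINTS: usize = 1000;
const CUTOFF: f64 = 10.0;

// Hypothetical restart payload: the original input text plus the positions.
struct Restart {
    input: String,
    positions: Vec<[f64; 3]>,
}

// Hypothetical tabulated Lennard-Jones potential. The table is never stored
// in the restart file; it is recomputed each time the input is parsed.
struct TabulatedLj {
    epsilon: f64,
    sigma: f64,
    energies: Vec<f64>,
}

fn parse_field(field: Option<&str>) -> Result<f64, String> {
    field
        .ok_or_else(|| "missing value".to_string())?
        .parse::<f64>()
        .map_err(|e| e.to_string())
}

impl TabulatedLj {
    // Parse a line such as "lj 0.8 3.4" and rebuild the energy table.
    fn from_input(input: &str) -> Result<TabulatedLj, String> {
        let mut fields = input.split_whitespace();
        if fields.next() != Some("lj") {
            return Err("expected an 'lj' potential".to_string());
        }
        let epsilon = parse_field(fields.next())?;
        let sigma = parse_field(fields.next())?;

        // The pre-computed part: energies on a regular grid up to CUTOFF.
        let energies = (1..=POINTS)
            .map(|i| {
                let r = CUTOFF * i as f64 / POINTS as f64;
                let s6 = (sigma / r).powi(6);
                4.0 * epsilon * (s6 * s6 - s6)
            })
            .collect();

        Ok(TabulatedLj { epsilon, sigma, energies })
    }
}
```

On restart, the code would call something like `TabulatedLj::from_input(&restart.input)` and take the positions as they are. The cost is recomputing the table, which is cheap here but could matter for larger caches.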


Another way of doing this would be to use Serde to serialize and save the whole system state. I don't know how well it could work, especially with respect to changes in the structs: what happens if a new field is added to a struct? How do we manage to still load the information in memory? This would be more flexible, but also way harder to design properly.
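One common answer to the "new field" question is to let the loader fall back to a default when an older file does not contain the field. The sketch below shows that idea with a hand-rolled `key = value` text format and the standard library only; it is not Serde, and `SystemState`, `load`, and the field names are hypothetical. With Serde and a self-describing format, the `#[serde(default)]` field attribute plays a comparable role.

```rust
use std::collections::HashMap;

// Hypothetical saved state. `timestep` is assumed to have been added in a
// later release, so older restart files do not contain it.
#[derive(Debug, PartialEq)]
struct SystemState {
    temperature: f64,
    timestep: f64,
}

// Value used when loading a file written before `timestep` existed.
const DEFAULT_TIMESTEP: f64 = 1.0;

fn load(text: &str) -> Result<SystemState, String> {
    let mut fields = HashMap::new();
    for line in text.lines() {
        let line = line.trim();
        if line.is_empty() {
            continue;
        }
        let (key, value) = line
            .split_once('=')
            .ok_or_else(|| format!("malformed line: {}", line))?;
        let value: f64 = value
            .trim()
            .parse()
            .map_err(|e| format!("{}: {}", key.trim(), e))?;
        fields.insert(key.trim(), value);
    }

    // Fields present since the first version are required.
    let temperature = *fields.get("temperature").ok_or("missing temperature")?;
    // Newer fields are optional and fall back to a documented default.
    let timestep = fields.get("timestep").copied().unwrap_or(DEFAULT_TIMESTEP);

    Ok(SystemState { temperature, timestep })
}
```

This only covers added fields with a sensible default. Renamed or removed fields, or fields whose meaning changes, would still need an explicit version number and migration code.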

Luthaf avatar Oct 06 '17 18:10 Luthaf