
Support for MDAnalysis packages dealing with non-linear time dump

Open gitsirsha opened this issue 7 months ago • 6 comments

Is your feature request related to a problem?

  1. The MDAnalysis.analysis.msd module (like similar analysis tools in the MDAnalysis ecosystem) assumes that trajectory files have frames spaced at regular, linear time intervals.

  2. Many users of engines like LAMMPS prefer to output trajectory data at exponentially increasing time intervals (e.g., 2⁰, 2¹, ..., 2²⁰) rather than at fixed, evenly spaced intervals. This captures both early-time dynamics and long-time behaviour without increasing the size of the LAMMPS output file (.lammpstrj).

  3. Implementing this would also be useful for other analysis modules such as ACF (autocorrelation function) and RDF (radial distribution function).

  4. This feature is integral to an ongoing polymer simulation collaboration.

Describe the solution you'd like

  1. For example, the MSD module averages over all possible lag times τ ≤ τ_max, as described in the manual. This works very well for files that are dumped linearly!
  • This approach fails for trajectories with exponentially spaced frames, where the time difference between successive frames is not constant.
  2. Instead, for an exponentially dumped file it would be ideal to compute the MSD by averaging only over the particles at each frame and skipping the subsequent time averaging.
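
As a rough illustration of the particle-only averaging described above, here is a minimal NumPy sketch. It is not the MDAnalysis implementation: the function name `msd_from_first_frame` and the toy data are hypothetical, and the coordinates are assumed to be already unwrapped.

```python
import numpy as np


def msd_from_first_frame(positions):
    """MSD relative to frame 0, averaged over particles only.

    positions: array of shape (n_frames, n_particles, n_dim) holding
    unwrapped coordinates (an assumption of this sketch).
    Returns an array of shape (n_frames,); no time averaging is done.
    """
    positions = np.asarray(positions, dtype=float)
    displacements = positions - positions[0]      # (n_frames, n_particles, n_dim)
    squared = np.sum(displacements**2, axis=-1)   # (n_frames, n_particles)
    return squared.mean(axis=1)                   # average over particles


# Toy data: two particles moving along x with velocities 1 and 2,
# sampled at exponentially spaced times.
times = np.array([0.0, 1.0, 2.0, 4.0, 8.0])
velocities = np.array([1.0, 2.0])
positions = np.zeros((len(times), 2, 3))
positions[:, :, 0] = times[:, None] * velocities[None, :]

msd = msd_from_first_frame(positions)
```

Because each frame is compared only with the first one, the result can be plotted directly against the recorded frame times, whatever their spacing. The trade-off is poorer statistics than a time-averaged MSD.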

We are happy to provide a draft solution or collaborate on the implementation if needed!

Describe alternatives you've considered and Additional context

  1. Rerunning the LAMMPS simulations with linearly spaced output is an option, but it results in unnecessarily large output files and added computational effort.

  2. The units of the output MSD are ambiguous for LJ systems. See related issue #5009.

gitsirsha avatar Apr 16 '25 16:04 gitsirsha

@orbeckst I wanted to follow up on this. We were working towards a polymer-focused MD toolkit, but this is the sort of thing that would make more sense being contributed back to the main code repository. There is draft code that we can submit as a PR if this is of interest.

mrshirts avatar Jun 11 '25 17:06 mrshirts

@mrshirts could you point us to the code in question? It would make it easier for us to know what kind of changes would have to be made to enable this. The likely blocker here is just making sure that it is suitably generalizable to all users. If it's easier, please do feel free to open a PR.

IAlibay avatar Jun 11 '25 18:06 IAlibay

My first question is whether the dump files record timestamps, i.e., when you do [ts.time for ts in u.trajectory], do you recover the actual times that reflect your exponential spacing?

  • If the answer is yes then your question seems to be primarily about upgrading specific analysis tools. This could be done on a tool-by-tool basis. If you have a generalizable idea (such as "give me the option to store by-frame data for manual postprocessing") then we could also look into adding it to AnalysisBase and then all tools will inherit the capability.
  • If no then the first order of business is figuring out how to provide the time information. One way that this would work is to create a new transformation that generates the appropriate time, or to provide an auxiliary file, or to add an option to the LAMMPS trajectory reader.
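
A quick way to answer the question above is to test whether the recorded times are evenly spaced. The sketch below uses only NumPy; the helper name `is_linearly_spaced` and the tolerance are hypothetical. In practice `times` would be the list obtained from `[ts.time for ts in u.trajectory]`.

```python
import numpy as np


def is_linearly_spaced(times, rtol=1e-6):
    """Return True if successive time differences are all (nearly) equal.

    times: 1D sequence of frame times, e.g. collected from ts.time.
    rtol: relative tolerance for comparing the differences (assumed value).
    """
    times = np.asarray(times, dtype=float)
    if times.size < 3:
        # With fewer than three frames there is at most one interval.
        return True
    dt = np.diff(times)
    return bool(np.allclose(dt, dt[0], rtol=rtol, atol=0.0))


linear_times = [0.0, 1.0, 2.0, 3.0]
exponential_times = [1.0, 2.0, 4.0, 8.0]

print(is_linearly_spaced(linear_times))
print(is_linearly_spaced(exponential_times))
```

If the check fails but the times themselves look right, the reader is recording timestamps correctly and the remaining work is in the analysis tools.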

For the units issue, let's continue the discussion at #5009. We would really like to have more input (and contributions) from LAMMPS users!

orbeckst avatar Jun 11 '25 18:06 orbeckst

@orbeckst

  • Doing [ts.time for ts in u.trajectory] provides the timesteps as expected. The issue we observed is that the default MDAnalysis.analysis.msd calculates the time-averaged MSD as if the frames were linearly spaced in time.

  • I am working on rewriting this addition as a PR that could potentially be included in the main MDAnalysis distribution, by adding a keyword to the current MSD analysis for dumps with different time spacings.
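
To make the problem in the first point concrete, the following sketch (plain NumPy, with a hypothetical helper `physical_lags`) lists the physical time differences that share a single index lag. It only illustrates the mismatch and does not reproduce the MDAnalysis code.

```python
import numpy as np


def physical_lags(times, index_lag):
    """Physical time differences for all frame pairs separated by index_lag."""
    if index_lag < 1:
        raise ValueError("index_lag must be a positive integer")
    times = np.asarray(times, dtype=float)
    return times[index_lag:] - times[:-index_lag]


linear = np.arange(5.0)              # 0, 1, 2, 3, 4
exponential = 2.0 ** np.arange(5)    # 1, 2, 4, 8, 16

# Linear spacing: every pair at index lag 1 has the same physical lag.
print(physical_lags(linear, 1))
# Exponential spacing: index lag 1 mixes physical lags 1, 2, 4 and 8.
print(physical_lags(exponential, 1))
```

Averaging displacements over all pairs with the same index lag therefore mixes different physical lag times once the spacing is not uniform.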

gitsirsha avatar Jun 11 '25 22:06 gitsirsha

@gitsirsha a PR is a great starting point for a discussion if you've already done a good part of the work. The PR does not have to be complete for a discussion; we just want to see what the major changes are and what changes for the user. It's also really important to have a description of the underlying ideas/methods/algorithms.

orbeckst avatar Jun 11 '25 22:06 orbeckst

@orbeckst, @gitsirsha is working on making what he has fit in better with MDAnalysis; we'll update over the next few days!

mrshirts avatar Jun 13 '25 17:06 mrshirts