Confusing inputs for the time between steps in the file
kinisi uses the same terminology as pymatgen to define the time between the steps in an input.
That is a time_step (the simulation time step for the integrator) and the step_skip (the output frequency to the file).
This is confusing and a stumbling block when new to kinisi.
Therefore, I propose that the meaning of time_step is changed to the time step between trajectory steps in the file and that step_skip is removed entirely.
I welcome any thoughts on this.
Yeah, this can be a source of confusion, and the change is a good move. Still, I would introduce a new terminology perhaps time_gap or similar to prevent confusion with people working from notebooks given to them by others as examples without anyone being aware of the change!
Yeah, this can be a source of confusion, and the change is a good move. Still, I would introduce a new terminology perhaps
time_gapor similar to prevent confusion with people working from notebooks given to them by others as examples without anyone being aware of the change!
I agree. I think it is best to only take one value from the user, that being the time between frames in the trajectory. This may be future proofing for introducing something like pint.
How will this play with subsampling frames?
Still, I would introduce a new terminology perhaps time_gap or similar to prevent confusion
My preference would be to keep the name time_step but if the user provides a step_skip then an error will be raised. Hopefully, removing the confusion.
How will this play with subsampling frames?
I think this would not effect the operation of the subsampling frames.
I think the error plan would help a lot, but I worry that time_step will have a particular MD-based meaning for the user base, and may cause confusion. I could see less experienced scientists getting tangled up with this and not realising it (some trajectory formats include a timestep number and they could assume this is read somehow). There are edge cases with aimd where people could end up publishing incorrect MSDs and diffusion coefficients because they're not so far off as to be preposterous and so the error doesn't get caught.
It wouldn't be Kinisi's fault in this circumstance but I think we owe it to the broader community to guard against such errors.
time_step and step_skip are good names IMO. They both mean what they say. And they have the advantage of being the same as in pymatgen.
And they have the advantage of being the same as in pymatgen.
Not everyone is using pymatgen, and hopefully eventually everyone is using kinisi before using pymatgen.
I'm not sure having the same names as another package is advantageous. It only helps a subset of the user base (those familiar with pymatgen), and if pymatgen changes the names in the future this will be moot. Further, I agree that time_step could be confused for the MD integration timestep. timestep is also used by some packages (pymatgen included) to refer to the current timestep, this could cause further confusion.
I am coming round to time_step not being a good choice. I don’t really like time_gap, any other ideas?
time_spacing
temporal_spacing
stride_length (inspired by i-pi)
Potentially step_write_frequency? It's a little long winded, but i think it gets the point across.
dt
frame_step
frame_time
frame_time_step
frame_gap
frame_step is my favourite so far. Followed by stride_length and stride_step.
frame_interval
I am not a fan of the *_interval options. This is because I often talk about $\Delta t$ as the "time interval", and in the past, some people have been confused and think that the ballistic regime is times before the simulation has equilibrated (this is not the case). So, I want to keep simulation time and MSD time descriptors separate.
My issue with frame_step is that, to me, it reads as a number of steps (basically step_skip) rather than a timeframe. Maybe @jd15489's suggestion of frame_time would make more sense?
- timestep meaning time per integration step is standard MD terminology.
- step skip (number of steps skipped per output frame) seems clear.
Maybe we are rediscovering why pymatgen splits these into two arguments?
I would argue for keeping these separate, but potentially renaming step_skip. If you run a MD simulation, these are separate inputs with different units. As a user, I always think about my output frequency in units of timesteps, not in units of simulation time ( = timestep × output frequency)
Based on the number of confused user questions about this problem (95 % of user problems have been around this), I think you might be alone in that @bjmorgan.
From @bjmorgan:
I agree with the principle that we just multiply these numbers together, but, this means the user needs to remember to multiply together two numbers from their MD input (which they have already shown they don’t understand) and put that number into kinisi. So the current approach you just use the numbers straight from the MD setup instead of doing the multiplication yourself as an extra step.
I think this is a good argument.
I agree that this is a good argument.
What do MD packages call these things?
LAMMPS calls these: timestep and N.
DLPOLY calls them: timestep and timestep interval.
VASP calls them: POTIM (or time step) and NBLOCK.
Returning to this … an additional comment would be that it is not uncommon to have datasets where the MD timestep is consistent, but the output frequency changes (maybe you started with a shorter output frequency, increased it later, and do not want to rerun or post-process your original trajectory data). With the existing separation of these two inputs, it is transparent how changes in MD set-up map to changes in kinisi input parameters. Combining these parameters obfuscates how the combined parameter is linked to the MD data. (output files may not report output frequency in time units, but instead will report step frequency, or leave the user to pull this information from the MD input).
I have an implementation of kinisi that uses scipp and therefore requires the user to set the values as:
time_step = 2.0 * sc.Unit('fs')
step_skip = 50 * sc.Unit('dimensionless')
I think that this would help with this confusion. But this won't be ready until kinisi version 2.
Documentation for scipp: https://scipp.github.io (xarray with units and variances).
In my opinion, requiring scipp makes usage more complicated, and only users who already understand the difference between MD timestep and write frequency will understand what this is doing.
The logic is that declaring the units forces the user to think more about what they are doing and, therefore, understand it.
We want everyone to understand the difference between MD timestep and write frequency.
And forcing units is aligned with the general principle for scientific code that you want your code to be explicit rather than implicit.
Yup. There are additional benefits to the use of scipp (e.g., generally explicit unit handling, objects that hold variances alongside the values, etc.)
This is being fixed in the scipp branch.