Too many backup files in GROMACS when colvarsRestartFrequency is small

Open HanatoK opened this issue 4 months ago • 10 comments

I was using colvarsRestartFrequency 50000 to run a simulation of 80000000 steps in GROMACS with Colvars, and got the following error:

-------------------------------------------------------
Program:     gmx mdrun, version 2025.2
Source file: src/gromacs/utility/futil.cpp (line 357)

Fatal error:
Won't make more than 99 backups of 007_r.out.abf1.count for you.
The env.var. GMX_MAXBACKUP controls this maximum, -1 disables backups.

For more information and tips for troubleshooting, please check the GROMACS
website at https://manual.gromacs.org/current/user-guide/run-time-errors.html
-------------------------------------------------------

It looks like Colvars uses the internal backup mechanism of GROMACS, so it would be better to document what happens when a small colvarsRestartFrequency is used.
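A quick back-of-the-envelope check of why the limit is reached. This is only a sketch: it assumes that every rewrite of an output file after the first one creates one GROMACS backup, which is what the error message suggests but is not confirmed here. The limit of 99 is the one quoted in the error.

```python
# Values taken from the report above.
total_steps = 80_000_000
restart_freq = 50_000
max_backups = 99  # limit quoted in the GROMACS error message

# Number of times the output files are written during the run.
writes = total_steps // restart_freq

# Assumption: the first write creates the file, and each later write
# backs up the previous copy.
backups_needed = writes - 1

print(f"writes: {writes}, backups needed: {backups_needed}")
print(f"exceeds limit: {backups_needed > max_backups}")
```

Under that assumption, the run would need far more backups than the limit allows, so the error is expected well before the end of the simulation.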

HanatoK avatar Aug 28 '25 15:08 HanatoK

It looks like I cannot simply disable colvarsRestartFrequency if I use historyfreq in the ABF section. Any ideas?

HanatoK avatar Aug 28 '25 16:08 HanatoK

> It looks like I cannot simply disable colvarsRestartFrequency if I use historyfreq in the ABF section. Any ideas?

Perhaps we can disable the backup in those use cases where it is clear from the configuration that the user will never make use of those backup files?

giacomofiorin avatar Aug 30 '25 16:08 giacomofiorin

There should be a backup rotation mechanism. When you have more backups than allowed, delete the old ones... Anyway, we have no control over this.

Maybe we should decouple these frequencies completely: output, backup, and history?

@giacomofiorin What situation do you see as a sign that backup files will never be used?

jhenin avatar Aug 30 '25 19:08 jhenin

> @giacomofiorin What situation do you see as a sign that backup files will never be used?

In this case, because the history file is being written, there should be no need to keep the same information on disk by backing up the original .grad and .count file. Or is there?

giacomofiorin avatar Sep 01 '25 15:09 giacomofiorin

> There should be a backup rotation mechanism. When you have more backups than allowed, delete the old ones... Anyway, we have no control over this.
>
> Maybe we should decouple these frequencies completely: output, backup, and history?
>
> @giacomofiorin What situation do you see as a sign that backup files will never be used?

The history files are necessary to assess the eABF convergence, and I don't use them for restarting. It would be better for me if the backup files could be saved at a larger interval than the history files. How difficult is it to decouple the frequencies?

HanatoK avatar Sep 01 '25 16:09 HanatoK

Not too difficult. Either write_output_files has to be called in all cases and check both frequencies internally to see which files need writing, or the history writing has to be moved into its own function and called at a separate frequency.
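A minimal sketch of the first option, written in Python for brevity rather than as actual Colvars code. The function name `files_to_write` and the labels `"output"` and `"history"` are hypothetical; the only idea taken from the comment above is that a single routine is called in all cases and checks each frequency on its own.

```python
def files_to_write(step, output_freq, history_freq):
    """Return which file groups are due at this step.

    Hypothetical helper: each frequency is checked independently,
    so neither has to be a multiple of the other.
    A frequency of 0 disables that kind of output.
    """
    due = []
    if output_freq > 0 and step % output_freq == 0:
        due.append("output")
    if history_freq > 0 and step % history_freq == 0:
        due.append("history")
    return due


# Illustrative values: history every 50000 steps, regular output every 1000000.
print(files_to_write(50_000, 1_000_000, 50_000))     # only history is due
print(files_to_write(1_000_000, 1_000_000, 50_000))  # both are due
print(files_to_write(75_000, 1_000_000, 50_000))     # nothing is due
```

With the checks separated like this, the regular output files (the ones that trigger backups) would be rewritten only at their own frequency, however often the history is saved.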

jhenin avatar Sep 01 '25 17:09 jhenin

Alternative solution: a user option to disable backups, or to revert to Colvars-style backups.

jhenin avatar Oct 14 '25 11:10 jhenin

This problem appeared when MR https://gitlab.com/gromacs/gromacs/-/merge_requests/4241 was done to fix this issue: https://gitlab.com/gromacs/gromacs/-/issues/5071

Adding a user flag would give users the choice between facing one issue or the other. I'm not sure it is possible to solve both issues at the same time.

jhenin avatar Oct 14 '25 13:10 jhenin

Thinking more about the above, how about relaxing the criterion that historyFreq has to be a multiple of outputFreq?

Then we wouldn't need to worry about the backup at all

giacomofiorin avatar Dec 09 '25 23:12 giacomofiorin

Yes, I think that is the best strategy. That is what Haochuan suggested back in September.

jhenin avatar Dec 09 '25 23:12 jhenin