colvars icon indicating copy to clipboard operation
colvars copied to clipboard

Finally remove support for appending to files?

Open giacomofiorin opened this issue 2 years ago • 6 comments

@HanatoK Appending to files that already exist on disk is useful in some cases, but also rather fragile (for examples, when jobs crash and are restarted). @jhenin had already removed this feature from ABF (see #136 and 1e964a542bf481b29cfd33ee4097dc6ca3371bda) but before I could remove the append flag from the file-writing functions you started again using it in #409, for example here: https://github.com/Colvars/colvars/blob/ce43c1d81d28efb96a9b3a3a8142dec4fac2f4e2/src/colvarbias_histogram_reweight_amd.cpp#L187-L188

This results in (1) appending between multiple NAMD jobs even if some may have not exited cleanly leading to corrupt files and (2) it is also inconsistent with what the ABF bias does in the equivalent situation. Lastly, (3) supporting the append mode means an extra code path to maintain.

I would like to change those lines in the reweightaMD bias so that no files are opened any longer with the std::app flag: instead, if the append variable is set, the call to close_output_stream() would simply be skipped.

If you think of a strong reason to keep support for appending to files (i.e. one that clearly overtakes the three concerns listed above), please bring it up. Thanks!

giacomofiorin avatar Jun 22 '22 14:06 giacomofiorin

@HanatoK @jhenin Here is a working draft where I extended the requests that you brought up in #481 to the handling of output streams. https://github.com/Colvars/colvars/tree/output_streams

I still need to sort out test failures in multiple-walkers metadynamics, and may actually opt for not touching that at all (since the temporary files are actual files anyway). Other than that, everything seems to work with references to std::ostream objects rather than pointers, plus some nice functions for writing out the grid-based objects.

The main change that affects you in that branch is that I removed any instances of files being opened in append-mode. For "history" files, this means that you can still accumulate them during a job, but not across jobs, consistent with NAMD's DCD and XST files.

I'd like to have all of these changes merged by mid-July, and at this time the only thing I need is confirmation that this is an acceptable change.

giacomofiorin avatar Jun 23 '22 18:06 giacomofiorin

@giacomofiorin If the support of appending is removed, how can we keep the history files?

HanatoK avatar Jun 23 '22 18:06 HanatoK

If you are using NAMD, I assume that you are using different outputName prefixes, otherwise your DCD trajectories would be overwritten (NAMD does not accept appending).

So just like you would use catdcd to concatenate DCD files, you would also concatenate ABF history files:

catdcd -o total.dcd job1.dcd job2.dcd ...
cat job1.pmf.hist job2.pmf.hist ... > total.pmf.hist

giacomofiorin avatar Jun 23 '22 18:06 giacomofiorin

If you are using NAMD, I assume that you are using different outputName prefixes, otherwise your DCD trajectories would be overwritten (NAMD does not accept appending).

So just like you would use catdcd to concatenate DCD files, you would also concatenate ABF history files:

catdcd -o total.dcd job1.dcd job2.dcd ...
cat job1.pmf.hist job2.pmf.hist ... > total.pmf.hist

Thanks, I see it. I am OK with the removal of append.

HanatoK avatar Jun 23 '22 19:06 HanatoK

Ok, cool. Please confirm with Chris C when you can, so that we are all on the same page.

As far as I can tell all output files in both the UI-based code and the aMD-based code reuse the same output prefix as everything else, and this prefix in turn must change between jobs if you want to avoid overwriting atomic trajectory files in NAMD. Still, I can appreciate why in certain cases, like tiny test systems, one may ignore the atomic trajectory altogether and end up reusing the output prefix. The behavior in those corner cases would change without appending, so I'd like to be sure.

You can probably also see why this is a thorny issue that should be resolved: in GROMACS, it is possible to use the same output prefix between consecutive jobs, because all of its output files are backed up consistently. The same already happens with all Colvars output files, which means that the few cases that don't follow that behavior would end up being really confusing to people.

giacomofiorin avatar Jun 23 '22 19:06 giacomofiorin

For reference, this is connected to PR #487

jhenin avatar Sep 01 '22 20:09 jhenin

I think getting consistent behavior within Colvars should take precedence over preserving edge-case scenarios. Here, I would just move forward with this change. At worst, this might function as a scream test.

jhenin avatar Nov 11 '22 20:11 jhenin

@jhenin Indeed #487 is making the change, replacing the append flag with a keep_open flag where possible.

You may have also seen this. At this point, this issue has been discussed enough.

giacomofiorin avatar Nov 13 '22 22:11 giacomofiorin

Closed by #487

giacomofiorin avatar Apr 28 '23 16:04 giacomofiorin