
Crashing while adding a new hill in multiple walkers metadynamics - GROMACS

Open savenkom opened this issue 7 months ago • 6 comments

I am running well-tempered metadynamics with an attempt to use multiple walkers. Unfortunately, it fails the moment the first walker tries to deposit its first hill. As a test case on my local machine I run only one walker (I suppose this is a realistic case, since it is technically possible for one walker to start well ahead of the others); the other walkers are just empty directories containing all the files. I am using a setup in which the replicas are independent jobs. As I can see in the log file, all the necessary data were read:

    colvars: Metadynamics bias "mtd": accessing replica "1".
    colvars: Metadynamics bias "mtd": accessing replica "2".
    colvars: Metadynamics bias "mtd": accessing replica "3".
    colvars: Metadynamics bias "mtd": accessing replica "4".
    colvars: Metadynamics bias "mtd": accessing replica "5".
    colvars: Metadynamics bias "mtd": accessing replica "6".
    colvars: Metadynamics bias "mtd": accessing replica "7".
    colvars: Metadynamics bias "mtd": replica "1" has supplied a new state file, "walk_1/md.colvars.mtd.1.state".
    colvars: Metadynamics bias "mtd": replica "2" has supplied a new state file, "walk_2/md.colvars.mtd.2.state".
    colvars: Metadynamics bias "mtd": replica "3" has supplied a new state file, "walk_3/md.colvars.mtd.3.state".
    colvars: Metadynamics bias "mtd": replica "4" has supplied a new state file, "walk_4/md.colvars.mtd.4.state".
    colvars: Metadynamics bias "mtd": replica "5" has supplied a new state file, "walk_5/md.colvars.mtd.5.state".
    colvars: Metadynamics bias "mtd": replica "6" has supplied a new state file, "walk_6/md.colvars.mtd.6.state".
    colvars: Metadynamics bias "mtd": replica "7" has supplied a new state file, "walk_7/md.colvars.mtd.7.state".
    colvars: Metadynamics bias "mtd": reading the state of replica "1" from file "walk_1/md.colvars.mtd.1.state".
    colvars: successfully read the biasing potential and its gradients from grids.
    colvars: successfully read 0 explicit hills from state.
    colvars: Restarted metadynamics bias "mtd" with step number 0.
    colvars: Metadynamics bias "mtd": reading the state of replica "2" from file "walk_2/md.colvars.mtd.2.state".
    colvars: successfully read the biasing potential and its gradients from grids.
    colvars: successfully read 0 explicit hills from state.
    colvars: Restarted metadynamics bias "mtd" with step number 0.
    colvars: Metadynamics bias "mtd": reading the state of replica "3" from file "walk_3/md.colvars.mtd.3.state".
    colvars: successfully read the biasing potential and its gradients from grids.
    colvars: successfully read 0 explicit hills from state.
    colvars: Restarted metadynamics bias "mtd" with step number 0.
    colvars: Metadynamics bias "mtd": reading the state of replica "4" from file "walk_4/md.colvars.mtd.4.state".
    colvars: successfully read the biasing potential and its gradients from grids.
    colvars: successfully read 0 explicit hills from state.
    colvars: Restarted metadynamics bias "mtd" with step number 0.
    colvars: Metadynamics bias "mtd": reading the state of replica "5" from file "walk_5/md.colvars.mtd.5.state".
    colvars: successfully read the biasing potential and its gradients from grids.
    colvars: successfully read 0 explicit hills from state.
    colvars: Restarted metadynamics bias "mtd" with step number 0.
    colvars: Metadynamics bias "mtd": reading the state of replica "6" from file "walk_6/md.colvars.mtd.6.state".
    colvars: successfully read the biasing potential and its gradients from grids.
    colvars: successfully read 0 explicit hills from state.
    colvars: Restarted metadynamics bias "mtd" with step number 0.
    colvars: Metadynamics bias "mtd": reading the state of replica "7" from file "walk_7/md.colvars.mtd.7.state".
    colvars: successfully read the biasing potential and its gradients from grids.
    colvars: successfully read 0 explicit hills from state.
    colvars: Restarted metadynamics bias "mtd" with step number 0.
    colvars: Synchronizing (emptying the buffer of) trajectory file "md.colvars.traj".

To make the start of the simulation possible at all, I had to create an md.colvars.mtd.replicaID.state file for every walker myself; otherwise it crashed immediately. I simply copied the file created by replica 0 into the other directories, renamed the copies, and changed the replica ID inside each file to the appropriate number.
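For reference, that renaming step can be sketched as a small shell loop. This is only an illustration of the procedure described above, not part of the actual run: the dummy state-file contents and the keyword being rewritten are assumptions, and a real Colvars state file looks different.

```shell
# Hypothetical sketch of the workaround: seed each walker directory with a
# copy of the replica-0 state file, rewriting the replica ID inside it.
# The file created here is only a stand-in for the real state file.
set -e
mkdir -p walk_0
printf 'replicaID 0\n' > walk_0/md.colvars.mtd.0.state  # dummy stand-in
for i in 1 2 3 4 5 6 7; do
  mkdir -p "walk_$i"
  # copy the replica-0 file under the walker's own name, rewriting the ID
  sed "s/replicaID 0/replicaID $i/" walk_0/md.colvars.mtd.0.state \
    > "walk_$i/md.colvars.mtd.$i.state"
done
```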

As I said, the problem is somewhere in the hill deposition, I suppose, since a backtrace in GDB indicates that it lies somewhere in colvarbias_meta::update_bias() [clone .cold].

Moreover, I ran the same simulation without multiple walkers (I just removed those lines from the metadynamics block and changed nothing else), and it ran well and deposited hills just fine.

My colvars setup file looks like this:

indexFile index.ndx
colvarsRestartFrequency 1000000
colvarsTrajFrequency 100

colvar {
    name angle
    width 2.0
    lowerBoundary -180.0
    upperBoundary 180.0
    spinAngle {
        atoms { indexGroup strand }
        refPositionsFile all.xyz
        axis (1.0, 0.0, 0.0)
    }
}

colvar {
    name dist
    lowerBoundary 1.4
    upperBoundary 5.2
    width 0.1
    distanceZ {
        main { indexGroup Quat }
        ref { dummyAtom (0, 0, 0) }
        axis (1, 0, 0)
    }
}

################### RESTRAINTS ######################

colvar {
    name rmsd
    rmsd {
        atoms {
            indexGroup tail
            rotateToReference off
        }
        refPositionsFile all.xyz
    }
}

harmonicWalls {
    name wall_max
    colvars dist
    upperWalls 4.6
    upperWallConstant 1000.0
    #stepZeroData on
}

harmonicWalls {
    name wall_min
    colvars dist
    lowerWalls 2.0
    lowerWallConstant 1000.0
}
harmonic {
    colvars rmsd
    centers 0.0
    forceConstant 10000.0
    outputCenters on
}
######################################################

metadynamics {
    name mtd
    useGrids on
    colvars angle dist
    newHillFrequency 2000
    hillWeight 0.5
    hillWidth 1.0
    keepFreeEnergyFiles on
    writeHillsTrajectory on
    wellTempered on
    biasTemperature 2790
    outputFreq 5000000
    multipleReplicas on
    replicasRegistry replicasRegistry.txt
    replicaUpdateFrequency 10000
    replicaID 0
}

I hope it will be possible to find the source of the issue. Thank you in advance

savenkom avatar May 21 '25 11:05 savenkom

I wonder if this bug is related to https://github.com/Colvars/colvars/pull/808.

HanatoK avatar Jun 10 '25 20:06 HanatoK

I wonder if this bug is related to #808.

It's possible, but I'm not sure that's the most likely explanation: the error here happens immediately, and I would assume that the grid is wide enough to accommodate at least the starting configuration.

@savenkom Thanks for your report, and for your patience. I have not been able to reproduce this error so far using GROMACS 2025. Can you please report the GROMACS version that you are using (and the Colvars version, if it's a patched build) and include the shell scripts that you use to launch the first job (using the TPR alone) and the continuation job (using the TPR and checkpoint together)?

I suspect that the way we detect the output file prefix in Gromacs might be at fault here.

giacomofiorin avatar Jun 11 '25 01:06 giacomofiorin

I was using GROMACS 2024.3, and the Colvars version indicated in md.log is 2023-12-04 (patch 2). This setup comes directly from Spack with the default installation; I added nothing on top of it myself. gmx_mpi mdrun -deffnm md was my launch command for the walker. I didn't do a continuation launch because the crash happened before the cpt file was saved. GROMACS definitely had no problem finding the necessary files by itself, and Colvars had no problem finding the registry and reading it.

I hope I was able to give you the information you expected. Thank you very much for your help!

savenkom avatar Jun 11 '25 09:06 savenkom

Thanks a lot, I'll test -deffnm, which is something that we hadn't tested universally, given the Gromacs team's plans to phase it out in the future.

What was the error message you received?


giacomofiorin avatar Jun 11 '25 18:06 giacomofiorin

Sorry for the delayed answer. The error itself was reported as:

    terminate called after throwing an instance of 'gmx::InternalError'
      what():  Error in collective variables module.

A backtrace with GDB shows this:

    #0  __pthread_kill_implementation (threadid=, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
    #1  0x00007ffff28a7813 in __pthread_kill_internal (threadid=, signo=6) at pthread_kill.c:89
    #2  0x00007ffff284ddc0 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
    #3  0x00007ffff283557a in __GI_abort () at abort.c:73
    #4  0x00007ffff2a97b0c in __gnu_cxx::__verbose_terminate_handler () at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/vterminate.cc:95
    #5  0x00007ffff2aadf1a in __cxxabiv1::__terminate (handler=) at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:48
    #6  0x00007ffff2a9737b in __cxa_call_terminate (ue_header_in=0x7fffb01ef540) at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/eh_call.cc:56
    #7  0x00007ffff2aad703 in __cxxabiv1::__gxx_personality_v0 (version=, actions=6, exception_class=5138137972254386944, ue_header=0x7fffb01ef540, context=0x7fffd90964e0) at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/eh_personality.cc:692
    #8  0x00007ffff31f4ee7 in _Unwind_RaiseException_Phase2 (exc=exc@entry=0x7fffb01ef540, context=context@entry=0x7fffd90964e0, frames_p=frames_p@entry=0x7fffd90963e8) at /usr/src/debug/gcc/gcc/libgcc/unwind.inc:64
    #9  0x00007ffff31f59ad in _Unwind_Resume (exc=0x7fffb01ef540) at /usr/src/debug/gcc/gcc/libgcc/unwind.inc:242
    #10 0x00007ffff34fee85 in colvarbias_meta::update_bias() [clone .cold] () from /home/spack/opt/spack/linux-arch-broadwell/gcc-12.3.0/gromacs-2024.3-lb4a4fud6nydgygcy3dp7b5y3cwxsvc4/lib/libgromacs_mpi.so.9
    #11 0x00007ffff443578c in colvarbias_meta::update() () from /home/spack/opt/spack/linux-arch-broadwell/gcc-12.3.0/gromacs-2024.3-lb4a4fud6nydgygcy3dp7b5y3cwxsvc4/lib/libgromacs_mpi.so.9
    #12 0x00007ffff451a320 in colvarproxy_smp::smp_biases_loop() [clone ._omp_fn.0] () from /home/spack/opt/spack/linux-arch-broadwell/gcc-12.3.0/gromacs-2024.3-lb4a4fud6nydgygcy3dp7b5y3cwxsvc4/lib/libgromacs_mpi.so.9
    #13 0x00007ffff7f8a997 in gomp_thread_start (xdata=) at /usr/src/debug/gcc/gcc/libgomp/team.c:129
    #14 0x00007ffff28a57eb in start_thread (arg=) at pthread_create.c:448
    #15 0x00007ffff292918c in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

savenkom avatar Jun 16 '25 14:06 savenkom

Hi @savenkom, I have been trying to run MW metadynamics in GROMACS but could not easily reproduce your error. In #814 I added an automated test (which was missing anyway) and made some minor changes that could improve the error messages.

Here are some suggestions:

  1. Can you check that the grid boundaries 1.4 and 5.2 are valid for the value of dist in your initial configuration?
  2. Alternatively, can you test removing wellTempered and biasTemperature (the bug fixed in #808 was specific to them)?
  3. If these two don't show anything useful, at this point can you send a full input deck as a link or offline by email?
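To illustrate suggestion 2, the metadynamics block from the original report would look like the following with the well-tempered keywords removed. This is only a sketch of the diagnostic test, with all other values copied unchanged from the report, not a recommended production setup:

```
metadynamics {
    name mtd
    useGrids on
    colvars angle dist
    newHillFrequency 2000
    hillWeight 0.5
    hillWidth 1.0
    keepFreeEnergyFiles on
    writeHillsTrajectory on
    outputFreq 5000000
    multipleReplicas on
    replicasRegistry replicasRegistry.txt
    replicaUpdateFrequency 10000
    replicaID 0
}
```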

Thanks!

giacomofiorin avatar Jul 02 '25 20:07 giacomofiorin