crest
crest copied to clipboard
Cregen eliminates valid conformers as "clashes"
Hello,
While performing conformational search in crest (version 2.12, downloaded form github), it removes all structures as "clashes" even though they are completely fine by visual inspection.
Command which fails: crest struc.xyz --cregen crest_rotamers_2.xyz
Files (you must rename them from .xyz.txt to just .xyz, github did not allow me to upload xyz) crest_rotamers_2.xyz.txt struc.xyz.txt
The output is like this:
Command line input:
> crest struc.xyz --cregen crest_rotamers_2.xyz
--cregen : CREGEN standalone usage. Sorting file <crest_rotamers_2.xyz>
Using only the cregen sorting routine.
input file name : crest_rotamers_2.xyz
output file name : crest_rotamers_2.xyz.sorted
number of atoms : 21
number of points on xyz files : 4
RMSD threshold : 0.1250
Bconst threshold : 0.0100
population threshold : 0.0500
conformer energy window /kcal : 6.0000
# fragment in coord : 1
number of removed clashes : 4
# bonds in reference structure : 19
number of reliable points : 0
reference state Etot : 6.939238432338885E-310
running RMSDs...
done.
number of doubles removed by rot/RMSD : 0
total number unique points considered further : 0
Erel/kcal Etot weight/tot conformer set degen origin
T /K : 298.15
E lowest : 0.00000
ensemble average energy (kcal) : 0.000
ensemble entropy (J/mol K, cal/mol K) : 0.000 0.000
ensemble free energy (kcal/mol) : 0.000
population of lowest in % : 0.000
number of unique conformers for further calc 0
list of relative energies saved as "crest.energies"
-----------------
Wall Time Summary
-----------------
CREGEN wall time : 0h : 0m : 0s
--------------------
Overall wall time : 0h : 0m : 0s
CREST terminated normally.
I would like to note that this happened originally as a part of the whole conformational search, but I just narrowed it down to this particular step. The molecule was optimized with the following constraints, has charge -2, multiplicity 1 and was optimized with --alpb water
with xtb with --tight
optimization thresholds.
$constrain
force constant=1.0
angle: 2,1,4,90.0
angle: 2,1,6,90.0
angle: 2,1,8,180.0
angle: 4,1,6,180.0
angle: 4,1,8,90.0
angle: 6,1,8,90.0
$end
Yeah, that's a bit odd. I was able to reproduce it... initially. After repeated tries, which included renaming the ensemble, it started to work, and has been since, without me being able to break it again.
I suspect it has something to do with an uninitialized reference energy. If you look at your output it says 6.939238432338885E-310
for this value, which is something that happens only if there is random data (i.e. the variable is uninitialized) in memory. I'm not 100% sure this is what was causing trouble, but just in case I'll look at it in the code.
First of all, I think your later attempts to reproduce failed, because you may have inadvertently overwritten the struc.xyz
file, since crest overwrites file with that name when it is run.
I looked at the problem more, and noticed, that the "clashes" are not eliminated when you change the reference structure (when I took the first structure from the rotamers file, it went. I tried to actually print variables in discardbroken
subroutine by adding this to newcregen.f:612
: write (*,*) "__debug:",distok,dissoc,frag,frag0
and here they are:
unchanged_struc: number of removed clashes : 4
__debug: T T 2 1
__debug: T T 2 1
__debug: T T 2 1
__debug: T T 2 1
other_geometry_struc:no clashes
__debug: T F 2 2
__debug: T F 2 2
__debug: T F 2 2
__debug: T F 2 2
It means, that crest somehow thinks that the "new" structures are dissociated. If I undestand it correctly, once dissoc
is set to .true.
the structure is removed by the next if statement. I think this is not an easy problem to solve, metal-ligand bonds in general tend to be quite diverse.
I am quite surprised, that these "dissociation" checks are not just switched off when you put --notopo
option on the command line (but it has not effect). I can see a better way to check topology changes might be to create a bond order matrix base on Pauling's formula (10.1021/ja01195a024) bond_order = exp((R_single_bond - R)/0.3)
and then take difference between two bond matrices to see, if any differene is larger than one.
However, a solution not requiring to change the general procedure, which I suppose, is working well in general, since nobody else complains* about this besides me, would be just to make sure that discardbroken
subrouting would ignore the atoms excluded by --notopo
option.
(* the problem could be more widespread, since I only noticed it because I got zero structures. But if, by chance, some structures passed those check, it would be a "silent" error discarding correct structures)
Here is my "testing" script:
#!/usr/bin/env python3
from pathlib import Path
import shutil
import subprocess
d1=Path("unchanged_struc")
xyz1=Path("struc.xyz").read_text()
d2=Path("other_geometry_struc")
xyz2_lines=Path("crest_rotamers_2.xyz").read_text().split("\n 21")[0].split("\n")
xyz2_lines[1]="" #remove energy
xyz2="\n".join(xyz2_lines)
import re
for d,xt in ((d1,xyz1),(d2,xyz2)):
d.mkdir(exist_ok=True)
Path(d/"struc1.xyz").write_text(xt)
shutil.copy("crest_rotamers_2.xyz", d/"crest_rotamers_2.xyz")
print(f"{d}:",end="")
p=subprocess.run(["../crest/_build/crest","struc1.xyz","--notopo","--cregen","crest_rotamers_2.xyz","--notopo"],capture_output=True,cwd=d)
out_tx=p.stdout.decode()
m=re.search(".*clashes.*",out_tx)
print(m.group() if m else "no clashes")#out_tx.split("number of")[-1].split("\n")[0])
for i in re.findall("__debug:.*",out_tx):
print(i)
Path(d/"crest.out").write_text(out_tx)
The topology check is separate from checking for dissociation and already employs a connectivity matrix. With the other reference (first structure from the ensemble) similar issues may still persist and one structure will still be kicked out by the topology.
The dissociation check employs an older version of the "topology" based on covalent coordination numbers, which might cause additional issues here. Turning it off with --notopo
is a good idea and I will adapt this asap, but what really should be done is use a topology that can more consistently identify the Zn-C bonds. Unfortunately, metal centers often cause this trouble in the topology, and I don't think there is a 100% fail proof way to do that. Turning off the topology is usually the way to go in this case.
I attempted to fix it by using the mechanism to ignore some atoms what were already there for subRMSD
. Could something like this work? It seems to work for my structures. But I cannot program in Fortran, so I may have missed something.
diff --git a/src/newcregen.f90 b/src/newcregen.f90
index b391f3f..6a00599 100644
--- a/src/newcregen.f90
+++ b/src/newcregen.f90
@@ -550,10 +550,19 @@ subroutine discardbroken(ch,env,nat,nall,at,xyz,comments,newnall)
logical :: distok,distcheck
real(wp) :: cnorm
logical :: dissoc
+ logical,allocatable :: incl_atoms(:)
+ integer,allocatable :: incl_atoms_mask(:)
!--- if we don't wish to include all atoms:
substruc = (nat.ne.env%rednat .and. env%subRMSD)
+ if (allocated(env%excludeTOPO))then
+ substruc=substruc .or. any(env%excludeTOPO)
+ endif
+ if (substruc)then
+ incl_atoms=(env%includeRMSD>0).and.(.not.env%excludeTOPO)
+ incl_atoms_mask=merge(1,0,incl_atoms)
+ endif
!--- settings
allocate(rcov(94))
call setrcov(rcov)
@@ -569,9 +578,10 @@ subroutine discardbroken(ch,env,nat,nall,at,xyz,comments,newnall)
write(ch,'('' # fragment in coord :'',i6)')frag0
deallocate(atdum)
if(substruc)then
- nat0=env%rednat
+ nat0=count(incl_atoms)
+ write (*,*) "__debug Using these atoms in discardbroken:",incl_atoms
allocate(c0(3,nat0),at0(nat0),c1(3,nat0),atdum(nat0))
- call maskedxyz(nat,nat0,cref,c0,at,at0,env%includeRMSD)
+ call maskedxyz(nat,nat0,cref,c0,at,at0,incl_atoms_mask)
else
allocate(c0(3,nat),at0(nat),c1(3,nat),atdum(nat))
c0=cref
@@ -591,7 +601,7 @@ subroutine discardbroken(ch,env,nat,nall,at,xyz,comments,newnall)
if(.not.substruc)then
c1(:,:)=xyz(:,:,j)/bohr
else
- call maskedxyz(nat,nat0,xyz(:,:,j),c1,at,at0,env%includeRMSD)
+ call maskedxyz(nat,nat0,xyz(:,:,j),c1,at,at0,incl_atoms_mask)
c1=c1/bohr
endif
distok=distcheck(nat0,c1) ! distance check
@@ -607,6 +617,7 @@ subroutine discardbroken(ch,env,nat,nall,at,xyz,comments,newnall)
endif
endif
+ write (*,*) "__debug Number of fragments/reference:",frag,"/",frag0
! if(dissoc .or. (abs(erj).lt.1.0d-6) .or. &
if(dissoc .or. &
& (cnorm.lt.1.0d-6) .or. (.not.distok) )then
@@ -619,6 +630,11 @@ subroutine discardbroken(ch,env,nat,nall,at,xyz,comments,newnall)
orderref(j)=newnall
endif
enddo
+ if(allocated(incl_atoms)) deallocate(incl_atoms_mask)
+ if(allocated(incl_atoms_mask)) deallocate(incl_atoms_mask)
+ if (newnall.eq.0)then
+ error stop "Error: function discardbroken eliminated all structures"
+ endif
!--- sort the xyz array (only if structures have been discarded)
if(newnall.lt.nall)then
The dissociation check should be turned off entirely if --notopo
is specified, as you suggested earlier. As of #250 this is done
I am facing a similar problem with a metal-organic system: a few good conformers are discarded as "clashes".
More. I tried to run cregen (standalone and with default parameters except adding ---notopo) on an ensemble of unique conformers produced by CREST itself and..... some conformers are discarded as "clashes"! This is meaningless. Isn't CREST using exactly the same filtering procedure?
Even more. Runs with exactly the same parameters can give different results and sometimes no conformers are discarded at all! This is puzzling. I attach the output files of 20 consecutive runs. Some explanation:
"crest_conformersall.xyz" contains the 23 conformers to be sorted; they were obtained by conformational sampling using CREST and starting from "CoLpysorted_preopt.xyz", which is also used as the reference structure in cregen (by the way, why is this needed in combination with --notopo?)
"cregen_fpnew.sh" is the shell file used to launch the 20 identical runs.
You will see that sometimes 3 conformers are discarded as clashes, sometimes they are not.... What's wrong?
This is another example on a much bigger ensemble (1575 conformers). Again, two different outputs are obtained in consecutive runs:
cregen1.out, with 99 unique conformers and 358 removed "clashes"; cregen2.out, with 125 unique conformers and no "clashes".
This example demonstrates that the underlying problem is either "on" or "off", as it affects a very large number of conformers at the same time.
@andreacornia
CREGEN is deterministic. This means if your input is the same, something else is going on that shouldn't. Checking your cregen_b.zip shows what this is:
If you look at cregen1.out, right after repeating the command, CREST says it recognized the line:
-notopo <x> : ignoring topology on 1 atoms (Co)
while this is missing in cregen2.out.
Obviously you didn't specify Co in the shell script, but for some (bash-related, maybe?) reason the program thinks so. Consequently, CREGEN then thinks it has topology except for Co in the first calculation, while it ignores all topology in the second. This is what you are observing.
I don't understand why the argument parser is trying to interpret seemingly random trailing bits as "Co" or 1, there is obviously nothing there in the command line. An easy "fix" for now would be to move -notopo
in front of the --cregen
command in your command line. I will investigate if it can be caught in-code.
by the way, why is this needed in combination with --notopo?
Because CREST needs initialization of its environment (system size etc.), which happens formally before parsing the --cregen
command.
Great! I moved -notopo as you suggested and this solved the problem. Conformers are no longer removed as clashes! Many thanks!