qcmaquis
MKL DSYEVD error when running with twosite_truncation=heev
Hi,
I've performed a DMRG-CASCI(18,18)/cc-pVDZ computation for the tetracene molecule, where (18,18) is the active space containing only the π bonding and antibonding orbitals. Using GVB orbitals as the initial guess (orbital shapes similar to Pipek-Mezey localized orbitals), I've compared the results from Block and QCMaquis:
QCMaquis (different values of nsweeps were tested):
nsweeps = 5, E = -688.897193 a.u.
nsweeps = 9, E = -688.897265 a.u.
nsweeps = 15, E = -688.897297 a.u.
Block: -688.899740 a.u.
max_bond_dimension = 1000 is used in all calculations. The QCMaquis DMRG-CASCI energy decreases only slowly as nsweeps increases. Is there any option or keyword to accelerate the convergence (e.g. orbital ordering, not canonicalizing the localized orbitals, etc.)?
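To put the numbers above on a common scale, the following minimal Python sketch computes how far each QCMaquis energy lies above the Block reference, in millihartree. The energies are copied from the list above; the variable and function names are illustrative only.

```python
# Energies in a.u. as reported in the post; names are illustrative.
BLOCK_ENERGY = -688.899740
QCMAQUIS_ENERGIES = {5: -688.897193, 9: -688.897265, 15: -688.897297}


def gap_mEh(energy, reference=BLOCK_ENERGY):
    """Return (energy - reference) in millihartree, rounded to 3 decimals."""
    return round((energy - reference) * 1000.0, 3)


# Map each nsweeps value to its distance from the Block energy.
gaps = {nsweeps: gap_mEh(e) for nsweeps, e in QCMAQUIS_ENERGIES.items()}

for nsweeps, gap in gaps.items():
    print(f"nsweeps = {nsweeps:2d}: {gap:.3f} mEh above Block")
```

Going from 5 to 15 sweeps recovers only about 0.1 mEh of a gap of roughly 2.5 mEh.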
The OpenMolcas input file is attached: tetracene_cc-pVDZ.zip
Thanks for any suggestion!
There are several ways to accelerate convergence in QCMaquis. One recommended way is to use the Fiedler orbital ordering and CI-DEAS. To enable them in the OpenMolcas interface, you may use the Fiedler and CIDEAS keywords of the DMRGSCF module (see https://molcas.gitlab.io/OpenMolcas/sphinx/users.guide/programs/dmrgscf.html). An additional possibility is to use the perturbative correction in the first several sweeps. This can be achieved, e.g., with the following QCMaquis input (to be added to the RGInput...EndRG or DMRGSettings...EndDMRGSettings block in OpenMolcas):
nsweeps = 10
ngrowsweeps = 2
nmainsweeps = 3
alpha_initial = 0.0005
alpha_main = 1e-5
alpha_final = 0
twosite_truncation = heev
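As a rough illustration of what these settings mean per sweep, here is a minimal Python sketch. It assumes that alpha_initial is applied during the first ngrowsweeps sweeps, alpha_main during the following nmainsweeps sweeps, and alpha_final afterwards. That reading of the keywords is an assumption and should be checked against the QCMaquis manual; the function alpha_schedule is hypothetical and not part of QCMaquis.

```python
def alpha_schedule(nsweeps, ngrowsweeps, nmainsweeps,
                   alpha_initial, alpha_main, alpha_final):
    """Return the perturbation strength per sweep (assumed semantics).

    Assumption: grow sweeps come first, then main sweeps, then all
    remaining sweeps use alpha_final.
    """
    schedule = []
    for sweep in range(nsweeps):
        if sweep < ngrowsweeps:
            schedule.append(alpha_initial)
        elif sweep < ngrowsweeps + nmainsweeps:
            schedule.append(alpha_main)
        else:
            schedule.append(alpha_final)
    return schedule


# Values taken from the input fragment above.
schedule = alpha_schedule(nsweeps=10, ngrowsweeps=2, nmainsweeps=3,
                          alpha_initial=0.0005, alpha_main=1e-5,
                          alpha_final=0.0)
for sweep, alpha in enumerate(schedule):
    print(f"sweep {sweep}: alpha = {alpha:g}")
```

Under this assumption, the last five of the ten sweeps run without any perturbative correction.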
Thanks for your help @kommerck.
I tried some options with a fixed nsweeps = 9:
E = -688.897265 a.u. (using &RASSCF and RGinput)
E = -688.897250 a.u. (using &DMRGSCF)
E = -688.897254 a.u. (using &DMRGSCF and Fiedler = ON)
These energies differ little. When I tried the perturbative correction, an Intel MKL error occurred:
Intel MKL ERROR: Parameter 10 was incorrect on entry to DSYEVD.
We can speculate that the error is due to an improper SVD of a matrix, but I do not know how to solve the problem. Are there any other suggestions?
Files are attached. Many thanks. tetracene_perturb.zip
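For reference, DSYEVD is LAPACK's divide-and-conquer eigensolver for real symmetric matrices, and the message names its tenth argument. The Python sketch below lists the argument order of the reference LAPACK Fortran interface and the minimum workspace sizes documented there; the helper names are illustrative. Whether an incorrect LIWORK here points to an integer-width mismatch (LP64 vs. ILP64) is not established in this thread.

```python
# Argument order of DSYEVD in the reference LAPACK Fortran interface.
DSYEVD_ARGS = ["JOBZ", "UPLO", "N", "A", "LDA", "W",
               "WORK", "LWORK", "IWORK", "LIWORK", "INFO"]


def dsyevd_parameter(position):
    """Return the argument name for a 1-based parameter position."""
    return DSYEVD_ARGS[position - 1]


def min_workspace(n, jobz="V"):
    """Return the documented minimum (LWORK, LIWORK) for an n x n matrix."""
    if n <= 1:
        return 1, 1
    if jobz == "N":           # eigenvalues only
        return 2 * n + 1, 1
    # jobz == "V": eigenvalues and eigenvectors
    return 1 + 6 * n + 2 * n * n, 3 + 5 * n


print(dsyevd_parameter(10))   # the argument flagged in the error message
print(min_workspace(100))
```

Parameter 10 is LIWORK, the length of the integer workspace array IWORK.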
Unfortunately I cannot reproduce the Intel MKL error; your input runs fine for me. Have you compiled OpenMolcas/QCMaquis with the ILP64 MKL interface?
Also, using Fiedler=ON and the perturbative correction (both at the same time), I get an energy of -688.8997325 a.u. after only two sweeps.
Sorry for the delayed feedback. Yes, OpenMolcas/QCMaquis is compiled with the ILP64 MKL interface. I conclude this from the output of
ldd rasscf.exe | grep 'lp'
ldd dmrgscf.exe | grep 'lp'
The results are:
libmkl_gf_ilp64.so => /opt/intel/compilers_and_libraries/linux/mkl/lib/intel64/libmkl_gf_ilp64.so (0x00002af8a5e60000)
libalps.so => /home/jxzou/software/OpenMolcas_q/bin/./../qcmaquis/lib/libalps.so (0x00002af8ac526000)
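Note that grep 'lp' also matches libalps.so, so only the first line says anything about MKL. A slightly more targeted check can be sketched in Python as follows. The function mkl_interface_hints is a hypothetical helper that only inspects library names in ldd output; it assumes MKL's usual naming (libmkl_*_ilp64.so, libmkl_*_lp64.so, libmkl_rt.so). When only libmkl_rt.so is linked, the interface layer is chosen at run time, so the name alone does not reveal it.

```python
import re


def mkl_interface_hints(ldd_output):
    """Return sorted hints ('ilp64', 'lp64', 'rt') found in ldd output."""
    # Interface-layer libraries carry the integer model in their name.
    hints = set(re.findall(r"libmkl_\w+_(ilp64|lp64)\.so", ldd_output))
    # The single dynamic library selects the interface at run time.
    if re.search(r"libmkl_rt\.so", ldd_output):
        hints.add("rt")
    return sorted(hints)


# Lines copied from the ldd output quoted in this thread.
sample = (
    "libmkl_gf_ilp64.so => /opt/intel/compilers_and_libraries/linux/mkl/"
    "lib/intel64/libmkl_gf_ilp64.so (0x00002af8a5e60000)\n"
    "libalps.so => /home/jxzou/software/OpenMolcas_q/bin/./../qcmaquis/"
    "lib/libalps.so (0x00002af8ac526000)\n"
)
print(mkl_interface_hints(sample))
```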
Then I thought that the GCC version might matter, or that a recompilation might solve the MKL error. However, the same error occurred after I tried both. On the other hand, without the perturbative correction and with nsweeps = 40, the energy is -688.897320 a.u.
Could you please tell me your versions of GCC, GSL, HDF5, Boost, Intel MKL, OpenMolcas and QCMaquis? I would like to try with your versions, since the version of MKL or QCMaquis may matter.
We test our setup with several Docker images, and so far I'm afraid I was not able to reproduce this issue. Which distribution and versions do you have? This way I could fire up a Docker image and check if I can reproduce it. However, perhaps it's better to open a corresponding OpenMolcas issue re compilation and the error.
Thanks! I've opened an issue in OpenMolcas GitLab, and showed details of my compilation.
Thank you. I used the same input file from tetracene_perturb.zip. All package versions are the same as described in 278, and the calculation was run on the same node. The only difference is that this time I specified LINALG=Internal.
I also downloaded lapack-3.9.0.tar.gz and unpacked it into External/lapack/. Without this step the directory is empty and the compilation of OpenMolcas fails with:
CMake Error at CMakeLists.txt:1861 (message):
LAPACK+BLAS sources not available, run "/usr/bin/git submodule update --init /home/jxzou/software/OpenMolcas_q1/External/lapack"
But my node cannot access the Internet, hence the manual download. After successful compilation, running ldd dmrgscf.exe|grep lp gives
libalps.so => /home/jxzou/software/OpenMolcas_q1/bin/./../qcmaquis/lib/libalps.so (0x00002b73f0d8f000)
And ldd dmrgscf.exe|grep mkl gives
/opt/intel/mkl/lib/intel64/libmkl_rt.so (0x00002ba5b19cb000)
So I assumed that LINALG=Internal worked. The DMRG-CASCI(18,18) energy is then -688.897228 a.u., which is still about 2 millihartree higher. Adding Fiedler=ON leads to -688.897226 a.u. I've uploaded the output file, which may be of some help.
tetracene_perturb1.zip
Sorry for the lengthy descriptions.
After modifying the part of your OpenMolcas input that follows Gateway/Seward to
&DMRGSCF
ActiveSpaceOptimizer=QCMaquis
Fiedler=ON
OOptimizationSettings
Charge = 0
Spin = 1
RAS2 = 18
nActEl= 18 0 0
FILEORB = tetracene_cc-pVDZ_uhf_gvb42_2CASCI.INPORB
CIonly
EndOOptimizationSettings
DMRGSettings
conv_thresh = 1E-7
max_bond_dimension = 1000
nsweeps = 6
ngrowsweeps = 2
nmainsweeps = 3
alpha_initial = 0.001
alpha_main = 1e-4
alpha_final = 0
twosite_truncation = heev
EndDMRGSettings
I get an energy of -688.8997412270 a.u. after 6 sweeps. Please try this and let me know if you get the same energy.
Thanks. I copied your input and submitted two jobs. The LINALG=MKL version leads to the same MKL DSYEVD error, while for the LINALG=Internal version the result is strange:
Fiedler orbital ordering: 9,10,6,13,3,5,16,14,1,18,1
terminate called after throwing an instance of 'std::runtime_error'
what(): Number of orbitals in the orbital order does not match the total number of orbitals
Program received signal SIGABRT: Process abort signal.
Maybe this is a truncated line? Files are attached. tetracene_perturb2.zip
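The printed ordering can be checked against the 18 active orbitals with a short Python sketch. The function check_orbital_order is a hypothetical helper that only inspects the string as it appears in the output; it does not reproduce how QCMaquis validates the ordering internally.

```python
def check_orbital_order(order_string, n_orbitals):
    """Check whether a comma-separated ordering is a permutation of 1..n."""
    order = [int(tok) for tok in order_string.split(",") if tok.strip()]
    seen = set()
    duplicates = []
    for index in order:
        if index in seen and index not in duplicates:
            duplicates.append(index)
        seen.add(index)
    missing = [i for i in range(1, n_orbitals + 1) if i not in seen]
    return {
        "count": len(order),
        "duplicates": duplicates,
        "missing": missing,
        "valid": len(order) == n_orbitals and not duplicates and not missing,
    }


# Ordering string exactly as printed in the output above.
report = check_orbital_order("9,10,6,13,3,5,16,14,1,18,1", n_orbitals=18)
print(report)
```

The string holds 11 entries instead of 18, and the trailing 1 duplicates an earlier entry, which is consistent with a line cut off in the middle of a two-digit index.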
Are you using the latest QCMaquis version? Your output shows QCMaquis version 3.0.1, whereas we are at 3.0.3.
Yes, I used QCMaquis 3.0.1, as I said in 278. I'll try 3.0.3.
Hi, QCMaquis 3.0.3 works excellently! Using your recommended input:
for LINALG=Internal, I got -688.899738 a.u. within 6 sweeps (1 h 55 min);
for LINALG=MKL, keeping the perturbative correction still leads to the MKL DSYEVD error, but after removing the perturbative correction I got -688.899741 a.u. within 6 sweeps (26 min).
In the future I'll use QCMaquis >= 3.0.3 with no perturbative correction and LINALG=MKL for OpenMolcas.
By the way, was anything concerning MKL DSYEVD changed in QCMaquis 3.0.3?
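The two successful runs can be compared with a few lines of Python. The numbers are taken from the post above and the names are illustrative. The comparison is not clean, because the LINALG=Internal run kept the perturbative correction while the LINALG=MKL run did not.

```python
def to_minutes(hours=0, minutes=0):
    """Convert a wall time given in hours and minutes to minutes."""
    return 60 * hours + minutes


# Energies (a.u.) and wall times as reported above.
internal_run = {"energy": -688.899738, "minutes": to_minutes(1, 55)}
mkl_run = {"energy": -688.899741, "minutes": to_minutes(minutes=26)}

# Ratio of wall times and energy difference in microhartree.
time_ratio = internal_run["minutes"] / mkl_run["minutes"]
delta_uEh = round((internal_run["energy"] - mkl_run["energy"]) * 1e6, 1)

print(f"wall time ratio: {time_ratio:.1f}x")
print(f"energy difference: {delta_uEh} microhartree")
```

The energies agree to within a few microhartree, while the MKL run without the perturbative correction finished more than four times faster.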
Which distribution are you using? So far I could not reproduce that error (I know you listed your software versions in the OpenMolcas issue, but I'm interested specifically in the distribution so that I can fire up a Docker image to test it). The DSYEVD call in question is wrapped by the Boost numeric bindings, which we provide as part of the ALPS/Boost distribution, so I cannot imagine they could be doing something wrong.
Oh, I just realized that you were probably asking about the Linux distribution. It's CentOS 7.4.1708. More specifically, the output of cat /proc/version is
Linux version 3.10.0-693.el7.x86_64 ([email protected]) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Tue Aug 22 21:09:27 UTC 2017
The output of lsb_release -a is
LSB Version: :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.4.1708 (Core)
Release: 7.4.1708
Codename: Core