nwchem
emit better error messages when disk is full and EAF writes fail
This input file (inp):
title "x"
echo
geometry units angstroms
C -0.76198 1.17875 -0.00473
C 0.63084 1.25353 -0.00749
C 1.39201 0.08470 -0.00938
C 0.76036 -1.15891 -0.00853
C -0.63246 -1.23369 -0.00578
C -1.39363 -0.06486 -0.00388
H -1.35502 2.08940 -0.00326
H 1.12297 2.22244 -0.00815
H 2.47717 0.14296 -0.01153
H 1.35339 -2.06956 -0.01000
H -1.12459 -2.20260 -0.00511
H -2.47879 -0.12312 -0.00174
C 1.01744 2.82421 2.57039
C 2.26864 3.35365 2.25459
C 3.24586 2.53983 1.68163
C 2.97188 1.19657 1.42447
C 1.72068 0.66713 1.74027
C 0.74346 1.48095 2.31323
H 0.25607 3.45827 3.01679
H 2.48211 4.40020 2.45495
H 4.22069 2.95232 1.43559
H 3.73325 0.56251 0.97807
H 1.50721 -0.37942 1.53991
H -0.23137 1.06846 2.55927
end
basis
C library 6-311G*
H library 6-311G*
end
scf
thresh 0.01
end
task scf optimize
leads to the crash:
----------------------------------------------
Quadratically convergent ROHF
Convergence threshold : 1.000E-02
Maximum no. of iterations : 30
Final Fock-matrix accuracy: 1.000E-07
----------------------------------------------
Integral file = ./inp.aoints.0
Record size in doubles = 65536 No. of integs per rec = 32766
Max. records in memory = 0 Max. records in file = 793
No. of bits per label = 16 No. of bits per value = 64
eaf_write: rc ne bytes -1999 bytes 524288
eaf_write: rc ne bytes -1999 bytes 524288
IO offset 240123904.00000000
IO error message >Write Failed
IO offset 188219392.00000000
IO error message >Write Failed
eaf_write: rc ne bytes -1999 bytes 524288
eaf_write: rc ne bytes -1999 bytes 524288
IO offset 360710144.00000000
IO error message >Write Failed
nwchem-6.8.1.20190222 (rev. d8ac0a182) on FreeBSD 11.2 amd64, ga-5.7_4. Run on 8 CPUs using MPI (mpirun).
Add direct to the SCF input block.
Adding direct helped.
So, if integral recomputation isn't forced, some integrals end up being wrong?
What integrals were wrong in your first job? The I/O failed and the job crashed.
How can the I/O fail? There needs to be a specific reason with an error code. It runs on one machine, so it's unlikely that I/O just fails.
Either there was a hardware I/O error or you ran out of disk space.
I haven't spent a lot of time looking at EAF but it's pretty clear that EAF is returning the error code -1999 for some reason. As Edo said, inadequate disk space is a likely cause.
You are right, the system log has 'disk full' errors at around this time. NWChem error messages aren't clear, and this causes confusion.
Thank you for clarifying this!
I will add a better error message for this.