NJOY2016

Sporadic errors processing Gd-157 from JEFF-4T1

Open andrewmholcomb opened this issue 3 years ago • 9 comments

64-gd-157g.txt 64-gd-157g.njoy-input.txt

Bonjour mes amis,

I'm having some sporadic failures processing the attached Gd-157 file using NJOY with the attached input. Depending on which computer I use, the code hangs, segfaults, or runs through in the covr step. This file is taken from one of the JEFF-4 beta releases. The only thing that stands out to me is that the covariance is using LRF=7. In the JEFF-4 beta, there are only 5 isotopes with LRF=7 covariance data: O-16, Cl-35, Rh-103, Gd-155, and Gd-157. Gd-155 experiences the same failure, Cl-35 and Rh-103 process without issue, and O-16 finishes but takes an inordinate amount of time.

I've compiled NJOY from source on Ubuntu-20.04 using cmake version 3.16.3 and gnu 9.4.0.

I'm not sure what is going wrong, but adding print statements in the code changed the failure mode so I suspect it is some sort of painful memory defect.

Let me know if you need any more info or can point out what I've done wrong.

Thanks!!

andrewmholcomb, Nov 15 '22 08:11

Additional info about the configuration: the cmake command I used is

cmake -DCMAKE_BUILD_TYPE:STRING=RELEASE /path/to/njoy/source

I've also tried

cmake -DCMAKE_BUILD_TYPE:STRING=RELWITHDEBINFO -DCMAKE_Fortran_FLAGS:STRING=-finit-local-zero /path/to/njoy/source

but it still segfaults.

I also tried increasing the size of xval and icon in covr.f90 (and adjusted nvmax and ncmax to match) but the problem persists. The first array bound that gets violated is on this line

andrewmholcomb, Nov 15 '22 10:11

Lastly, here is the valgrind output for the entire job. dump.txt

Executed with valgrind version 3.15.0:

valgrind --vgdb=full --keep-debuginfo=yes --xml=yes --xml-file=dump.xml /home/njoy2016/bin/njoy < 64-gd-157g.njoy-input

andrewmholcomb, Nov 15 '22 10:11
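A full valgrind XML dump like the one above can be long, so a quick tally of error kinds may help decide where to look first. The following is only a rough sketch: the function name `count_valgrind_kinds` is hypothetical, and it assumes the XML follows valgrind's usual layout, in which each reported error carries a `<kind>` element such as `InvalidRead` or `InvalidWrite`.

```shell
#!/bin/sh
# Hypothetical helper (not part of NJOY or valgrind): tally the error kinds
# found in a valgrind XML report.
# Usage: count_valgrind_kinds dump.xml
# Prints one "KIND COUNT" pair per line, sorted by kind.
count_valgrind_kinds() {
    # Pull out every <kind>...</kind> element, strip the tags,
    # then count identical kinds.
    grep -o '<kind>[^<]*</kind>' "$1" \
        | sed -e 's/<kind>//' -e 's/<\/kind>//' \
        | sort \
        | uniq -c \
        | awk '{ print $2, $1 }'
}
```

Running `count_valgrind_kinds dump.xml` on the report produced by the command above would then list each kind with its count. A large number of invalid reads or writes would be consistent with the suspected out-of-bounds access, but the tally alone does not locate it.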

Bonjour le Parisien,

I believe your issue(s) are related to the GNU compiler and its version. For some time now, when NJOY2016 is compiled with GNU versions above 8.3 and up to 12.0, even the compilation of errorr fails on some platforms. Did you run make tests for your installation? Did all tests pass? This is partly related to #211, which by the way will force you either to move to NJOY2016.68 (I see 65 written in your input), the latest release from October 5, or to use Intel ifort instead; it is also a good free compiler, from Intel oneAPI.

Please keep in mind that the ENDF-6 format manual specification for reading an LRF=7 covariance file (written only by SAMMY, to my knowledge) has evolved with time. Have all LRF=7 MF-32 files evolved accordingly, and has errorr followed as well?

jchsublet, Nov 15 '22 13:11

Bonjour Monsieur Sublet,

All NJOY tests pass, and the error is the same whether using NJOY2016.68 or NJOY2016.65. Equivalent inputs work for all of the JEFF-4T1 isotopes except for Gd-155 and Gd-157.

andrewmholcomb, Nov 15 '22 15:11

Hi Andrew, Jean-Christophe,

There is a compiler issue with GCC 11 that leads to an internal compiler error when compiling ERRORR. The issues Andrew is describing do not sound related to that. JC's other comment, about LRF7 evolving with time, is more than likely the issue here. For instance, if MF32 is not present but LRF7 is used (as in ENDF/B-VIII.0's Fe-54), the current release of NJOY will crash (but we have a fix that will be released very soon).

We'll have to look into these new JEFF evaluations. Failures in COVR surprise me, though, as all the heavy lifting should be happening in ERRORR.

P.S. Please forgive my lack of French skills :smile:

nathangibson14, Nov 15 '22 15:11

I am also surprised! Let me know if there's anything I can do or information to report that will help!

andrewmholcomb, Nov 15 '22 16:11

Hi Nathan, Andrew

Andrew's deck with the above Gd157 file works with NJOY2016.68 when compiled with Intel ifort 2021 (Bob's favorite) on macOS Monterey (outGd157.txt), and also when compiled with GNU 12.1.0 (output.txt) on the same platform.

Cela me laisse sans voix: speechless, in the language of Twain.

jchsublet, Nov 15 '22 17:11

I was able to get our NJOY2016.65 version to run after building with ifort instead of gfortran. I haven't checked whether the results are good or bad, but at least the job finishes instead of segfaulting or hanging indefinitely.

For posterity: I installed ifort by downloading the offline version from here and following the installation steps from here. For ifort to be found after installing this way, you may also need to source /opt/intel/oneapi/setvars.sh.

ifort --version
ifort (IFORT) 2021.7.1 20221019
Copyright (C) 1985-2022 Intel Corporation.  All rights reserved.

Configured with:

cmake -DCMAKE_BUILD_TYPE:STRING=RELEASE -DCMAKE_Fortran_COMPILER:FILEPATH=/opt/intel/oneapi/compiler/2022.2.1/linux/bin/intel64/ifort /path/to/njoy/source

I did this locally but will try to get the same working in our GitLab CI, as it may be the only path forward to allow us to automatically process all of the files for JANIS. Otherwise we will have to temporarily remove them from being processed automatically, because the job hangs and prevents the rest of the artifacts from being collected.

Update: This did circumvent the bug in our GitLab CI. Not a permanent solution but at least we have a bandaid!

Sorry I couldn't be more helpful in identifying the root cause but I think this still indicates a bug. Let me know if there's anything else you can think to try to get it working with the GNU 9.4 tools!

andrewmholcomb, Nov 16 '22 16:11
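For the CI hang described above, where a stuck job blocks artifact collection, one complementary safeguard is to bound each run with a wall-clock limit. The sketch below is an illustration only, not something from the NJOY project: the function name `run_bounded` and its output labels are hypothetical, and it assumes GNU coreutils `timeout` is available on the runner.

```shell
#!/bin/sh
# Hypothetical helper (not part of NJOY): run a command under a wall-clock
# limit and print a one-word classification of how it ended.
# Usage: run_bounded SECONDS COMMAND [ARGS...]
# The command's own stdout is redirected to stderr so that the
# classification is the only thing written to stdout.
run_bounded() {
    limit=$1
    shift
    status=0
    # GNU coreutils `timeout` exits with status 124 when the limit is reached.
    timeout "$limit" "$@" 1>&2 || status=$?
    case $status in
        0)   echo "ok" ;;
        124) echo "hang" ;;
        139) echo "segfault" ;;   # 128 + 11 (SIGSEGV), as reported by the shell
        *)   echo "failed($status)" ;;
    esac
}
```

A call such as `run_bounded 3600 /home/njoy2016/bin/njoy < 64-gd-157g.njoy-input` would let a pipeline record the outcome and move on to the next file. The one-hour limit is an arbitrary placeholder and would need to be chosen with slow but valid cases in mind, such as the O-16 run mentioned earlier in the thread.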

Hello again NJOY folks! I write to you on the one-year anniversary of this issue to ask whether there has been any progress in figuring out the problem and fixing it. Hope all is well!

andrewmholcomb, Nov 15 '23 21:11