Incorrect Simulation Results with Specific ICPC Compiler Versions
Bug Description
This is a bug Sophie, Lukas and I discovered on our university cluster.
Compiling OpenMC with certain ICPC compiler versions lets you install and run the code without any warnings or error messages, but the results are completely wrong.
For example, the pincell example should return a criticality of around 1.15. Compiled with the affected ICPC versions, the criticality during the batches is around 3.5 and the final results are:
============================> RESULTS <============================
k-effective (Collision) = 0.00000 +/- 0.00000
k-effective (Track-length) = 0.00000 +/- 0.00000
k-effective (Absorption) = 0.00000 +/- 0.00000
Combined k-effective = 0.00000 +/- 0.02774
Leakage Fraction = 0.00000 +/- 0.00000
The entire output can be found in a Google Doc.
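Because the affected builds fail silently, it may help to check the reported k-effective automatically. The following is only a minimal sketch: `parse_keff` and `keff_is_plausible` are hypothetical helper names, the regular expression assumes the line layout shown in the results block above, and the tolerance of 0.1 around the expected value of 1.15 is an arbitrary assumption.

```python
import re

# Matches lines such as "Combined k-effective = 1.15000 +/- 0.00200".
# The layout is assumed from the results block quoted in this issue.
_KEFF_LINE = re.compile(
    r"^\s*(?P<name>.*k-effective.*?)\s*=\s*"
    r"(?P<mean>[-+0-9.eE]+)\s*\+/-\s*(?P<std>[-+0-9.eE]+)\s*$"
)


def parse_keff(output_text):
    """Return {estimator name: (mean, std_dev)} found in the text output."""
    results = {}
    for line in output_text.splitlines():
        match = _KEFF_LINE.match(line)
        if match:
            results[match["name"].strip()] = (
                float(match["mean"]),
                float(match["std"]),
            )
    return results


def keff_is_plausible(output_text, expected=1.15, tolerance=0.1):
    """True if the combined k-effective lies within tolerance of expected."""
    results = parse_keff(output_text)
    if "Combined k-effective" not in results:
        return False
    mean, _ = results["Combined k-effective"]
    return abs(mean - expected) <= tolerance
```

Feeding the captured standard output of a run into `keff_is_plausible` would flag the broken builds listed below without anyone having to read the log by hand.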
We checked all ICPC compilers on our cluster with the following result:
intel-compilers/2021.2.0 (C) :white_check_mark:
intel-compilers/2021.4.0 (C) :no_entry_sign:
intel-compilers/Intel 2021.6.0 :no_entry_sign:
intel-compilers/2022.1.0 (C,D) :no_entry_sign:
intel-compilers/2022.2.1 (C) :no_entry_sign:
intel-compilers/2023.0.0 (L,C) :no_entry_sign:
intel-compilers/2023.1.0 (C) :white_check_mark:
(:white_check_mark: = OpenMC returns correct results / :no_entry_sign: = OpenMC returns wrong results)
We were able to reproduce this bug with OpenMC 0.13.3 and the dev branch. We have not checked other OpenMC versions.
We were not able to reproduce this bug with other compilers.
Steps to Reproduce
Install OpenMC from source via
git clone https://github.com/openmc-dev/openmc.git
cd openmc
mkdir build
cd build
module load intel-compilers/2023.0.0
cmake ..
make -j8
or set the compiler with
CXX=$path_to_ICPC_compiler cmake ..
Run the pincell example.
Environment
So far we have only reproduced this bug on our cluster and would be happy if someone else could try to reproduce it.
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Thread(s) per core: 1
Core(s) per socket: 24
Socket(s): 2
NUMA node(s): 4
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz
Stepping: 4
CPU MHz: 2100.000
CPU max MHz: 3700,0000
CPU min MHz: 1000,0000
BogoMIPS: 4200.00
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 33792K
NUMA node0 CPU(s): 0-2,6-8,12-14,18-20
NUMA node1 CPU(s): 3-5,9-11,15-17,21-23
NUMA node2 CPU(s): 24-26,30-32,36-38,42-44
NUMA node3 CPU(s): 27-29,33-35,39-41,45-47
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd mba ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req pku ospke md_clear flush_l1d arch_capabilities
Update:
Actually there were some warnings during the make process:
In file included from /home/gf737457/openmc-13.3/openmc/src/finalize.cpp(18):
/home/gf737457/openmc-13.3/openmc/include/openmc/plot.h(102): warning #858: type qualifier on return type is meaningless
const int id() const { return id_; }
^
In file included from /home/gf737457/openmc-13.3/openmc/src/finalize.cpp(18):
/home/gf737457/openmc-13.3/openmc/include/openmc/plot.h(103): warning #858: type qualifier on return type is meaningless
const int level() const { return level_; }
^
In file included from /home/gf737457/openmc-13.3/openmc/src/initialize.cpp(26):
/home/gf737457/openmc-13.3/openmc/include/openmc/plot.h(102): warning #858: type qualifier on return type is meaningless
const int id() const { return id_; }
^
In file included from /home/gf737457/openmc-13.3/openmc/src/initialize.cpp(26):
/home/gf737457/openmc-13.3/openmc/include/openmc/plot.h(103): warning #858: type qualifier on return type is meaningless
const int level() const { return level_; }
^
In file included from /home/gf737457/openmc-13.3/openmc/src/output.cpp(33):
/home/gf737457/openmc-13.3/openmc/include/openmc/plot.h(102): warning #858: type qualifier on return type is meaningless
const int id() const { return id_; }
^
In file included from /home/gf737457/openmc-13.3/openmc/src/output.cpp(33):
/home/gf737457/openmc-13.3/openmc/include/openmc/plot.h(103): warning #858: type qualifier on return type is meaningless
const int level() const { return level_; }
^
In file included from /home/gf737457/openmc-13.3/openmc/src/plot.cpp(1):
/home/gf737457/openmc-13.3/openmc/include/openmc/plot.h(102): warning #858: type qualifier on return type is meaningless
const int id() const { return id_; }
^
In file included from /home/gf737457/openmc-13.3/openmc/src/plot.cpp(1):
/home/gf737457/openmc-13.3/openmc/include/openmc/plot.h(103): warning #858: type qualifier on return type is meaningless
const int level() const { return level_; }
^
In file included from /home/gf737457/openmc-13.3/openmc/src/settings.cpp(25):
/home/gf737457/openmc-13.3/openmc/include/openmc/plot.h(102): warning #858: type qualifier on return type is meaningless
const int id() const { return id_; }
^
In file included from /home/gf737457/openmc-13.3/openmc/src/settings.cpp(25):
/home/gf737457/openmc-13.3/openmc/include/openmc/plot.h(103): warning #858: type qualifier on return type is meaningless
const int level() const { return level_; }
^
We have also tried building with
CMAKE_BUILD_TYPE=Debug
which compiles the code without optimizations, and the error persists.
We conclude that the problem is not caused by compiler optimizations.
It seems like the criticality is about three times as high as it should be. Checking the statepoint, we see that every source point appears three times.
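For anyone who wants to repeat the statepoint check, here is a minimal sketch of how the duplication could be counted. `multiplicity_histogram` is a hypothetical helper, not part of OpenMC. It assumes the source positions have already been loaded as (x, y, z) triples, for example from the source bank of a statepoint through the OpenMC Python API; the exact field layout of that bank is an assumption here.

```python
from collections import Counter


def multiplicity_histogram(positions, decimals=12):
    """Map 'number of copies' to 'how many distinct positions occur that often'.

    positions: iterable of (x, y, z) triples (assumed to be loaded beforehand).
    decimals: rounding applied before comparing, to make positions hashable
              and robust against formatting noise.
    """
    rounded = (tuple(round(float(c), decimals) for c in p) for p in positions)
    copies_per_site = Counter(rounded)
    return dict(Counter(copies_per_site.values()))


# Two distinct sites, each stored three times, as observed with the bad builds.
sites = [(0.1, 0.2, 0.3)] * 3 + [(1.0, 2.0, 3.0)] * 3
print(multiplicity_histogram(sites))
```

In a healthy run one would expect almost all sites under the key 1, while the behavior described here would put every site under the key 3.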
It would be interesting to know if this works when you turn off threading.
The criticality values do not change with the number of threads used. So OpenMP does not seem to be the problem.
Also compiling without OpenMP does not fix the problem.
That is very strange. Some issue with threading would have been my guess too but it sounds like that's ruled out. Given that the most recent compiler version works, it may have been some weird compiler bug that has since been fixed :man_shrugging:
It would be nice to track down what's wrong here, or at least the piece of code that leads to erroneous compiler behavior. I fear this is due to undefined behavior in OpenMC.
How do fixed source calcs look? You could run a fixed source calc both with create_fission_neutrons set to true and with it set to false. It seems like the problem might lie there.
Sorry for the late reply, and thanks for the good idea.
Since create_fission_neutrons only affects the fixed_source simulations, I tried running the pincell example as a fixed source simulation and used diff to compare the created tallies.out files. With create_fission_neutrons=false there is no difference.
It seems like the bug is somewhere in the code that writes newly created particles to the secondary / fission bank.
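The tallies.out comparison described above can also be scripted so that it is easy to repeat for each compiler version. This is only a sketch using Python's standard difflib module; `diff_text_files` is a hypothetical helper and the two paths in the usage comment are placeholders.

```python
import difflib
from pathlib import Path


def diff_text_files(path_a, path_b):
    """Return unified-diff lines between two text files (empty if identical)."""
    lines_a = Path(path_a).read_text().splitlines()
    lines_b = Path(path_b).read_text().splitlines()
    return list(
        difflib.unified_diff(
            lines_a,
            lines_b,
            fromfile=str(path_a),
            tofile=str(path_b),
            lineterm="",
        )
    )


# Usage (placeholder paths for a reference build and an ICPC build):
# for line in diff_text_files("reference/tallies.out", "icpc/tallies.out"):
#     print(line)
```

An empty list means the two builds produced identical tally output for that setting.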