EGSnrc
EGSnrc copied to clipboard
Out of bounds read in egs_dose_scoring for region index = -1
I had some sporadic crashes (up to 10% of total jobs crashing with SIGBUS
errors) during some calculations on the supercomputer. Building egs++ with address sanitizer found a potential out of bounds read in egs_dose_scoring
while scoring region doses. I am not sure if this is the root cause of the crashes, but it was flagged while I was debugging.
Reproducer
I built egs++ and tutor7pp from the develop
branch with address sanitizer (added -fsanitize=address -static-libasan
to opt
in the specs/egspp_* configuration file), and added a region dose scoring object to the sample input file test1.egsinp
:
:start ausgab object definition:
:start ausgab object:
library = egs_dose_scoring
name = doses
:stop ausgab object:
:stop ausgab object definition:
Running tutor7pp -i test1_doses -p tutor_data
led to the following crash report (truncated):
Running 1000000 histories
Batch CPU time Result Uncertainty(%)
==========================================================
1=================================================================
==3518==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000000dcc at pc 0x7f3a123ccdd3 bp 0x7ffeb0fca590 sp 0x7ffeb0fca580
READ of size 4 at 0x602000000dcc thread T0
#0 0x7f3a123ccdd2 in EGS_DoseScoring::processEvent(EGS_Application::AusgabCall) /home/max/projects/EGSnrc/HEN_HOUSE/egs++/ausgab_objects/egs_dose_scoring/egs_dose_scoring.h:218
#1 0x7f3a171c01c0 in EGS_Application::userScoring(int, int) /home/max/projects/EGSnrc/HEN_HOUSE/egs++/egs_application.cpp:120
#2 0x561f58b2241c in egs_ausgab_ /home/max/projects/EGSnrc/HEN_HOUSE/egs++/egs_advanced_application.cpp:1285
#3 0x561f58b7426e in photon_ (/home/max/projects/EGSnrc/egs_home/bin/linux/tutor7pp+0x17426e)
#4 0x561f58ba973d in egs_shower_ /home/max/projects/EGSnrc/HEN_HOUSE/interface/egs_interface2.c:96
#5 0x561f58b1edca in EGS_AdvancedApplication::shower() /home/max/projects/EGSnrc/HEN_HOUSE/egs++/egs_advanced_application.cpp:873
#6 0x7f3a171bfabd in EGS_Application::simulateSingleShower() /home/max/projects/EGSnrc/HEN_HOUSE/egs++/egs_application.cpp:945
#7 0x7f3a171beccb in EGS_Application::runSimulation() /home/max/projects/EGSnrc/HEN_HOUSE/egs++/egs_application.cpp:894
#8 0x561f58a120cd in main /home/max/projects/EGSnrc/HEN_HOUSE/user_codes/tutor7pp/tutor7pp.cpp:741
#9 0x7f3a15964b96 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b96)
#10 0x561f58a12589 in _start (/home/max/projects/EGSnrc/egs_home/bin/linux/tutor7pp+0x12589)
pointing to line 218 in egs_dose_scoring.h: https://github.com/nrc-cnrc/EGSnrc/blob/a6fc389c6465c949645380976f38a013a639759c/HEN_HOUSE/egs%2B%2B/ausgab_objects/egs_dose_scoring/egs_dose_scoring.h#L217-L221
Since ir
could equal -1, I thought there was a potential out of bounds read. To confirm this, I changed the code to use checked array access (d_reg_index.at(ir)
) and I got the following crash:
Running 1000000 histories
Batch CPU time Result Uncertainty(%)
==========================================================
1terminate called after throwing an instance of 'std::out_of_range'
what(): vector::_M_range_check: __n (which is 18446744073709551615) >= this->size() (which is 1)
Aborted (core dumped)
where the n value is equal to -1 converted to a 64-bit unsigned integer.
After changing the code to check ir >= 0
the simulation ran to completion:
--- a/HEN_HOUSE/egs++/ausgab_objects/egs_dose_scoring/egs_dose_scoring.h
+++ b/HEN_HOUSE/egs++/ausgab_objects/egs_dose_scoring/egs_dose_scoring.h
@@ -214,7 +214,7 @@ public:
}
/*** Check if scoring in current region ***/
- if (dose) {
+ if (ir >= 0 && dose) {
if (d_reg_index[ir]<0) {
return 0;
}
@@ -247,7 +247,7 @@ public:
}
/*** Check if scoring in current region ***/
- if (dose) {
+ if (ir >= 0 && dose) {
if (d_reg_index[ir]<0) {
return 0;
}
Running 1000000 histories
Batch CPU time Result Uncertainty(%)
==========================================================
1 3.36 0.0120765 2.00
2 6.38 0.012468 1.40
3 9.43 0.0124984 1.14
4 12.46 0.0125574 0.98
*** Reached the requested uncertainty of 1%
=> terminating simulation.
finishBatch() loop termination
Finished simulation
Total cpu time for this run: 12.46 (sec.) 0.0035(hours)
Histories per hour: 1.1557e+08
Number of random numbers used: 196403671
Number of electron CH steps: 7.89363e+06
Number of all electron steps: 1.06595e+07
last case = 400000 Etot = 8e+06
Great find @mxxo 🏅
Thanks for taking the time to pen down a detailed account, and solution.