EGSnrc icon indicating copy to clipboard operation
EGSnrc copied to clipboard

Out of bounds read in egs_dose_scoring for region index = -1

Open mxxo opened this issue 3 years ago • 1 comments

I had some sporadic crashes (up to 10% of total jobs crashing with SIGBUS errors) during some calculations on the supercomputer. Building egs++ with address sanitizer found a potential out of bounds read in egs_dose_scoring while scoring region doses. I am not sure if this is the root cause of the crashes, but it was flagged while I was debugging.

Reproducer

I built egs++ and tutor7pp from the develop branch with address sanitizer (added -fsanitize=address -static-libasan to opt in the specs/egspp_* configuration file), and added a region dose scoring object to the sample input file test1.egsinp:

:start ausgab object definition:
    :start ausgab object:
        library = egs_dose_scoring
        name = doses
    :stop ausgab object:
:stop ausgab object definition:

Running tutor7pp -i test1_doses -p tutor_data led to the following crash report (truncated):

Running 1000000 histories

  Batch             CPU time        Result   Uncertainty(%)
==========================================================
      1=================================================================
==3518==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000000dcc at pc 0x7f3a123ccdd3 bp 0x7ffeb0fca590 sp 0x7ffeb0fca580
READ of size 4 at 0x602000000dcc thread T0
    #0 0x7f3a123ccdd2 in EGS_DoseScoring::processEvent(EGS_Application::AusgabCall) /home/max/projects/EGSnrc/HEN_HOUSE/egs++/ausgab_objects/egs_dose_scoring/egs_dose_scoring.h:218
    #1 0x7f3a171c01c0 in EGS_Application::userScoring(int, int) /home/max/projects/EGSnrc/HEN_HOUSE/egs++/egs_application.cpp:120
    #2 0x561f58b2241c in egs_ausgab_ /home/max/projects/EGSnrc/HEN_HOUSE/egs++/egs_advanced_application.cpp:1285
    #3 0x561f58b7426e in photon_ (/home/max/projects/EGSnrc/egs_home/bin/linux/tutor7pp+0x17426e)
    #4 0x561f58ba973d in egs_shower_ /home/max/projects/EGSnrc/HEN_HOUSE/interface/egs_interface2.c:96
    #5 0x561f58b1edca in EGS_AdvancedApplication::shower() /home/max/projects/EGSnrc/HEN_HOUSE/egs++/egs_advanced_application.cpp:873
    #6 0x7f3a171bfabd in EGS_Application::simulateSingleShower() /home/max/projects/EGSnrc/HEN_HOUSE/egs++/egs_application.cpp:945
    #7 0x7f3a171beccb in EGS_Application::runSimulation() /home/max/projects/EGSnrc/HEN_HOUSE/egs++/egs_application.cpp:894
    #8 0x561f58a120cd in main /home/max/projects/EGSnrc/HEN_HOUSE/user_codes/tutor7pp/tutor7pp.cpp:741
    #9 0x7f3a15964b96 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b96)
    #10 0x561f58a12589 in _start (/home/max/projects/EGSnrc/egs_home/bin/linux/tutor7pp+0x12589)

pointing to line 218 in egs_dose_scoring.h: https://github.com/nrc-cnrc/EGSnrc/blob/a6fc389c6465c949645380976f38a013a639759c/HEN_HOUSE/egs%2B%2B/ausgab_objects/egs_dose_scoring/egs_dose_scoring.h#L217-L221

Since ir could equal -1, I thought there was a potential out of bounds read. To confirm this, I changed the code to use checked array access (d_reg_index.at(ir)) and I got the following crash:

Running 1000000 histories

  Batch             CPU time        Result   Uncertainty(%)
==========================================================
      1terminate called after throwing an instance of 'std::out_of_range'
  what():  vector::_M_range_check: __n (which is 18446744073709551615) >= this->size() (which is 1)
Aborted (core dumped)

where the n value is equal to -1 converted to a 64-bit unsigned integer.

After changing the code to check ir >= 0 the simulation ran to completion:

--- a/HEN_HOUSE/egs++/ausgab_objects/egs_dose_scoring/egs_dose_scoring.h
+++ b/HEN_HOUSE/egs++/ausgab_objects/egs_dose_scoring/egs_dose_scoring.h
@@ -214,7 +214,7 @@ public:
         }
 
         /*** Check if scoring in current region ***/
-        if (dose) {
+        if (ir >= 0 && dose) {
             if (d_reg_index[ir]<0) {
                 return 0;
             }
@@ -247,7 +247,7 @@ public:
         }
 
         /*** Check if scoring in current region ***/
-        if (dose) {
+        if (ir >= 0 && dose) {
             if (d_reg_index[ir]<0) {
                 return 0;
             }

Running 1000000 histories

  Batch             CPU time        Result   Uncertainty(%)
==========================================================
      1                3.36      0.0120765           2.00
      2                6.38       0.012468           1.40
      3                9.43      0.0124984           1.14
      4               12.46      0.0125574           0.98


*** Reached the requested uncertainty of 1%
    => terminating simulation.

  finishBatch() loop termination


Finished simulation

Total cpu time for this run:            12.46 (sec.) 0.0035(hours)
Histories per hour:                     1.1557e+08    
Number of random numbers used:          196403671     
Number of electron CH steps:            7.89363e+06   
Number of all electron steps:           1.06595e+07   


 last case = 400000 Etot = 8e+06

mxxo avatar Jan 24 '22 21:01 mxxo

Great find @mxxo 🏅

Thanks for taking the time to pen down a detailed account, and solution.

ftessier avatar Jan 24 '22 21:01 ftessier