Gate icon indicating copy to clipboard operation
Gate copied to clipboard

Valgrind detects use of uninitialized values

Open maxaehle opened this issue 3 years ago • 4 comments

Summary

I ran valgrind on GATE v9.1 and got a lot of Conditional jump or move depends on uninitialised value(s) messages. I checked the first and fourth one and believe that valgrind is correct there, see below for more details. The complete log is attached here: valg2-1.txt valg2-2.txt valg2-3.txt (split into three files due to file size limits).

If the warnings are actually correct, GATE could behave non-deterministically even if the random number generator's seed is fixed. Similar issues are known in ROOT, but the developers seem to have investigated the warnings there and provide a suppression file for valgrind. I used this suppression file, so the findings are caused by Gate/Geant4.

Has this been investigated? If yes, with which result?

Details regarding the first finding

==59== Conditional jump or move depends on uninitialised value(s)
==59==    at 0xD1F54A: GateCrystalSD::PrepareCreatorAttachment(GateVVolume*) (GateCrystalSD.cc:245)
==59==    by 0x11FF5D9: GateVVolume::AttachCrystalSD() (GateVVolume.cc:388)
==59==    by 0x1209FA5: GateVolumeMessenger::SetNewValue(G4UIcommand*, G4String) (GateVolumeMessenger.cc:83)
==59==    by 0x104CB659: G4UIcommand::DoIt(G4String) (G4UIcommand.cc:262)
==59==    by 0x104E06E4: G4UImanager::ApplyCommand(char const*) (G4UImanager.cc:584)
==59==    by 0x104DFE71: G4UImanager::ApplyCommand(G4String const&) (G4UImanager.cc:481)
==59==    by 0x104C6561: G4UIbatch::ExecCommand(G4String const&) (G4UIbatch.cc:182)
==59==    by 0x104C6790: G4UIbatch::SessionStart() (G4UIbatch.cc:230)
==59==    by 0x104DEAFD: G4UImanager::ExecuteMacroFile(char const*) (G4UImanager.cc:308)
==59==    by 0x104DAA2A: G4UIcontrolMessenger::SetNewValue(G4UIcommand*, G4String) (G4UIcontrolMessenger.cc:418)
==59==    by 0x104CB659: G4UIcommand::DoIt(G4String) (G4UIcommand.cc:262)
==59==    by 0x104E06E4: G4UImanager::ApplyCommand(char const*) (G4UImanager.cc:584)
==59==  Uninitialised value was created by a heap allocation
==59==    at 0x4C36833: operator new(unsigned long) (vg_replace_malloc.c:417)
==59==    by 0x11314ED: GateDetectorConstruction::GateDetectorConstruction() (GateDetectorConstruction.cc:110)
==59==    by 0xC7E67A: main (Gate.cc:283)

I searched for usages of GateCrystalSD::m_systemList in the Gate project and found that this variable is only written to in the said location (where it is initialized if it is a nullptr, hence the conditional jump depending on uninitialized value) and in GateCrystalSD::Clone().

Details regarding the fourth finding

==59== Conditional jump or move depends on uninitialised value(s)
==59==    at 0xF1D2EC3: G4VParticleChange::CheckSecondary(G4Track&) (G4VParticleChange.cc:449)
==59==    by 0xF1D1D7B: G4VParticleChange::AddSecondary(G4Track*) (G4VParticleChange.cc:149)
==59==    by 0xF1C442A: G4ParticleChange::AddSecondary(G4Track*) (G4ParticleChange.cc:190)
==59==    by 0xC701C9A: G4HadronicProcess::FillResult(G4HadFinalState*, G4Track const&) (G4HadronicProcess.cc:468)
==59==    by 0xC70101A: G4HadronicProcess::PostStepDoIt(G4Track const&, G4Step const&) (G4HadronicProcess.cc:350)
==59==    by 0xB135E76: G4SteppingManager::InvokePSDIP(unsigned long) (G4SteppingManager2.cc:590)
==59==    by 0xB135CFB: G4SteppingManager::InvokePostStepDoItProcs() (G4SteppingManager2.cc:558)
==59==    by 0xB131FA1: G4SteppingManager::Stepping() (G4SteppingManager.cc:220)
==59==    by 0xB13D437: G4TrackingManager::ProcessOneTrack(G4Track*) (G4TrackingManager.cc:138)
==59==    by 0xAE703D0: G4EventManager::DoProcessing(G4Event*) (G4EventManager.cc:173)
==59==    by 0xAE70BF9: G4EventManager::ProcessOneEvent(G4Event*) (G4EventManager.cc:335)
==59==    by 0xABA7394: G4RunManager::ProcessOneEvent(int) (G4RunManager.cc:510)
==59==  Uninitialised value was created by a heap allocation
==59==    at 0x4C36833: operator new(unsigned long) (vg_replace_malloc.c:417)
==59==    by 0x12BA4F7: GateSourceMgr::AddSource(std::vector<G4String, std::allocator<G4String> >) (GateSourceMgr.cc:194)
==59==    by 0x12BE1EB: GateSourceMgrMessenger::SetNewValue(G4UIcommand*, G4String) (GateSourceMgrMessenger.cc:82)
==59==    by 0x104CB659: G4UIcommand::DoIt(G4String) (G4UIcommand.cc:262)
==59==    by 0x104E06E4: G4UImanager::ApplyCommand(char const*) (G4UImanager.cc:584)
==59==    by 0x104DFE71: G4UImanager::ApplyCommand(G4String const&) (G4UImanager.cc:481)
==59==    by 0x104C6561: G4UIbatch::ExecCommand(G4String const&) (G4UIbatch.cc:182)
==59==    by 0x104C6790: G4UIbatch::SessionStart() (G4UIbatch.cc:230)
==59==    by 0x104DEAFD: G4UImanager::ExecuteMacroFile(char const*) (G4UImanager.cc:308)
==59==    by 0x104DAA2A: G4UIcontrolMessenger::SetNewValue(G4UIcommand*, G4String) (G4UIcontrolMessenger.cc:418)
==59==    by 0x104CB659: G4UIcommand::DoIt(G4String) (G4UIcommand.cc:262)
==59==    by 0x104E06E4: G4UImanager::ApplyCommand(char const*) (G4UImanager.cc:584)

The said line GateSourceMgr.cc:194 reads

source = new GateSourcePencilBeam( sourceName );

so the uninitialized variable is a member of GateSourcePencilBeam. I noticed that GateSourcePencilBeam::mparticle_time is never initialized, but used in GateSourcePencilBeam::GenerateVertex to construct a G4PrimaryVertex with this value as G4PrimaryVertex::T0. I ran it several times and always found the same value in the location of mparticle_time in the beginning of the constructor (the value was 29958. which I relate to the entry 29.958 in several energy tables of cross-section tables, e.g. here). There may be other uninitialized members of GateSourcePencilBeam.

More details on my setup

I compiled the code in a container according to Gate/source/docker/Generate-9.1.sh, i.e. numbers of code lines refer to the following versions:

  • Geant4 tag v10.7.1 commit 460fe6dfb2fc3b2e509ee3d8ec5aef963a464b05
  • GATE tag v9.1 commit 86083caae3a4f41903b3d1e3c3d5571e807e041e
  • ROOT tag v6-19-02 commit e1e22198f55729f46f81f5274022e287e405ce57

I used podman on Ubuntu 21.10 instead of docker, which should not make any difference.

The following modifications were applied to DockerFileGeant and DockerFileGate:

  • Add cmake arguments -DCMAKE_BUILD_TYPE=Debug (so the valgrind output is more readable)
  • Remove the make argument -j (single-threaded build, to minimize RAM usage during build, should not make any difference).
  • The tag of the Geant4 container (that I built with these changes) must be provided in the FROM field of the Gate Dockerfile.

Valgrind was run with the argument --suppressions=$ROOTSYS/etc/valgrind-root.supp so that some ROOT-related, probably well-investigated, warnings were suppressed.

maxaehle avatar Jan 10 '22 12:01 maxaehle

Hi @maxaehle, very interesting thanks ! As these findings could lead to a lot of changes, I suggest to start step by step, with one single change proposal, for example in the first file that you identify (GateCrystalSD). Do you think you can propose a PR ? thanks

dsarrut avatar Jan 27 '22 07:01 dsarrut

Here is some more information on the macro files etc. that I used to run GATE in valgrind: I checked out pct-bug-geometry and executed the script pct-bug-geometry/scripts/run_tests.sh, in which the line that calls Gate is replaced by e.g.

valgrind --leak-check=full --log-file=valg --track-origins=yes --suppressions=$ROOTSYS/etc/valgrind-root.supp Gate Main.mac

The pct-bug-geometry setup was constructed as a minimal example that exhibits non-deterministic behavior even if the seed of the random number generator is fixed: run_tests.sh prints the md5 hash of some output file, and when it is executed a lot of times, a few different hashes show up. If you can reproduce this observation, please share this information in the chat. To debug the non-deterministic behavior, I ran valgrind.

I think that the non-deterministic behavior in the pct-bug-geometry setup is fixed once all the uninitialized variable problems are fixed, due to the following analysis:

It is possible to make the functions malloc and new initialize the memory that they allocate, e.g. in the following way:

  • Checkout zeromalloc, cd into it, checkout commit ea796a2, run make. This creates a shared library libzeromalloc.so.
  • Set the environment variable LD_PRELOAD to the absolute path of libzeromalloc.so when running Gate, e.g. instead of Gate Something.mac call
LD_PRELOAD=/somepath/libzeromalloc.so Gate Something.mac

Some explanation (tl;dr): In the executable file Gate, the function malloc is referenced but not defined, as you can verify with the command nm `which Gate` | grep malloc, the output of which contains the line U malloc@@GLIBC_2.2.5. When Gate is started, the dynamic linker loads a couple of dynamic libraries, which you can list with ldd `which Gate` . Very likely then something like libc.so.6 defines the function malloc. Now, letting LD_PRELOAD point to libzeromalloc.so instructs the dynamic linker to load this one first, and as seen from

$ nm /somepath/libzeromalloc.so 
...
                 U __libc_malloc@@GLIBC_2.2.5
...
0000000000000adb T malloc
...

it defines malloc, and references __libc_malloc (which likely is defined in the libc.so). Specifically, as coded in zeromalloc/malloc.c, the former uses the latter to allocate the memory, and then initializes it.

There is no guarantee that this fixes all situations where an uninitialized variable could be used, e.g. realloc and __libc_malloc still give non-initialized memory and local variables are not allocated with malloc/new anyway.

First observation and conclusion: When running Gate with this configuration and the zero-initializing malloc, new, only a unique md5 hash is printed (I checked with 120 runs; without zeromalloc, 9 out of 120 runs gave a different hash on my system). Thus it seems that fixing all the uninitialized variables stops the non-deterministic behavior of the pct-bug-geometry setup.

Second observation: The value that is used to initialize each byte is specified by the constant zero in zeromalloc/malloc.c. If it is set to 0x01 instead of 0x00, and the library is again compiled and LD_PRELOAD'ed, Gate runs into a segmentation fault with the pct-bug-geometry setup. This segfault is for the same reason as reported here: Some portion of code in GateCrystalSD::PrepareCreatorAttachment assumes that either m_systemList is nullptr or *m_systemList is initialized, which is both not true because the initial value of the uninitialized variable m_systemList happens to be 0x0101010101010101. As the C/C++ standards leave the initial value of the allocated memory unspecified, the segfault could (theoretically) happen randomly to any user. The segfault has probably not been observed/reported yet because m_systemList not being nullptr is apparantly unlikely, and not reproducable.

Strategies to fix the other valgrind findings: The following strategies to reduce the usage of uninitialized variables come into my mind:

  • One could have a look at all declared variables and see whether they are initialized before being used. Note that the initialization can be at a different location in the code, e.g. in the constructor, operator= or some setter methods that are invoked early enough due to the program logic.
  • The clang-tidy check cppcoreguidelines-pro-type-member-init performs a similar static code analysis, looking at the constructor's initializer list and body (but not further, e.g. into functions called in the body). As the previous strategy and any kind of static analysis, it is not able to recognize if a variable which is left uninitialized in the constructor is initialized later on by e.g. some setter method, before being used.
  • Alternatively one can use valgrind to run GATE and actively identify the usage of uninitialized variables for a particular setup like the pct-bug-geometry.
  • Whenever an uninitialized variable has been identified by either strategy, in a second step the proper initial value must be determined. Usually I expect it to be something like 0 or nullptr. E.g. in #483 it was clear to me, but in #484 it wasn't.

maxaehle avatar Feb 08 '22 11:02 maxaehle

More details about our procedure, checks etc. can be found here.

maxaehle avatar Feb 18 '22 10:02 maxaehle

Whoo ! The very detailed discussion on the provided gitlab link is just amazing, thank you for all the investigations! I admit I am far to understand most of the provided analysis, but for sure it is useful. We should probably discuss a bit more to make me better understand. As already mentioned, the whole digitizer part is currently being investigated to be entirely rewritten. Moreover, your expertise will also be very useful for the gam-gate project (still a temporary name, maybe will be something like "Gate X", for version ten, ;)

dsarrut avatar Feb 18 '22 13:02 dsarrut