Gate
Gate copied to clipboard
Valgrind detects use of uninitialized values
Summary
I ran valgrind
on GATE v9.1 and got a lot of Conditional jump or move depends on uninitialised value(s) messages. I checked the first and fourth one and believe that valgrind is correct there, see below for more details. The complete log is attached here: valg2-1.txt valg2-2.txt valg2-3.txt (split into three files due to file size limits).
If the warnings are actually correct, GATE could behave non-deterministically even if the random number generator's seed is fixed. Similar issues are known in ROOT, but the developers seem to have investigated the warnings there and provide a suppression file for valgrind. I used this suppression file, so the findings are caused by Gate/Geant4.
Has this been investigated? If yes, with which result?
Details regarding the first finding
==59== Conditional jump or move depends on uninitialised value(s)
==59== at 0xD1F54A: GateCrystalSD::PrepareCreatorAttachment(GateVVolume*) (GateCrystalSD.cc:245)
==59== by 0x11FF5D9: GateVVolume::AttachCrystalSD() (GateVVolume.cc:388)
==59== by 0x1209FA5: GateVolumeMessenger::SetNewValue(G4UIcommand*, G4String) (GateVolumeMessenger.cc:83)
==59== by 0x104CB659: G4UIcommand::DoIt(G4String) (G4UIcommand.cc:262)
==59== by 0x104E06E4: G4UImanager::ApplyCommand(char const*) (G4UImanager.cc:584)
==59== by 0x104DFE71: G4UImanager::ApplyCommand(G4String const&) (G4UImanager.cc:481)
==59== by 0x104C6561: G4UIbatch::ExecCommand(G4String const&) (G4UIbatch.cc:182)
==59== by 0x104C6790: G4UIbatch::SessionStart() (G4UIbatch.cc:230)
==59== by 0x104DEAFD: G4UImanager::ExecuteMacroFile(char const*) (G4UImanager.cc:308)
==59== by 0x104DAA2A: G4UIcontrolMessenger::SetNewValue(G4UIcommand*, G4String) (G4UIcontrolMessenger.cc:418)
==59== by 0x104CB659: G4UIcommand::DoIt(G4String) (G4UIcommand.cc:262)
==59== by 0x104E06E4: G4UImanager::ApplyCommand(char const*) (G4UImanager.cc:584)
==59== Uninitialised value was created by a heap allocation
==59== at 0x4C36833: operator new(unsigned long) (vg_replace_malloc.c:417)
==59== by 0x11314ED: GateDetectorConstruction::GateDetectorConstruction() (GateDetectorConstruction.cc:110)
==59== by 0xC7E67A: main (Gate.cc:283)
I searched for usages of GateCrystalSD::m_systemList
in the Gate project and found that this variable is only written to in the said location (where it is initialized if it is a nullptr
, hence the conditional jump depending on uninitialized value) and in GateCrystalSD::Clone()
.
Details regarding the fourth finding
==59== Conditional jump or move depends on uninitialised value(s)
==59== at 0xF1D2EC3: G4VParticleChange::CheckSecondary(G4Track&) (G4VParticleChange.cc:449)
==59== by 0xF1D1D7B: G4VParticleChange::AddSecondary(G4Track*) (G4VParticleChange.cc:149)
==59== by 0xF1C442A: G4ParticleChange::AddSecondary(G4Track*) (G4ParticleChange.cc:190)
==59== by 0xC701C9A: G4HadronicProcess::FillResult(G4HadFinalState*, G4Track const&) (G4HadronicProcess.cc:468)
==59== by 0xC70101A: G4HadronicProcess::PostStepDoIt(G4Track const&, G4Step const&) (G4HadronicProcess.cc:350)
==59== by 0xB135E76: G4SteppingManager::InvokePSDIP(unsigned long) (G4SteppingManager2.cc:590)
==59== by 0xB135CFB: G4SteppingManager::InvokePostStepDoItProcs() (G4SteppingManager2.cc:558)
==59== by 0xB131FA1: G4SteppingManager::Stepping() (G4SteppingManager.cc:220)
==59== by 0xB13D437: G4TrackingManager::ProcessOneTrack(G4Track*) (G4TrackingManager.cc:138)
==59== by 0xAE703D0: G4EventManager::DoProcessing(G4Event*) (G4EventManager.cc:173)
==59== by 0xAE70BF9: G4EventManager::ProcessOneEvent(G4Event*) (G4EventManager.cc:335)
==59== by 0xABA7394: G4RunManager::ProcessOneEvent(int) (G4RunManager.cc:510)
==59== Uninitialised value was created by a heap allocation
==59== at 0x4C36833: operator new(unsigned long) (vg_replace_malloc.c:417)
==59== by 0x12BA4F7: GateSourceMgr::AddSource(std::vector<G4String, std::allocator<G4String> >) (GateSourceMgr.cc:194)
==59== by 0x12BE1EB: GateSourceMgrMessenger::SetNewValue(G4UIcommand*, G4String) (GateSourceMgrMessenger.cc:82)
==59== by 0x104CB659: G4UIcommand::DoIt(G4String) (G4UIcommand.cc:262)
==59== by 0x104E06E4: G4UImanager::ApplyCommand(char const*) (G4UImanager.cc:584)
==59== by 0x104DFE71: G4UImanager::ApplyCommand(G4String const&) (G4UImanager.cc:481)
==59== by 0x104C6561: G4UIbatch::ExecCommand(G4String const&) (G4UIbatch.cc:182)
==59== by 0x104C6790: G4UIbatch::SessionStart() (G4UIbatch.cc:230)
==59== by 0x104DEAFD: G4UImanager::ExecuteMacroFile(char const*) (G4UImanager.cc:308)
==59== by 0x104DAA2A: G4UIcontrolMessenger::SetNewValue(G4UIcommand*, G4String) (G4UIcontrolMessenger.cc:418)
==59== by 0x104CB659: G4UIcommand::DoIt(G4String) (G4UIcommand.cc:262)
==59== by 0x104E06E4: G4UImanager::ApplyCommand(char const*) (G4UImanager.cc:584)
The said line GateSourceMgr.cc:194
reads
source = new GateSourcePencilBeam( sourceName );
so the uninitialized variable is a member of GateSourcePencilBeam
. I noticed that GateSourcePencilBeam::mparticle_time
is never initialized, but used in GateSourcePencilBeam::GenerateVertex
to construct a G4PrimaryVertex
with this value as G4PrimaryVertex::T0
. I ran it several times and always found the same value in the location of mparticle_time
in the beginning of the constructor (the value was 29958.
which I relate to the entry 29.958
in several energy tables of cross-section tables, e.g. here). There may be other uninitialized members of GateSourcePencilBeam
.
More details on my setup
I compiled the code in a container according to Gate/source/docker/Generate-9.1.sh
, i.e. numbers of code lines refer to the following versions:
- Geant4 tag v10.7.1 commit 460fe6dfb2fc3b2e509ee3d8ec5aef963a464b05
- GATE tag v9.1 commit 86083caae3a4f41903b3d1e3c3d5571e807e041e
- ROOT tag v6-19-02 commit e1e22198f55729f46f81f5274022e287e405ce57
I used podman
on Ubuntu 21.10 instead of docker
, which should not make any difference.
The following modifications were applied to DockerFileGeant
and DockerFileGate
:
- Add cmake arguments
-DCMAKE_BUILD_TYPE=Debug
(so the valgrind output is more readable) - Remove the make argument
-j
(single-threaded build, to minimize RAM usage during build, should not make any difference). - The tag of the Geant4 container (that I built with these changes) must be provided in the
FROM
field of the Gate Dockerfile.
Valgrind was run with the argument --suppressions=$ROOTSYS/etc/valgrind-root.supp
so that some ROOT-related, probably well-investigated, warnings were suppressed.
Hi @maxaehle, very interesting thanks ! As these findings could lead to a lot of changes, I suggest to start step by step, with one single change proposal, for example in the first file that you identify (GateCrystalSD). Do you think you can propose a PR ? thanks
Here is some more information on the macro files etc. that I used to run GATE in valgrind: I checked out pct-bug-geometry
and executed the script pct-bug-geometry/scripts/run_tests.sh
, in which the line that calls Gate
is replaced by e.g.
valgrind --leak-check=full --log-file=valg --track-origins=yes --suppressions=$ROOTSYS/etc/valgrind-root.supp Gate Main.mac
The pct-bug-geometry
setup was constructed as a minimal example that exhibits non-deterministic behavior even if the seed of the random number generator is fixed: run_tests.sh
prints the md5 hash of some output file, and when it is executed a lot of times, a few different hashes show up. If you can reproduce this observation, please share this information in the chat. To debug the non-deterministic behavior, I ran valgrind.
I think that the non-deterministic behavior in the pct-bug-geometry
setup is fixed once all the uninitialized variable problems are fixed, due to the following analysis:
It is possible to make the functions malloc
and new
initialize the memory that they allocate, e.g. in the following way:
- Checkout zeromalloc, cd into it, checkout commit ea796a2, run
make
. This creates a shared librarylibzeromalloc.so
. - Set the environment variable
LD_PRELOAD
to the absolute path oflibzeromalloc.so
when runningGate
, e.g. instead ofGate Something.mac
call
LD_PRELOAD=/somepath/libzeromalloc.so Gate Something.mac
Some explanation (tl;dr): In the executable file Gate
, the function malloc
is referenced but not defined, as you can verify with the command nm `which Gate` | grep malloc
, the output of which contains the line U malloc@@GLIBC_2.2.5
. When Gate
is started, the dynamic linker loads a couple of dynamic libraries, which you can list with ldd `which Gate`
. Very likely then something like libc.so.6
defines the function malloc
. Now, letting LD_PRELOAD
point to libzeromalloc.so
instructs the dynamic linker to load this one first, and as seen from
$ nm /somepath/libzeromalloc.so
...
U __libc_malloc@@GLIBC_2.2.5
...
0000000000000adb T malloc
...
it defines malloc
, and references __libc_malloc
(which likely is defined in the libc.so
). Specifically, as coded in zeromalloc/malloc.c
, the former uses the latter to allocate the memory, and then initializes it.
There is no guarantee that this fixes all situations where an uninitialized variable could be used, e.g. realloc
and __libc_malloc
still give non-initialized memory and local variables are not allocated with malloc
/new
anyway.
First observation and conclusion:
When running Gate with this configuration and the zero-initializing malloc
, new
, only a unique md5 hash is printed (I checked with 120 runs; without zeromalloc, 9 out of 120 runs gave a different hash on my system). Thus it seems that fixing all the uninitialized variables stops the non-deterministic behavior of the pct-bug-geometry
setup.
Second observation:
The value that is used to initialize each byte is specified by the constant zero
in zeromalloc/malloc.c
. If it is set to 0x01
instead of 0x00
, and the library is again compiled and LD_PRELOAD
'ed, Gate runs into a segmentation fault with the pct-bug-geometry
setup. This segfault is for the same reason as reported here: Some portion of code in GateCrystalSD::PrepareCreatorAttachment
assumes that either m_systemList
is nullptr
or *m_systemList
is initialized, which is both not true because the initial value of the uninitialized variable m_systemList
happens to be 0x0101010101010101
. As the C/C++ standards leave the initial value of the allocated memory unspecified, the segfault could (theoretically) happen randomly to any user. The segfault has probably not been observed/reported yet because m_systemList
not being nullptr
is apparantly unlikely, and not reproducable.
Strategies to fix the other valgrind findings: The following strategies to reduce the usage of uninitialized variables come into my mind:
- One could have a look at all declared variables and see whether they are initialized before being used. Note that the initialization can be at a different location in the code, e.g. in the constructor,
operator=
or some setter methods that are invoked early enough due to the program logic. - The clang-tidy check
cppcoreguidelines-pro-type-member-init
performs a similar static code analysis, looking at the constructor's initializer list and body (but not further, e.g. into functions called in the body). As the previous strategy and any kind of static analysis, it is not able to recognize if a variable which is left uninitialized in the constructor is initialized later on by e.g. some setter method, before being used. - Alternatively one can use valgrind to run GATE and actively identify the usage of uninitialized variables for a particular setup like the
pct-bug-geometry
. - Whenever an uninitialized variable has been identified by either strategy, in a second step the proper initial value must be determined. Usually I expect it to be something like
0
ornullptr
. E.g. in #483 it was clear to me, but in #484 it wasn't.
More details about our procedure, checks etc. can be found here.
Whoo ! The very detailed discussion on the provided gitlab link is just amazing, thank you for all the investigations! I admit I am far to understand most of the provided analysis, but for sure it is useful. We should probably discuss a bit more to make me better understand. As already mentioned, the whole digitizer part is currently being investigated to be entirely rewritten. Moreover, your expertise will also be very useful for the gam-gate project (still a temporary name, maybe will be something like "Gate X", for version ten, ;)