Segfault for ODD+Pythia8+Geant4

Open andiwand opened this issue 3 years ago • 10 comments

@benjaminhuth pointed out that ODD+Pythia8+Geant4 segfaults in the full chain.

I just verified this. See attached files for more information.

segfault.txt full_chain.txt

andiwand avatar Oct 06 '22 10:10 andiwand

Hm, could it be that the two interfere?

paulgessinger avatar Oct 06 '22 10:10 paulgessinger

Is this executed in a single thread?

asalzburger avatar Oct 06 '22 10:10 asalzburger

numThreads=-1 is no good: this would need Geant4MT.

asalzburger avatar Oct 06 '22 10:10 asalzburger

For me this crashes with one thread as well

benjaminhuth avatar Oct 06 '22 10:10 benjaminhuth

Is that with the same error / segfault ?

asalzburger avatar Oct 06 '22 10:10 asalzburger

Hmm, it looks a bit different to be honest, but I couldn't check in detail for now. I attach my gdb backtrace, maybe this gives a hint.

backtrace.txt

benjaminhuth avatar Oct 06 '22 11:10 benjaminhuth

@andiwand could you maybe also run it in gdb to see if it is the same fault? (It's not entirely clear to me from the error message.)

benjaminhuth avatar Oct 06 '22 11:10 benjaminhuth
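Besides gdb, a lightweight complement when the chain is driven from a Python script is the standard-library `faulthandler` module, which dumps the Python-level traceback when the process receives SIGSEGV. The sketch below is an assumption about how one might wire it in; the helper name `enable_crash_tracebacks` is hypothetical, and it only reports the Python frames, not the C++ stack that gdb shows.

```python
import faulthandler
import sys


def enable_crash_tracebacks(stream=None):
    """Dump the Python traceback of all threads on SIGSEGV and similar signals.

    `stream` must be a real file object (it needs a file descriptor);
    it defaults to sys.stderr. Returns True if the handler is active.
    """
    if stream is None:
        stream = sys.stderr
    faulthandler.enable(file=stream, all_threads=True)
    return faulthandler.is_enabled()


# Hypothetical usage: call this at the very top of the full-chain script,
# before the sequencer is run.
#   enable_crash_tracebacks()
```

The same effect is available without touching the script by running it as `python3 -X faulthandler <script>`, which at least narrows down which Python call was active when the crash happened.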

> Hm, could be the two interfere?

I think so yes. Pythia8 only works and Geant4 only works.

> @andiwand could you maybe also run it in gdb to see if it is the same fault (its not entirely clear to me from the error message)

sure will do

andiwand avatar Oct 06 '22 12:10 andiwand

I can confirm, I just encountered the same issue using the geant4.py example: `### CAUGHT SIGNAL: 11 ### address: 0x7f9417827000, signal = SIGSEGV, value = 11, description = segmentation violation. Address not mapped to object.` I tried updating to the latest G4 version (11.0.3) but it didn't change anything.

Corentin-Allaire avatar Oct 06 '22 12:10 Corentin-Allaire

As we discussed during today's meeting, I tried to replace the ODD by the GDML implementation of Alice_v3 and it ran through with just a few warnings. So the issue is either with the ODD itself or the DDG4DetectorConstruction...

Corentin-Allaire avatar Oct 11 '22 16:10 Corentin-Allaire

This issue/PR has been automatically marked as stale because it has not had recent activity. The stale label will be removed if any interaction occurs.

stale[bot] avatar Nov 12 '22 05:11 stale[bot]

I have had another look at this and I just noticed something. If I start removing the support from the ODD xml, the segfault happens much later, so maybe there is something bad with the support surface definition?

Corentin-Allaire avatar Nov 23 '22 17:11 Corentin-Allaire

This issue/PR has been automatically marked as stale because it has not had recent activity. The stale label will be removed if any interaction occurs.

stale[bot] avatar Dec 24 '22 02:12 stale[bot]

I was checking back on this issue out of curiosity and it is still there. Maybe we should try to investigate this again at some point?

Corentin-Allaire avatar Jan 19 '23 09:01 Corentin-Allaire

For sure this is something that we need to fix. ~~Do we have a script to reproduce this?~~

paulgessinger avatar Jan 20 '23 08:01 paulgessinger

Okay, I have investigated this a bit and have some new information:

First of all, I enabled some logging facilities in Geant4, which showed that this is caused by photons quite far from the center in the z direction (z is around 1e4): [image]

This is reproducible with Pythia, also with different seeds. I could then also reproduce the crash with the ParticleGun:

```python
addParticleGun(
    s,
    MomentumConfig(0.1 * u.GeV, 2.0 * u.GeV, transverse=True),
    EtaConfig(-4.0, 4.0, uniform=True),
    ParticleConfig(2, acts.PdgParticle.eGamma),
    vtxGen=acts.examples.GaussianVertexGenerator(
        stddev=acts.Vector4(10 * u.mm, 10 * u.mm, 10 * u.mm, 0.0 * u.ns),
        mean=acts.Vector4(18, 3.78, 1.09e4, 0),
    ),
    multiplicity=100,
    rnd=rnd,
)
```

I'm not totally sure what to do with this information, but maybe someone has an idea :)

benjaminhuth avatar Jan 20 '23 11:01 benjaminhuth

So it's G4 breaking in a specific region of the detector?

paulgessinger avatar Jan 20 '23 11:01 paulgessinger

Wait, the energy goes to 0 in the second step. Could it be that G4 doesn't handle photon stopping in some volumes?

Corentin-Allaire avatar Jan 20 '23 12:01 Corentin-Allaire

> Wait the energy goes to 0 in the second step. Could it be that G4 doesn't handle photon stopping in some volumes ?

No, I think everything is fine with the electron in the pixel endcap; the photon below it is the problem. There it only logs the 0th step and then segfaults.

I could imagine that the problem is that it already starts outside of the detector (in the world_volume_1)?

Could it be that the world volume is too small or something like that?

benjaminhuth avatar Jan 20 '23 12:01 benjaminhuth

Oh yeah, I was looking at the wrong line... But you are right, the world volume size is 10m along z, so this photon is outside the DD4hep detector.

Corentin-Allaire avatar Jan 20 '23 13:01 Corentin-Allaire
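The geometric argument can be checked with a few lines of plain Python. This is only a sketch under stated assumptions: the world is modelled as an axis-aligned box centred on the origin, the 10 m quoted above is taken as the half-length along z (if it is the full length, the vertex is even further outside), the x/y half-lengths are placeholders, and the vertex coordinates from the ParticleGun snippet are taken to be in mm. The function `inside_world` is hypothetical and not part of ACTS, DD4hep or Geant4.

```python
def inside_world(position_mm, half_lengths_mm):
    """Return True if the point lies inside an origin-centred box.

    Both arguments are (x, y, z) tuples in millimetres.
    """
    return all(abs(c) <= h for c, h in zip(position_mm, half_lengths_mm))


# Mean vertex position used with the ParticleGun above (assumed to be in mm).
photon_vertex = (18.0, 3.78, 1.09e4)

# Assumed world boxes: 10 m and 100 m half-length on every axis.
world_10m = (1.0e4, 1.0e4, 1.0e4)
world_100m = (1.0e5, 1.0e5, 1.0e5)

print(inside_world(photon_vertex, world_10m))   # False: z = 10.9 m > 10 m
print(inside_world(photon_vertex, world_100m))  # True
```

With a 10 mm Gaussian smearing around z = 1.09e4, essentially every generated vertex stays beyond the 10 m boundary, which is consistent with the photon being reported in the world volume.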

Unfortunately, I don't think this is the only issue :( I tried to edit the particle selector to remove all particles with x, y or z larger than 5m (in abs) and it still crashes with ttbar. How did you get those extra logs, Benjamin?

Corentin-Allaire avatar Jan 20 '23 15:01 Corentin-Allaire

Already merged: https://github.com/acts-project/acts/pull/1790 With a new build from the main branch you should be able to enable it by setting the logLevel to VERBOSE in the addGeant4 function.

benjaminhuth avatar Jan 20 '23 15:01 benjaminhuth

Oh perfect I will have a look next week in more detail then !

Corentin-Allaire avatar Jan 20 '23 15:01 Corentin-Allaire

> Unfortunately, I don't think this is the only issue :( I tried to edit the particle selector to remove all particle with x, y or z larger than 5m (in abs) and it still crashes with ttbar. How did you get those extra log Benjamin ?

Actually, I was able to run one event in the pythia8+geant4+ODD combination without a segfault by manually increasing the world volume from 10m to 100m in the ODD xml files...

I'm not sure if something like that would be a reasonable fix? Does this have any other implications, @asalzburger?

I will try to run more events now; however, they take quite a long time (around 30 minutes per event).

benjaminhuth avatar Jan 23 '23 10:01 benjaminhuth

> I will try to run more events now, however, they take quite a long time (around 30 minutes per event)

Okay, actually it does not resolve the issue, I still get the segfault in a later event. Maybe it has just changed the random numbers a bit so that 1 event went through.

benjaminhuth avatar Jan 23 '23 12:01 benjaminhuth

A bit unrelated, but there is a bug in 'addGeant4' in 'simulation.py'. On line 597 it uses particles_input for the G4 input (instead of particles_selected), ignoring the particle selector. I can open a quick MR to fix this.

Corentin-Allaire avatar Jan 23 '23 12:01 Corentin-Allaire

> A bit unrelated but there is a bug in 'addGeant4' in 'simulation.py'. Line 597 it uses particles_input for the G4 input (instead of particles_selected) ignoring the particle selector. I can open a quick MR to fix this

If someone wants to have a look: https://github.com/acts-project/acts/pull/1792

Corentin-Allaire avatar Jan 23 '23 13:01 Corentin-Allaire
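To illustrate the kind of bug described here, the following is a minimal stand-alone sketch, not the actual code from simulation.py or from the linked PR. The collection names `particles_input` and `particles_selected` come from the comment above; the function `select_geant4_input`, the dictionary-based whiteboard and the `preselect` callback are assumptions made up for illustration.

```python
def select_geant4_input(whiteboard, preselect=None):
    """Return the particle collection that the simulation should consume.

    `whiteboard` is a plain dict standing in for the event store and
    `preselect` is an optional predicate applied to each particle.
    """
    if preselect is None:
        # No selector configured: simulate every generated particle.
        return whiteboard["particles_input"]
    # Selector configured: store the filtered collection and hand *that*
    # one to the simulation. Returning "particles_input" here would
    # silently ignore the selector, which is the bug described above.
    whiteboard["particles_selected"] = [
        p for p in whiteboard["particles_input"] if preselect(p)
    ]
    return whiteboard["particles_selected"]


# Hypothetical event with one central particle and one far-away vertex.
event = {"particles_input": [{"z": 0.0}, {"z": 1.09e4}]}
kept = select_geant4_input(event, preselect=lambda p: abs(p["z"]) < 1e4)
print(kept)  # [{'z': 0.0}]
```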

With this you can cut the particles outside the detector by adding `preselectParticles = ParticleSelectorConfig(eta=(-3.0, 3.0), absZ=(0, 1e4), pt=(150 * u.MeV, None), removeNeutral=True)` to the addGeant4 call. It doesn't solve the segfault in the ttbar case (but it solves the photon issue).

Corentin-Allaire avatar Jan 23 '23 13:01 Corentin-Allaire
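What these cuts do can be sketched in plain Python, independent of ACTS. The numbers mirror the configuration quoted above (|eta| < 3, |z| < 1e4, pT > 150 MeV, neutrals removed); the `Particle` class, the helper names and the half-open `[min, max)` range convention are assumptions for illustration and may differ from the real ParticleSelector.

```python
import math
from dataclasses import dataclass


@dataclass
class Particle:
    px: float      # momentum components in GeV
    py: float
    pz: float
    vz: float      # production vertex z in mm
    charge: float  # in units of e


def in_range(value, lo, hi):
    """Half-open check lo <= value < hi; None means unbounded on that side."""
    return (lo is None or value >= lo) and (hi is None or value < hi)


def passes(p, eta=(-3.0, 3.0), abs_z=(0.0, 1e4), pt=(0.150, None),
           remove_neutral=True):
    """Return True if the particle survives all configured cuts."""
    if remove_neutral and p.charge == 0:
        return False
    p_t = math.hypot(p.px, p.py)
    if p_t == 0.0:
        return False  # pseudorapidity is unbounded along the beam axis
    pseudorapidity = math.asinh(p.pz / p_t)
    return (in_range(pseudorapidity, *eta)
            and in_range(abs(p.vz), *abs_z)
            and in_range(p_t, *pt))


central_pion = Particle(px=1.0, py=0.0, pz=0.0, vz=0.0, charge=1)
far_photon = Particle(px=1.0, py=0.0, pz=0.0, vz=1.09e4, charge=0)
print(passes(central_pion))  # True
print(passes(far_photon))    # False
```

Note that the far-away photons from the ParticleGun test are rejected twice in this sketch: once by `remove_neutral` and once by the |z| cut, which would match the observation that the photon issue disappears with this selection.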

Actually, the code seems to be running on my side and doesn't segfault anymore... Can someone else confirm?

Corentin-Allaire avatar Jan 23 '23 14:01 Corentin-Allaire

@Corentin-Allaire are you using the chain from above? otherwise if you could share the script I can try to verify

andiwand avatar Jan 24 '23 08:01 andiwand