Better ECal Sim Hit Merging
Our current "solution" for merging ECal sim hits is complicated and unhelpful. We try to store separate contributors for the sim hit, but these separate contribs also need to be merged in order to save space. This merging machinery is completely based on PDG ID right now, which is unhelpful because it doesn't take time into account whatsoever.
We can re-work how EcalHitIO works after v3.0.0 so that we have access to the sim cal hit event object (which will probably be lightened by this).
Overall Plan:
- Add a Python configuration class for passing parameters to EcalHitIO
- Lighten SimCalHits by removing contribs machinery (all SimCalHits are post any merging)
- Implement configurable decisions on how to merge sim cal hits in the ecal
  - Merge based on time (e.g. which "bunch" are we in)
  - Merge based on track ID (multiple hits in the same cell from the same particle are merged into one hit)
  - Merge based off of particle flavor (multiple electron hits in the same cell are merged into one hit)
- Set position information to zero for all ECal hits because the ID already stores fine-grained enough geometry information and ROOT will compress these branches.
- Will need to edit Ecal digitizer to match these changes (and work with all options)
- Update OverlayProducer in EventProc to clean it up
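To make the plan above concrete, here is a minimal Python sketch of how a configuration object could drive the merge. Everything in it is hypothetical and only for illustration: `MergeConfig`, `SimHit`, `merge_key`, `merge_hits`, and their fields are not existing ldmx-sw classes, and the choice to keep the earliest time of a merged hit is an assumption rather than something decided in this issue. Merged hits carry no position, on the idea that the cell ID already encodes the geometry.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class MergeConfig:
    """Hypothetical parameters that a Python config class could pass to EcalHitIO."""
    bunch_spacing_ns: Optional[float] = None  # None (or 0) disables time binning
    by_track_id: bool = False                 # keep hits from different tracks apart
    by_flavor: bool = False                   # keep hits from different PDG IDs apart


@dataclass
class SimHit:
    """Hypothetical lightened sim hit: no contribs and no position."""
    cell_id: int
    edep: float
    time: float
    track_id: int = 0  # 0 means "not tracked" in this sketch
    pdg_id: int = 0    # 0 means "not tracked" in this sketch


def merge_key(hit: SimHit, cfg: MergeConfig) -> Tuple[int, int, int, int]:
    """Hits that share a key are merged into one output hit."""
    bunch = int(hit.time // cfg.bunch_spacing_ns) if cfg.bunch_spacing_ns else 0
    track = hit.track_id if cfg.by_track_id else 0
    flavor = hit.pdg_id if cfg.by_flavor else 0
    return (hit.cell_id, bunch, track, flavor)


def merge_hits(hits: List[SimHit], cfg: MergeConfig) -> List[SimHit]:
    """Sum energy deposits per key; keep the earliest time (an assumption)."""
    merged: Dict[Tuple[int, int, int, int], SimHit] = {}
    for hit in hits:
        key = merge_key(hit, cfg)
        if key not in merged:
            merged[key] = SimHit(hit.cell_id, hit.edep, hit.time, key[2], key[3])
        else:
            target = merged[key]
            target.edep += hit.edep
            target.time = min(target.time, hit.time)
    return list(merged.values())
```

With the default `MergeConfig()`, all hits in a cell collapse into a single hit. Each option that is switched on adds one component to the key, so the options combine freely, for example `MergeConfig(bunch_spacing_ns=25.0, by_flavor=True)`, where the 25 ns value is only a placeholder.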
Calling in comments from @jmmans and @omar-moreno to add anything that I forgot.
Dear Tom,
Configuration is important to specify clearly.
First thoughts are based on the idea that the logical default is to merge everything in the same cell, then apply rules to split the result into multiple simhits.
(1) Enable separating between trackids (rarely to be used)
(2) Enable separating between pdgids (unclear, and do we want to be more generic, like "electron+positron versus hadronic particles"?)
(3) Enable splitting hits with a time difference greater than some threshold (or perhaps a time*energy product on some level)
Jeremy
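The "merge by default, split by rule" idea in the reply above can be sketched as follows. This is an illustration under stated assumptions, not a proposed implementation: `Hit`, `species`, `split_cell_hits`, and the option names are hypothetical. The "em" versus "other" grouping (electron, positron, and photon against everything else) is just one possible reading of the generic particle classes mentioned in rule (2). The time rule here is a simple gap cut between consecutive hits, and the time*energy variant is not attempted.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Optional

# Assumed grouping for illustration: electron, positron, photon.
EM_PDG_IDS = {11, -11, 22}


@dataclass
class Hit:
    """Hypothetical per-step sim hit before merging."""
    cell_id: int
    edep: float
    time: float
    track_id: int
    pdg_id: int


def species(pdg_id: int) -> str:
    """Coarse particle class used instead of the raw PDG ID."""
    return "em" if pdg_id in EM_PDG_IDS else "other"


def split_cell_hits(
    hits: List[Hit],
    split_track_ids: bool = False,
    split_species: bool = False,
    max_time_gap: Optional[float] = None,
) -> List[List[Hit]]:
    """Group hits per cell, then split by the enabled rules.

    Each returned group would become one merged sim hit.
    """
    def key(h: Hit):
        return (
            h.cell_id,
            h.track_id if split_track_ids else -1,
            species(h.pdg_id) if split_species else "",
        )

    groups: List[List[Hit]] = []
    # Sort by group key first, then by time, so each group is time-ordered.
    ordered = sorted(hits, key=lambda h: (key(h), h.time))
    for _, members in groupby(ordered, key=key):
        current: List[Hit] = []
        for h in members:
            gap_exceeded = (
                bool(current)
                and max_time_gap is not None
                and h.time - current[-1].time > max_time_gap
            )
            if gap_exceeded:
                groups.append(current)  # close the group at a large time gap
                current = []
            current.append(h)
        groups.append(current)
    return groups
```

With no options enabled, every cell yields exactly one group, which matches the suggested default. Each enabled rule can only increase the number of groups.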
On 10/23/20 11:03 AM, Tom Eichlersmith wrote:
[quoted copy of the original post above]
Is this only relevant to 2e (and higher), or already for 1e? Where are we in that overall plan envisioned above?
This is already relevant for 1e: there are enough particles in the shower that keeping all of their individual sim hits is not feasible for normal production, and it is very common to want to know where certain hits "came from".
To my knowledge, development has not started. It is a difficult project, and because the current solution works "well enough" for folks, it has fallen by the wayside.
I think it is worth keeping in mind here that resimulation allows us to drop a lot of sim truth that isn't strictly required for large samples.
> and it is very common to want to know where certain hits "came from".

Sorry, I didn't get this: isn't the current approach more inclusive, and doesn't that help us to know where the hit came from?
The current approach can be helpful, but I don't think it is necessarily helpful. It sits in a middle ground between a simplified view and a fully detailed view of what happened, and that is where folks can get confused. I would rather have a merging strategy in which the simplified view makes it obvious what the hits mean and is controllable by the person configuring the simulation, leaving the detailed view to resimulation, where we opt in to saving all of the information.