Study additional decrease in SIM AOD size
Studying a Run 3 pileup-based workflow (11834.21) with 1000 events shows that the following branches take up the bulk of the space on disk:
Branch | Relative % |
---|---|
recoTracks_generalTracks__RECO | 32.6% |
recoPFCandidates_particleFlow__RECO | 16.1% |
recoGenParticles_genParticles__HLT | 4.6% |
Applying a different branch structure, object thinning, and the lossy compression strategies described below to those branches could allow us to decrease the SIM AOD size by > 15%, depending on how much loss of information is acceptable.
A new Issue was created by @Dr15Jones Chris Jones.
@Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks.
cms-bot commands are listed here
Change to branch structure
The default for the AOD event content is to preserve the branch splitting from the input file into the output file. This was probably initially done to allow fast cloning of those branches from the input to the output. But with our use of multi-threading we already cannot use fast cloning (as using multiple threads can cause the event order in the output file to differ from the event order in the input file).
By allowing the branches read from the input source to be fully split when written out, the size of the 1000-event SIM AOD file decreased by 4.7%.
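For context, here is a minimal ROOT-level sketch of what the split level controls (this is not the actual CMSSW output-module change): with split level 0 an object is streamed into a single branch, while a high split level stores each data member in its own sub-branch, which generally compresses better.

```cpp
// Illustrative ROOT macro; class and branch names are hypothetical.
#include "TFile.h"
#include "TTree.h"
#include "TVector3.h"

void splitLevelSketch() {
  TFile f("split_demo.root", "RECREATE");
  TTree t("Events", "split level demo");

  TVector3 unsplit, fullySplit;
  // splitlevel = 0: the whole object is serialized into one branch
  t.Branch("unsplitVertex", &unsplit, 32000, 0);
  // splitlevel = 99: fX, fY, fZ each become their own sub-branch
  t.Branch("splitVertex", &fullySplit, 32000, 99);

  for (int i = 0; i < 1000; ++i) {
    unsplit.SetXYZ(0.1 * i, 0.2 * i, 0.3 * i);
    fullySplit = unsplit;
    t.Fill();
  }
  t.Write();
}
```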
assign core,reconstruction,generators
New categories assigned: core,generators,reconstruction
@mkirsano,@menglu21,@mandrenguyen,@Dr15Jones,@smuzaffar,@clacaputo,@alberto-sanchez,@SiewYan,@makortel,@GurpreetSinghChahal,@Saptaparna you have been requested to review this Pull request/Issue and eventually sign? Thanks
Lossy Compression Overview
ROOT offers several lossy compression options, all based on the ROOT-defined typedefs `Double32_t` and `Float16_t`.
double to float
Use of the typedef `Double32_t` signifies to ROOT that although the in-memory representation of a floating point value is a `double`, when it comes time to write the data out the value should be converted to a `float`, and the `float` value is what gets stored. Essentially this truncates the mantissa and exponent of the floating point value.
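A minimal sketch of what such a declaration looks like (the class and member names are illustrative, not actual CMSSW code):

```cpp
#include "Rtypes.h"  // provides the Double32_t typedef

class TrackSketch {  // hypothetical class, for illustration only
public:
  Double32_t chi2_ = 0.;  // a double in memory, written to disk as a 32-bit float
  double ndof_ = 0.;      // written to disk as a full 64-bit double
};
```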
`Double32_t` is already used extensively in the branches mentioned in the opening description.
fixed precision conversion
If a specially structured comment is placed on the same line as the declaration of a member variable of type `Double32_t` (or `Float16_t`), then ROOT will convert the floating point number into a fixed point number and store the result. The comment specifies the minimum value, the maximum value, and the number of bits to store.
This form is quite useful for storing values which have a fixed precision, such as position measurements or possibly angular measurements.
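A sketch of the comment syntax, with illustrative member names and ranges (not actual CMSSW declarations):

```cpp
#include "Rtypes.h"

class PositionSketch {  // hypothetical class
public:
  // //[min, max, nbits]: ROOT maps the value linearly onto an nbits-wide
  // integer covering [min, max] when writing to disk.
  Double32_t x_;    //[-150, 150, 21] coordinate in cm, ~1.4 um steps
  Double32_t phi_;  //[-3.2, 3.2, 20] angle with fixed absolute precision
};
```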
relative precision conversion
Similar to fixed precision conversion, by adding a special comment after the declaration of the member variable one can specify relative precision conversion. In this case the full exponent range of a `float` is used (which is 8 bits), but the number of mantissa bits is rounded down to the number of bits requested (with the maximum number of storable bits being 24). One bit for the sign of the value is also added to the storage. A `Float16_t` always uses this form of lossy compression, with the default number of stored bits being 12.
This is useful for storing values where the precision of the value is proportional to the value itself, such as the case for the momentum measurement.
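A sketch of the syntax, again with illustrative names: setting the range to [0, 0] in the comment selects relative precision storage with the requested number of mantissa bits.

```cpp
#include "Rtypes.h"

class MomentumSketch {  // hypothetical class
public:
  Double32_t pt_;   //[0, 0, 9] keep 9 mantissa bits: relative error up to ~1e-3
  Float16_t mass_;  // default for Float16_t: 12 mantissa bits kept
};
```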
I created a test where I looked at momentum values from the GenParticles stored with relative precision using different bit settings. The largest relative differences measured were
bits of precision | relative deviation |
---|---|
9 | 9.8E-4 |
10 | 4.9E-4 |
Extrapolating to 12 bits, I would expect a relative deviation of 1.2E-4.
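These numbers are consistent with a maximum rounding error of half a unit in the last place for n kept mantissa bits, which is presumably the basis of the 12-bit extrapolation:

```math
\delta_{\max} \approx 2^{-(n+1)}: \qquad 2^{-10} \approx 9.8\times10^{-4}, \quad 2^{-11} \approx 4.9\times10^{-4}, \quad 2^{-13} \approx 1.2\times10^{-4}
```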
reco::Track lossy compression
The largest contributions to the stored `std::vector<reco::Track>` are the following sub-TBranches:
Branch | Relative % |
---|---|
covariance_[15] | 60.6% |
vertex_.fCoordinates.fX | 3.9% |
vertex_.fCoordinates.fY | 3.6% |
vertex_.fCoordinates.fZ | 3.9% |
momentum_.fCoordinates.fX | 3.9% |
momentum_.fCoordinates.fY | 3.9% |
momentum_.fCoordinates.fZ | 3.9% |
I tried various relative compression settings for `covariance_` and `momentum_`.
For `vertex_` I noted that X and Y were bound between -150 and 150 cm while Z did not have an obvious bound. Assuming we'd want to keep an absolute resolution of around 1 µm, I used fixed precision compression on just X and Y, keeping 21 bits. At the suggestion of @slava77, I also tried bounding between -1100 and 1100 cm with 24 bits.
The code for this change can be found here: https://github.com/Dr15Jones/cmssw/pull/7
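A minimal sketch of the kind of declarations this corresponds to (illustrative names; the actual change is in the PR linked above):

```cpp
#include "Rtypes.h"

class TrackStorageSketch {  // hypothetical class
public:
  Double32_t covariance_[15];  //[0, 0, 9] relative precision on each element
  Double32_t px_;              //[0, 0, 9] relative precision on the momentum components
  Double32_t vx_;              //[-150, 150, 21] fixed precision in cm, ~1.4 um steps
  Double32_t vy_;              //[-150, 150, 21]
  // alternative tried: //[-1100, 1100, 24], ~1.3 um steps over +-11 m
  Double32_t vz_;              // Z left uncompressed (no obvious bound)
};
```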
The results for different compression settings can be found below
compression description | reduction of AOD size | reduction of generalTracks branch only |
---|---|---|
covariance & Pxyz 9 bits, vertex 21 bits | 13.6% | 38.4% |
covariance & Pxyz 9 bits, vertex 24 bits | 13.6% | 38.5% |
covariance & Pxyz 9 bits | 12.1% | 34.0% |
covariance & Pxyz 10 bits, vertex 24 bits | 12.8% | 36.0% |
covariance & Pxyz 10 bits | 11.2% | 31.6% |
covariance & Pxyz 12 bits, vertex 24 bits | 10.9% | 30.7% |
covariance & Pxyz 12 bits | 9% | 26.2% |
reco::LeafCandidate lossy compression
`reco::PFCandidate`, `reco::GenParticle` and `reco::PFJet` all obtain their base information by inheriting from `reco::LeafCandidate`. Therefore applying lossy compression to that class will decrease the storage for all of them.
Here are the relative sizes of the shared components for `PFCandidate` and `GenParticle`, as they are the largest on file:
Branch | relative fraction of PFCandidate | relative fraction of GenParticle |
---|---|---|
vertex_.fCoordinates.fX | 6.6% | 2.1% |
vertex_.fCoordinates.fY | 6.2% | 1.9% |
vertex_.fCoordinates.fZ | 6.6% | 1.8% |
p4Polar_.fCoordinates.fPt | 10.4% | 19.2% |
p4Polar_.fCoordinates.fEta | 9.7% | 19.1% |
p4Polar_.fCoordinates.fPhi | 9.7% | 18.8% |
p4Polar_.fCoordinates.fM | 1.5% | 9.6% |
The way ROOT encodes data into fEta makes it hard to apply lossy compression, as very large values of eta are truncated and then the value of Pz is added to the internal storage of fEta. This means that for low values of Pz one needs very high absolute precision. Because of that, no attempt was made to apply lossy compression to fEta. Different relative lossy precisions were applied to fPt, while fixed precision lossy compression was used on fPhi. The code testing this can be found here: https://github.com/Dr15Jones/cmssw/pull/9/files
Given the difficulties with the polar representation, I created storage-only variables storing Px, Py and Pz and then applied lossy compression to them. The code testing this can be found here: https://github.com/Dr15Jones/cmssw/pull/8
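A rough sketch of that approach (hypothetical names; the real change is in the PR linked just above): the polar four-vector stays the in-memory representation, while lossy-compressed Cartesian components are filled only for persistency.

```cpp
#include "Rtypes.h"

class CandidateStorageSketch {  // hypothetical class
public:
  // called before writing, to copy the transient polar p4 into the
  // compressed storage-only members
  void fillStorage(double px, double py, double pz) {
    px_ = px;
    py_ = py;
    pz_ = pz;
  }

private:
  Double32_t px_;  //[0, 0, 9] relative precision, 9 mantissa bits
  Double32_t py_;  //[0, 0, 9]
  Double32_t pz_;  //[0, 0, 9]
};
```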
The relative savings on AOD size are as follows:
compression description | reduction of AOD size |
---|---|
Pt 9 bits & Phi 16 bits | 1.2% |
Pt 9 bits & Phi 20 bits | 0.8% |
Pt 9 bits & Phi 24 bits | 0.4% |
Pxyz 9 bits | 2.5% |
Pxyz 10 bits | 2.2% |
Pxyz 12 bits | 1.5% |
Applying fixed precision lossy compression to Eta on just the PFCandidate objects yields the following savings:
compression description | reduction of AOD size |
---|---|
Pt 9 bits & Phi&Eta 16 bits | 0.7% |
Pt 9 bits & Phi&Eta 20 bits | 0.2% |
Pt 9 bits & Phi&Eta 24 bits | -0.2% (it got bigger) |
In theory, adding the savings on just PFCandidate to the savings for all Candidates would give an upper limit for applying all the changes together. Even then, the savings are not as great as just using the Pxyz representation.
@Dr15Jones please add compression details wrt the `reco::Track` `generalTracks` branch; the other kinds of tracks (electrons or muons) are less frequent and would not affect the total compression as much.
A related question is: can we have this compression per product instead of per class?
I don't think this compression strategy for `reco::Track` would be appropriate in the default global coordinates (esp. p{x,y,z}).
A guide could be the quality of e.g. conversion (track pair) reconstruction, as well as the frequency of getting a non-positive-definite covariance after this compression.
A more appropriate choice for the momentum would be to go to q/p (or 1/pt), theta, phi, or even the pt, eta, phi used in the Particle representation; the compression has to preserve the angle better than the absolute value.
please add compression details wrt reco::Track generalTracks branch
This is just using standard AOD settings: fully split with LZMA 4.
A related question is: can we have this compression per product instead of per class?
I'm afraid not.
I don't think this compression strategy for reco::Track would be appropriate in the default global coordinates (esp p{x,y,z}). A guide could be a quality of e.g. conversion (track pairs) reconstruction as well as frequency of getting non-positive-definite covariance after this compression.
I think we really need further work to be pursued by experts, not a meddling amateur like myself :).
A more appropriate choice for the momentum would be to go to q/p (or 1/pt), theta, phi, or even the pt, eta, phi used in the Particle representation; the compression has to preserve the angle better than the absolute value.
I actually found the opposite when I did the work on PFCandidate and GenParticle. The polar notation is substantially worse when compressed and seemed to me to have worse precision behavior. But again, those judgements would be better made by professionals.
A more appropriate choice for the momentum would be to go to q/p (or 1/pt), theta, phi, or even the pt, eta, phi used in the Particle representation; the compression has to preserve the angle better than the absolute value.
I actually found the opposite when I did the work on PFCandidate and GenParticle. The polar notation is substantially worse when compressed and seemed to me to have worse precision behavior. But again, those judgements would be better made by professionals.
Did you really see that with the PFCandidate? I could understand the problem for GenParticle saving pz=0 tracks using 5-7-digit overflow for eta, but the PFCands should run out by eta of 5.
Please remind me if this functionality is configurable by product (instead of by class)
I tried various relative compression settings for covariance_ and momentum_.
compression of the covariance matrix elements leading to non-positive-definite matrices is already a limiting factor for analyses using miniAOD. If the plan is to do the same on AOD, it needs to be done with care and carefully cross-validated.
Is there a summary of the effect in miniAOD somewhere? (Or e.g., what is the rate seen there?)
Is there a summary of the effect in miniAOD somewhere?
https://indico.cern.ch/event/1155820/#7-track-covariance-matrices-in
(Or e.g., what is the rate seen there?)
60% of the tracks used for BPH analysis have this problem.
60% of the tracks used for BPH analysis have this problem.
simple rounding will likely have a smaller impact than what's done in miniAOD with the cov parameterization
simple rounding will likely have a smaller impact than what's done in miniAOD with the cov parameterization
not objecting to that, but one would have to see the impact and have it agreed by the relevant parties.
@mmusich
not objecting to that, but one would have to see the impact and have it agreed by the relevant parties.
A major reason I made this issue is to get experts to start looking at these options.
@slava77
Please remind me if this functionality is configurable by product (instead of by class)
It is only by class. (Looks like my previous reply was accidentally included in the quote I copied).
But it is possible for us to have different classes that work like the LeafCandidate (I actually did that as part of my testing) so that different inheriting classes could have different storage options.
Did you really see that with the PFCandidate? I could understand the problem for GenParticle saving pz=0 tracks using 5-7-digit overflow for eta, but the PFCands should run out by eta of 5.
It is a good point, as I only studied GenParticle and have not looked at the details of PFCandidate. Matti pointed me to the PackedCandidate, which is used in MiniAOD storage, and there the eta is truncated at ±6.
As a test of the loss of precision, I ran RECO jobs on the same 1000 events using just 1 thread in order to get identical event ordering. I then wrote the momentum-related values from the GenParticles to a test structure which stored the values using different compression settings/methods. I then measured the deviation of the values stored with compression from the values stored at full precision. The results are below
Max Deviation | Original | Pt 9bit, Phi 16 bit | Pt 9 bit, Phi 20 bit | Pt 9 bit, Phi 24 bit | Pxyz 9 bit | Pxyz 12 bit |
---|---|---|---|---|---|---|
P ratio | 0. | 9.8E-4 | 9.8E-4 | 9.8E-4 | 9.8E-4 | 1.2E-4 |
Pt ratio | 0. | 9.8E-4 | 9.8E-4 | 9.8E-4 | 9.8E-4 | 1.2E-4 |
Px ratio | 0. | 6.3E-3 | 1.1E-3 | 9.8E-4 | 9.8E-4 | 1.2E-4 |
Px diff | 0. | 1.1E-4 | 7.3E-5 | 7.0E-5 | - | - |
Py ratio | 0. | 6.9E-3 | 1.1E-3 | 9.8E-4 | 9.8E-4 | 1.2E-4 |
Py diff | 0. | 9.7E-6 | 1.0E-5 | 1.0E-5 | - | - |
Pz ratio | 0. | 9.8E-4 | 9.8E-4 | 9.8E-4 | 9.8E-4 | 1.2E-4 |
Phi diff | 0. | 4.8E-5 | 3.0E-6 | 1.9E-7 | 9.3E-4 | 1.1E-4 |
Eta ratio | 0. | 0. | 0. | 0. | 1.8E-3 | 2.3E-4 |
I have now gone and applied fixed precision Eta compression between -6 and 6 on all `particleFlow` PFCandidates in the 1000-event file and compared the precision. (This could not safely be done with GenParticles since we do have important particles with eta > 6.)
Max Deviation | Pt 9bit, Phi&Eta 16 bit | Pt 9 bit, Phi&Eta 20 bit | Pt 9 bit, Phi&Eta 24 bit | Pt 12bit, Phi&Eta 20 bit |
---|---|---|---|---|
P ratio | 1.1E-3 | 9.8E-4 | 9.8E-4 | 1.3E-4 |
Pt ratio | 9.8E-4 | 9.8E-4 | 9.8E-4 | 1.2E-4 |
Px ratio | 1.7E-3 | 9.8E-4 | 9.8E-4 | 1.6E-4 |
Px diff | 1.5E-4 | 2.0E-4 | 2.0E-4 | 1.0E-5 |
Py ratio | 1.7E-3 | 9.8E-4 | 9.8E-4 | 1.3E-4 |
Py diff | 2.3E-4 | 2.2E-4 | 2.2E-4 | 2.7E-6 |
Pz ratio | 4.4E-3 | 9.8E-4 | 9.8E-4 | 2.5E-4 |
Pz diff | 6.1E-6 | 1.0E-4 | 1.0E-4 | 1.2E-5 |
Phi diff | 4.8E-5 | 3.0E-6 | 1.9E-7 | 3.0E-6 |
Eta diff | 9.2E-5 | 5.7E-6 | 3.6E-7 | 5.7E-6 |
A major reason I made this issue is to get experts to start looking at these options.
what are the target gains in % expected from this effort?
About the number of bits for px, py, pz: if I take a naive linear propagation of the relative uncertainty, for the W mass we'll need 1e-6 precision (0.1 MeV); so, it's 19 or 20 bits.
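The bit count presumably follows from requiring the per-component relative rounding error to stay below 1e-6:

```math
2^{-n} \lesssim 10^{-6} \;\Rightarrow\; n \gtrsim \log_2 10^{6} \approx 19.9
```

i.e. 20 mantissa bits for plain truncation, or 19 if a half-ULP rounding error is assumed.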
Perhaps making some toy phase-space study would show the more appropriate connection. @bendavid is my consideration appropriate, or are there some cancellations (or, worse, enhancements) in the relationship between the track momentum precision and the fitted mass?
what are the target gains in % expected from this effort?
Such a question is "above my pay grade" :). I'm just showing that we have the possibility for substantial gains by exploring this area. @dpiparo ?
@slava77
About the number of bit for px,py,pz, if I take a naive linear propagation of the relative uncertainty, for the W mass we'll need 1e-6 precision (0.1 MeV); so, it's 19 or 20 bits.
From what I can tell, this is what the MiniAOD is using https://github.com/cms-sw/cmssw/blob/8a3b1522dcc2cf7f5beea3d946f01f5e2dd44cdf/DataFormats/PatCandidates/src/PackedCandidate.cc#L13-L23
where `MiniFloatConverter::float32to16` is https://github.com/cms-sw/cmssw/blob/c9da596d0807a487c46d5940ffbb727da0de8064/DataFormats/Math/interface/libminifloat.h#L17, which is https://github.com/cms-sw/cmssw/blob/c9da596d0807a487c46d5940ffbb727da0de8064/DataFormats/Math/interface/libminifloat.h#L24-L36, which (I think) stores 11 bits of the mantissa, 4 bits of the exponent, and 1 bit for the sign.
From what I can tell, this is what the MiniAOD is using ... which (I think) stores 11 bits of the mantissa and 4 bits of the exponent and 1 bit for the sign.
On one hand, not everything in miniAOD is saved at this precision; electrons and muons have the same precision as in AOD. On the other hand, my example leading to the 1E-6 precision need is too simplistic; that's more relevant for a fit over a distribution (so some sqrt(N) factor contributes), and the single-candidate resolution requirements are likely much softer. I'll try to come up with a clearer toy/case.
Did you really see that with the PFCandidate? I could understand the problem for GenParticle saving pz=0 tracks using 5-7-digit overflow for eta, but the PFCands should run out by eta of 5.
@slava77 I have now applied fixed precision lossy compression on the Eta of the PFCandidates and added another table about the precision tradeoffs. I'm in the process of trying to evaluate the space savings related to the change.
@slava77 how would you like me to proceed on this question you asked?
please add compression details wrt reco::Track generalTracks branch; the other kind of tracks (electrons or muons) are less frequent and would not affect the total compression as much.
What exact measurements would you like?
@slava77 how would you like me to proceed on this question you asked?
please add compression details wrt reco::Track generalTracks branch; the other kind of tracks (electrons or muons) are less frequent and would not affect the total compression as much.
What exact measurements would you like?
I wanted to know the relative reduction of the product size (can be `generalTracks`, which is the largest).
@slava77
I wanted to know the relative reduction of the product size (can be generalTracks, which is the largest).
I've updated the reco::Track lossy compression comment to include a column with just the reduction in product size of `generalTracks` for the different compression options tried.
@slava77 how does reconstruction want to proceed?
@slava77 how does reconstruction want to proceed?
I can only respond for tracking (the `reco::Tracks` part of this issue).
We could start with a no-PU validation using samples of the same type as in a recent tracking validation (https://its.cern.ch/jira/browse/PDMVRELVALS-158), perhaps with `covariance & Pxyz 10 bits` and `covariance & Pxyz 12 bits`.
Please clarify what you did for `vertex_`: was it a change in nbits or also a bound (150 cm was mentioned)? I'd take 11 m as a somewhat safer alternative, in case some muon studies decide to put the ref point on the last possible measurement.