
Study additional decrease in SIM AOD size

Open Dr15Jones opened this issue 2 years ago • 48 comments

Studying a Run 3 pileup based workflow (11834.21) with 1000 events shows that the following branches take the bulk of the space on disk

| Branch | Relative % |
|---|---|
| recoTracks_generalTracks__RECO | 32.6% |
| recoPFCandidates_particleFlow__RECO | 16.1% |
| recoGenParticles_genParticles__HLT | 4.6% |

Applying different branch structures, object thinning, and lossy compression strategies (described below) to those branches could decrease the SIM AOD size by more than 15%, depending on how much loss of information is acceptable.

Dr15Jones avatar Sep 19 '22 16:09 Dr15Jones

A new Issue was created by @Dr15Jones Chris Jones.

@Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

cmsbuild avatar Sep 19 '22 16:09 cmsbuild

Change to branch structure

The default for the AOD event content is to preserve the branch splitting from the input file into the output file. This was probably initially done to allow fast cloning of those branches from the input to the output. But with our use of multi-threading, we already cannot use fast cloning (as using multiple threads can cause the event order in the output file to differ from the event order in the input file).

By allowing the branches read/written from the input source to be fully split, the size of the 1000 event SIM AOD file decreased by 4.7%.

Dr15Jones avatar Sep 19 '22 16:09 Dr15Jones

assign core,reconstruction,generators

makortel avatar Sep 19 '22 16:09 makortel

New categories assigned: core,generators,reconstruction

@mkirsano,@menglu21,@mandrenguyen,@Dr15Jones,@smuzaffar,@clacaputo,@alberto-sanchez,@SiewYan,@makortel,@GurpreetSinghChahal,@Saptaparna you have been requested to review this Pull request/Issue and eventually sign? Thanks

cmsbuild avatar Sep 19 '22 16:09 cmsbuild

Lossy Compression Overview

ROOT offers several lossy compression options, all based on the Double32_t and Float16_t ROOT defined typedefs.

double to float

Use of the typedef Double32_t signifies to ROOT that although the in-memory representation of a floating point value is a double, when the data is written out the value should be converted to a float and the float stored. Essentially this truncates the mantissa and exponent of the floating point value.

Double32_t is already used extensively in the branches mentioned in the opening description.
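
To give a feel for what the double to float step costs in precision, here is a minimal Python sketch that round-trips a value through a 32-bit float. It only illustrates the arithmetic and is not ROOT code; the helper name `to_float32` and the sample value are made up for this example.

```python
import struct


def to_float32(value: float) -> float:
    """Round-trip a Python float (a double) through IEEE 754 single precision."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


x = 0.1234567890123456          # arbitrary sample value (assumption)
stored = to_float32(x)          # what would survive a float-sized write
rel = abs(stored - x) / abs(x)  # relative loss from the conversion

print(f"original={x!r} stored={stored!r} relative deviation={rel:.2e}")
```

For values in the normal float range, the relative deviation from this conversion is bounded by 2^-24 (about 6E-8), which is far smaller than the deviations discussed for the other schemes below.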

fixed precision conversion

If a specially structured comment is placed on the same line as the declaration of a member variable of type Double32_t (or Float16_t) then ROOT will convert the floating point number into a fixed point number and store the result. The comment includes: the minimum value, the maximum value and the number of bits to store.

This form is quite useful for storing values which have a fixed precision, such as position measurements or possibly angular measurements.
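
The idea behind fixed precision conversion can be sketched as a plain quantizer. This is an illustration of the arithmetic only, not ROOT's implementation: the helper names `encode_fixed` and `decode_fixed`, the round-to-nearest choice, and the phi example over [-pi, pi] with 16 bits are all assumptions made for this sketch.

```python
import math


def encode_fixed(value: float, lo: float, hi: float, nbits: int) -> int:
    """Map a value in [lo, hi] onto an unsigned integer code of nbits bits."""
    levels = 1 << nbits
    step = (hi - lo) / levels
    clamped = min(max(value, lo), hi)             # out-of-range values saturate
    code = math.floor((clamped - lo) / step + 0.5)  # round to nearest level
    return min(code, levels - 1)                  # keep the code within nbits


def decode_fixed(code: int, lo: float, hi: float, nbits: int) -> float:
    """Recover the approximate value from its integer code."""
    step = (hi - lo) / (1 << nbits)
    return lo + code * step


phi = 1.234567                                    # arbitrary sample angle
code = encode_fixed(phi, -math.pi, math.pi, 16)
restored = decode_fixed(code, -math.pi, math.pi, 16)
half_step = (2 * math.pi) / (1 << 17)             # worst-case absolute error

print(f"code={code} restored={restored:.7f} "
      f"error={abs(restored - phi):.2e} bound={half_step:.2e}")
```

The key property is that the absolute error is the same everywhere in the range (half a step), independent of the size of the value, which is why this form suits positions and angles.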

relative precision conversion

As with fixed precision conversion, a special comment placed after the declaration of the member variable selects relative precision conversion. In this case, the full exponent range of a float is used (which is 8 bits), but the mantissa is reduced to the number of bits requested (with the maximum number of bits storable being 24). One bit for the sign of the value is also added to the storage. A Float16_t always uses this form of lossy compression with the default number of stored bits being 12.

This is useful for storing values where the precision of the value is proportional to the value itself, such as the case for the momentum measurement.

I created a test in which I looked at values from the GenParticles momentum stored with relative precision using different bit settings. The largest relative differences measured were

| bits of precision | relative deviation |
|---|---|
| 9 | 9.8E-4 |
| 10 | 4.9E-4 |

Extrapolating to 12 bits I would expect a relative deviation of 1.2 E-4.
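
The measured deviations match what one would expect if the mantissa is rounded to the nearest value with n stored bits: the relative error is then bounded by 2^-(n+1), which gives 9.8E-4 for 9 bits, 4.9E-4 for 10 bits and 1.2E-4 for 12 bits. The sketch below checks that bound numerically. It is not ROOT's code: `round_mantissa` is a hypothetical helper, the rounding rule is an assumption, and the sample is a uniform random set rather than real GenParticle momenta.

```python
import random
import struct


def to_f32(value: float) -> float:
    """Round-trip through single precision so the input is a valid float32."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def round_mantissa(value: float, nbits: int) -> float:
    """Keep sign and 8-bit exponent, round the 23-bit mantissa to nbits bits."""
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    drop = 23 - nbits
    # Add half of the dropped range, then clear the dropped bits.
    bits = (bits + (1 << (drop - 1))) & ~((1 << drop) - 1) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def max_rel_dev(nbits: int, samples) -> float:
    """Largest relative deviation seen over the sample."""
    return max(abs(round_mantissa(v, nbits) - v) / v for v in samples)


random.seed(12345)
samples = [to_f32(random.uniform(0.1, 100.0)) for _ in range(100_000)]

for nbits in (9, 10, 12):
    print(nbits, f"measured={max_rel_dev(nbits, samples):.2e}",
          f"bound={2.0 ** -(nbits + 1):.2e}")
```

Each extra stored bit halves the worst-case relative deviation, which is the scaling used in the extrapolation to 12 bits.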

Dr15Jones avatar Sep 19 '22 17:09 Dr15Jones

reco::Track lossy compression

The largest contributions to the stored std::vector<reco::Track> are the following sub-TBranches

| Branch | Relative size |
|---|---|
| covariance_[15] | 60.6% |
| vertex_.fCoordinates.fX | 3.9% |
| vertex_.fCoordinates.fY | 3.6% |
| vertex_.fCoordinates.fZ | 3.9% |
| momentum_.fCoordinates.fX | 3.9% |
| momentum_.fCoordinates.fY | 3.9% |
| momentum_.fCoordinates.fZ | 3.9% |

I tried various relative compression settings for covariance_ and momentum_.

For vertex_ I noted that X and Y were bounded between -150 and 150 while Z did not have an obvious bound. Assuming we'd want to keep an absolute resolution of around 1 µm, I used fixed precision compression on just X and Y, keeping 21 bits. At the suggestion of @slava77, I also tried bounding between -1100 and 1100 with 24 bits.
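
As a quick check of the two vertex settings, the step size of a fixed precision encoding is the range divided by 2^nbits. The sketch below assumes the bounds are in cm (the thread later mentions 150 cm) and uses a hypothetical helper `step_um`; it is plain arithmetic, not code from the linked pull request.

```python
def step_um(lo_cm: float, hi_cm: float, nbits: int) -> float:
    """Step size of a fixed precision encoding, converted from cm to micrometres."""
    return (hi_cm - lo_cm) / (1 << nbits) * 1.0e4


narrow = step_um(-150.0, 150.0, 21)    # X and Y bounded at +-150, 21 bits
wide = step_um(-1100.0, 1100.0, 24)    # bounds of +-1100, 24 bits

print(f"+-150 cm, 21 bits: {narrow:.2f} um per step")
print(f"+-1100 cm, 24 bits: {wide:.2f} um per step")
```

Under these assumptions both settings give a step between 1 and 1.5 µm, so the wider bounds with 24 bits keep roughly the same absolute resolution as the narrower bounds with 21 bits.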

The code for this change can be found here: https://github.com/Dr15Jones/cmssw/pull/7

The results for different compression settings can be found below

| compression description | reduction of AOD size | reduction of generalTracks branch only |
|---|---|---|
| covariance & Pxyz 9 bits, vertex 21 bits | 13.6% | 38.4% |
| covariance & Pxyz 9 bits, vertex 24 bits | 13.6% | 38.5% |
| covariance & Pxyz 9 bits | 12.1% | 34.0% |
| covariance & Pxyz 10 bits, vertex 24 bits | 12.8% | 36.0% |
| covariance & Pxyz 10 bits | 11.2% | 31.6% |
| covariance & Pxyz 12 bits, vertex 24 bits | 10.9% | 30.7% |
| covariance & Pxyz 12 bits | 9% | 26.2% |
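
As a rough cross-check of the two columns, one can multiply the generalTracks share of the file (32.6% from the opening description) by the per-branch reduction and compare with the reported AOD reduction. This is only a back-of-the-envelope sketch: it assumes the 32.6% share applies unchanged to these runs, and the variable names are made up here.

```python
GENERAL_TRACKS_SHARE = 0.326  # share of AOD size, from the opening description

# (setting, reported AOD reduction, reported generalTracks reduction)
rows = [
    ("cov & Pxyz 9 bits, vertex 21 bits", 0.136, 0.384),
    ("cov & Pxyz 9 bits, vertex 24 bits", 0.136, 0.385),
    ("cov & Pxyz 9 bits", 0.121, 0.340),
    ("cov & Pxyz 10 bits, vertex 24 bits", 0.128, 0.360),
    ("cov & Pxyz 10 bits", 0.112, 0.316),
    ("cov & Pxyz 12 bits, vertex 24 bits", 0.109, 0.307),
    ("cov & Pxyz 12 bits", 0.090, 0.262),
]

# AOD reduction expected from generalTracks alone, and what is left over.
estimates = {name: GENERAL_TRACKS_SHARE * branch for name, _, branch in rows}
gaps = {name: aod - GENERAL_TRACKS_SHARE * branch for name, aod, branch in rows}

for name, aod, _ in rows:
    print(f"{name}: from generalTracks {estimates[name]:.1%}, "
          f"reported {aod:.1%}, gap {gaps[name]:.1%}")
```

The estimate from generalTracks alone comes out somewhat below the reported AOD reduction in every row. One possible reading, which the measurements above do not confirm, is that other reco::Track collections in the file also shrink, since the compression is applied to the class.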

Dr15Jones avatar Sep 19 '22 19:09 Dr15Jones

reco::LeafCandidate lossy compression

The reco::PFCandidate, reco::GenParticle and reco::PFJet all obtain their base information by inheriting from reco::LeafCandidate. Therefore applying lossy compression to that class will decrease storage for all of them.

Here are the relative sizes of the shared components for PFCandidate and GenParticle, as they are the largest on file

| Branch | relative fraction of PFCandidate | relative fraction of GenParticle |
|---|---|---|
| vertex_.fCoordinates.fX | 6.6% | 2.1% |
| vertex_.fCoordinates.fY | 6.2% | 1.9% |
| vertex_.fCoordinates.fZ | 6.6% | 1.8% |
| p4Polar_.fCoordinates.fPt | 10.4% | 19.2% |
| p4Polar_.fCoordinates.fEta | 9.7% | 19.1% |
| p4Polar_.fCoordinates.fPhi | 9.7% | 18.8% |
| p4Polar_.fCoordinates.fM | 1.5% | 9.6% |

The way ROOT encodes data into fEta makes it hard to apply lossy compression, as very large values of eta are truncated and then the value of Pz is added to the internal storage of fEta. This means that for low values of Pz one needs very high absolute precision. Because of that, no attempt was made to apply lossy compression to fEta. Different relative lossy precisions were applied to fPt, while fixed precision lossy compression was used on fPhi. The code testing this can be found here: https://github.com/Dr15Jones/cmssw/pull/9/files

Given the difficulties with the polar representation, I created storage-only variables storing Px, Py and Pz and then applied lossy compression to them. The code testing this can be found here: https://github.com/Dr15Jones/cmssw/pull/8

The relative savings on AOD size is as follows

| compression description | reduction of AOD size |
|---|---|
| Pt 9 bits & Phi 16 bits | 1.2% |
| Pt 9 bits & Phi 20 bits | 0.8% |
| Pt 9 bits & Phi 24 bits | 0.4% |
| Pxyz 9 bits | 2.5% |
| Pxyz 10 bits | 2.2% |
| Pxyz 12 bits | 1.5% |

Applying fixed precision lossy compression to Eta on just the PFCandidate objects yields the savings

| compression description | reduction of AOD size |
|---|---|
| Pt 9 bits & Phi&Eta 16 bits | 0.7% |
| Pt 9 bits & Phi&Eta 20 bits | 0.2% |
| Pt 9 bits & Phi&Eta 24 bits | -0.2% (it got bigger) |

In theory, adding the savings on just PFCandidate to the savings for all Candidates would give an upper limit on the savings from applying all the changes together. Even then, the savings are not as great as just using the Pxyz representation.

Dr15Jones avatar Sep 19 '22 20:09 Dr15Jones

@Dr15Jones please add compression details wrt reco::Track generalTracks branch; the other kind of tracks (electrons or muons) are less frequent and would not affect the total compression as much. A related question is: can we have this compression per product instead of per class?

I don't think this compression strategy for reco::Track would be appropriate in the default global coordinates (esp p{x,y,z}). A guide could be a quality of e.g. conversion (track pairs) reconstruction as well as frequency of getting non-positive-definite covariance after this compression.

A more appropriate choice for the momentum would be to go to q/p (or 1/pt), theta, phi, or even the pt,eta,phi used in the Particle representation; the compression has to preserve the angle better than the absolute value.

slava77 avatar Sep 19 '22 20:09 slava77

please add compression details wrt reco::Track generalTracks branch;

This is just using standard AOD settings: fully split with LZMA 4.

A related question is: can we have this compression per product instead of per class?

I'm afraid not.

I don't think this compression strategy for reco::Track would be appropriate in the default global coordinates (esp p{x,y,z}). A guide could be a quality of e.g. conversion (track pairs) reconstruction as well as frequency of getting non-positive-definite covariance after this compression.

I think we really need further work to be pursued by experts, not a meddling amateur like myself :).

A more appropriate choice for the momentum would be to go to q/p (or 1/pt), theta, phi, or even the pt,eta,phi used in the Particle representation; the compression has to preserve the angle better than the absolute value.

I actually found the opposite when I did the work on PFCandidate and GenParticle. The polar notation is substantially worse when compressed and seemed to me to have worse precision behavior. But again, those judgements would be better made by professionals.

Dr15Jones avatar Sep 19 '22 20:09 Dr15Jones

A more appropriate choice for the momentum would be to go to q/p (or 1/pt), theta, phi, or even the pt,eta,phi used in the Particle representation; the compression has to preserve the angle better than the absolute value.

I actually found the opposite when I did the work on PFCandidate and GenParticle. The polar notation is substantially worse when compressed and seemed to me to have worse precision behavior. But again, those judgements would be better made by professionals.

Did you really see that with the PFCandidate? I could understand the problem for GenParticle saving pz=0 tracks using 5-7-digit overflow for eta, but the PFCands should run out by eta of 5.

Please remind me if this functionality is configurable by product (instead of by class)

slava77 avatar Sep 19 '22 20:09 slava77

I tried various relative compression settings for covariance_ and momentum_.

Compression of the covariance matrix elements leading to non-positive-definite matrices is already a limiting factor for analyses using miniAOD. If the plan is to do the same on AOD, it needs to be done with care and carefully cross-validated.

mmusich avatar Sep 20 '22 12:09 mmusich

Is there a summary of the effect in miniaod somewhere? (Or eg, what is the rate seen there?)


davidlange6 avatar Sep 20 '22 12:09 davidlange6

Is there a summary of the effect in miniaod somewhere?

https://indico.cern.ch/event/1155820/#7-track-covariance-matrices-in

(Or eg, what is the rate seen there?)

60% of the tracks used for BPH analysis have this problem.

mmusich avatar Sep 20 '22 12:09 mmusich

60% of the tracks used for BPH analysis have this problem.

simple rounding will likely have a smaller impact than what's done in miniAOD with the cov parameterization

slava77 avatar Sep 20 '22 13:09 slava77

simple rounding will likely have a smaller impact than what's done in miniAOD with the cov parameterization

not objecting to that, but one would have to see the impact and have it agreed by the relevant parties.

mmusich avatar Sep 20 '22 13:09 mmusich

@mmusich

not objecting to that, but one would have to see the impact and have it agreed by the relevant parties.

A major reason I made this issue is to start to get experts to begin looking at these options.

Dr15Jones avatar Sep 20 '22 13:09 Dr15Jones

@slava77

Please remind me if this functionality is configurable by product (instead of by class)

It is only by class. (Looks like my previous reply was accidentally included in the quote I copied).

But it is possible for us to have different classes that work like the LeafCandidate (I actually did that as part of my testing) so that different inheriting classes could have different storage options.

Dr15Jones avatar Sep 20 '22 13:09 Dr15Jones

Did you really see that with the PFCandidate? I could understand the problem for GenParticle saving pz=0 tracks using 5-7-digit overflow for eta, but the PFCands should run out by eta of 5.

It is a good point as I only studied GenParticle and have not looked at the details of PFCandidate. Matti pointed me to the PackedCandidate which is used in MiniAOD storage and there the eta is truncated at +-6.

Dr15Jones avatar Sep 20 '22 13:09 Dr15Jones

As a test of the loss of precision, I ran RECO jobs on the same 1000 events using just 1 thread in order to get identical event ordering. I then took the momentum-related values from the GenParticles and wrote them to a test structure which stored the values using different compression values/methods. I then measured the deviation of the values stored with compression from the values stored fully. The results are below

| Max Deviation | Original | Pt 9 bit, Phi 16 bit | Pt 9 bit, Phi 20 bit | Pt 9 bit, Phi 24 bit | Pxyz 9 bit | Pxyz 12 bit |
|---|---|---|---|---|---|---|
| P ratio | 0. | 9.8E-4 | 9.8E-4 | 9.8E-4 | 9.8E-4 | 1.2E-4 |
| Pt ratio | 0. | 9.8E-4 | 9.8E-4 | 9.8E-4 | 9.8E-4 | 1.2E-4 |
| Px ratio | 0. | 6.3E-3 | 1.1E-3 | 9.8E-4 | 9.8E-4 | 1.2E-4 |
| Px diff | 0. | 1.1E-4 | 7.3E-5 | 7.0E-5 | - | - |
| Py ratio | 0. | 6.9E-3 | 1.1E-3 | 9.8E-4 | 9.8E-4 | 1.2E-4 |
| Py diff | 0. | 9.7E-6 | 1.0E-5 | 1.0E-5 | - | - |
| Pz ratio | 0. | 9.8E-4 | 9.8E-4 | 9.8E-4 | 9.8E-4 | 1.2E-4 |
| Phi diff | 0. | 4.8E-5 | 3.0E-6 | 1.9E-7 | 9.3E-4 | 1.1E-4 |
| Eta ratio | 0. | 0. | 0. | 0. | 1.8E-3 | 2.3E-4 |
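
One way to read the larger Px and Py ratio deviations in the Phi 16 bit column: since px = pt*cos(phi), a fixed absolute error in phi turns into a relative error of roughly |tan(phi)| times that error in px, which grows as phi approaches +-pi/2 where px itself is small. The sketch below illustrates this. It assumes the 16 bits span [-pi, pi], which the results above do not state explicitly (although the resulting half step of about 4.8E-5 agrees with the Phi diff entry), and `px_ratio_error` is a hypothetical helper.

```python
import math


def px_ratio_error(phi: float, dphi: float) -> float:
    """Relative change in px = pt*cos(phi) when phi is shifted by dphi."""
    return abs(math.cos(phi + dphi) / math.cos(phi) - 1.0)


# Worst-case phi error for 16 bits over [-pi, pi] (assumed range): half a step.
dphi = 2 * math.pi / (1 << 17)

central = px_ratio_error(0.1, dphi)                 # px is most of pt
grazing = px_ratio_error(math.pi / 2 - 0.01, dphi)  # px is about 1% of pt

print(f"dphi={dphi:.2e} central={central:.2e} grazing={grazing:.2e}")
```

In this picture the relative deviation on Px and Py is dominated by particles whose momentum is nearly perpendicular to that axis, while the absolute deviation stays small, which fits the pattern of large ratio but small diff entries in the table.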

I have now gone and applied an Eta fixed precision between -6 and 6 on all particleFlow PFCandidates in the 1000 event file and compared the precision. (This could not safely be done with GenParticles since we do have important particles with eta > 6).

| Max Deviation | Pt 9 bit, Phi&Eta 16 bit | Pt 9 bit, Phi&Eta 20 bit | Pt 9 bit, Phi&Eta 24 bit | Pt 12 bit, Phi&Eta 20 bit |
|---|---|---|---|---|
| P ratio | 1.1E-3 | 9.8E-4 | 9.8E-4 | 1.3E-4 |
| Pt ratio | 9.8E-4 | 9.8E-4 | 9.8E-4 | 1.2E-4 |
| Px ratio | 1.7E-3 | 9.8E-4 | 9.8E-4 | 1.6E-4 |
| Px diff | 1.5E-4 | 2.0E-4 | 2.0E-4 | 1.0E-5 |
| Py ratio | 1.7E-3 | 9.8E-4 | 9.8E-4 | 1.3E-4 |
| Py diff | 2.3E-4 | 2.2E-4 | 2.2E-4 | 2.7E-6 |
| Pz ratio | 4.4E-3 | 9.8E-4 | 9.8E-4 | 2.5E-4 |
| Pz diff | 6.1E-6 | 1.0E-4 | 1.0E-4 | 1.2E-5 |
| Phi diff | 4.8E-5 | 3.0E-6 | 1.9E-7 | 3.0E-6 |
| Eta diff | 9.2E-5 | 5.7E-6 | 3.6E-7 | 5.7E-6 |

Dr15Jones avatar Sep 20 '22 14:09 Dr15Jones

A major reason I made this issue is to start to get experts to begin looking at these options.

what are the target gains in % expected from this effort?

mmusich avatar Sep 20 '22 14:09 mmusich

About the number of bits for px,py,pz, if I take a naive linear propagation of the relative uncertainty, for the W mass we'll need 1e-6 precision (0.1 MeV); so, it's 19 or 20 bits.

Perhaps making some toy phase-space would show the more appropriate connection. @bendavid is my consideration appropriate or are there some cancellations (or, worse, enhancements) in the relationship between the track momentum precision and the fitted mass?
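
The "19 or 20 bits" figure can be reproduced with a short calculation. The sketch below assumes the worst-case relative error is 2^-(n+1) when the mantissa is rounded to n stored bits and 2^-n when it is simply truncated; `bits_for` is a hypothetical helper and the 1e-6 target is the naive number quoted above.

```python
import math


def bits_for(rel_precision: float, rounding: bool = True) -> int:
    """Smallest number of mantissa bits whose worst-case relative error
    does not exceed rel_precision (assumed error model, see lead-in)."""
    n = math.log2(1.0 / rel_precision)
    return math.ceil(n - 1) if rounding else math.ceil(n)


target = 1.0e-6
print("rounding:", bits_for(target))                     # round to nearest
print("truncation:", bits_for(target, rounding=False))   # drop low bits
```

Under these assumptions rounding needs 19 bits and truncation needs 20, compared with the 9 to 12 bits tried in the tables above.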

slava77 avatar Sep 20 '22 14:09 slava77

what are the target gains in % expected from this effort?

Such a question is "above my pay grade" :). I'm just showing that we have the possibility for substantial gains by exploring this area. @dpiparo ?

Dr15Jones avatar Sep 20 '22 14:09 Dr15Jones

@slava77

About the number of bits for px,py,pz, if I take a naive linear propagation of the relative uncertainty, for the W mass we'll need 1e-6 precision (0.1 MeV); so, it's 19 or 20 bits.

From what I can tell, this is what the MiniAOD is using https://github.com/cms-sw/cmssw/blob/8a3b1522dcc2cf7f5beea3d946f01f5e2dd44cdf/DataFormats/PatCandidates/src/PackedCandidate.cc#L13-L23

where MiniFloatConverter::float32to16 is https://github.com/cms-sw/cmssw/blob/c9da596d0807a487c46d5940ffbb727da0de8064/DataFormats/Math/interface/libminifloat.h#L17

which is https://github.com/cms-sw/cmssw/blob/c9da596d0807a487c46d5940ffbb727da0de8064/DataFormats/Math/interface/libminifloat.h#L24-L36

which (I think) stores 11 bits of the mantissa and 4 bits of the exponent and 1 bit for the sign.

Dr15Jones avatar Sep 20 '22 14:09 Dr15Jones

From what I can tell, this is what the MiniAOD is using ... which (I think) stores 11 bits of the mantissa and 4 bits of the exponent and 1 bit for the sign.

On one hand, not everything in miniAOD is saved at this precision; electrons and muons have the same precision as AOD. On the other hand, my example leading to a 1E-6 precision requirement is too simplistic; that is more relevant for a fit over a distribution (so, some sqrt(N) or so contributes); the single candidate resolution requirements are likely much softer. I'll try to come up with a clearer toy/case.

slava77 avatar Sep 20 '22 15:09 slava77

Did you really see that with the PFCandidate? I could understand the problem for GenParticle saving pz=0 tracks using 5-7-digit overflow for eta, but the PFCands should run out by eta of 5.

@slava77 I have now applied fixed precision lossy compression on the Eta of the PFCandidates and added another table about the precision tradeoffs. I'm in the process of trying to evaluate the space savings related to the change.

Dr15Jones avatar Sep 22 '22 19:09 Dr15Jones

@slava77 how would you like me to proceed on this question you asked?

please add compression details wrt reco::Track generalTracks branch; the other kind of tracks (electrons or muons) are less frequent and would not affect the total compression as much.

What exact measurements would you like?

Dr15Jones avatar Sep 23 '22 13:09 Dr15Jones

@slava77 how would you like me to proceed on this question you asked?

please add compression details wrt reco::Track generalTracks branch; the other kind of tracks (electrons or muons) are less frequent and would not affect the total compression as much.

What exact measurements would you like?

I wanted to know the relative reduction of the product size (can be generalTracks, which is the largest).

slava77 avatar Sep 23 '22 14:09 slava77

@slava77

I wanted to know the relative reduction of the product size (can be generalTracks, which is the largest).

I've updated the reco::Track lossy compression comment to include a column with just the reduction in product size of the generalTracks for the different compression options tried.

Dr15Jones avatar Sep 23 '22 15:09 Dr15Jones

@slava77 how does reconstruction want to proceed?

Dr15Jones avatar Sep 26 '22 12:09 Dr15Jones

@slava77 how does reconstruction want to proceed?

I can only respond for tracking (reco::Tracks part of this issue). We could start with no-PU validation using samples of the same type as in a recent tracking validation https://its.cern.ch/jira/browse/PDMVRELVALS-158, perhaps with the covariance & Pxyz 10 bits and covariance & Pxyz 12 bits settings.

Please clarify what you did for the vertex_: was it a change in nbits or also a bound (150 cm was mentioned)? I'd take 11 m as a somewhat safer alternative, in case some muon studies decide to put the ref point on the last possible measurement.

slava77 avatar Sep 26 '22 15:09 slava77