nnpdf icon indicating copy to clipboard operation
nnpdf copied to clipboard

Polish legacy jet datasets

Open comane opened this issue 1 year ago • 9 comments

The scope of this PR is to (try to) polish the new implementation of the legacy jet datasets, namely, CMS_2JET_7TEV, CMS_1JET_8TEV, ATLAS_1JET_8TEV_R06, ATLAS_2JET_7TEV_R06.

What the PR does:

  1. Adds a filter_utils.utils.py module. This module should collect the functions that are generalisable and could, in principle, be used for all datasets.
  2. Adds a filter_utils.legacy_jets_utils.py module that collects all of the utils functions for the filter files. Polishing these is going to be much harder given the nature of the raw data.

comane avatar Apr 17 '24 15:04 comane

This was not done in this PR, but I changed it since I don't think we should start getting in the habit of suppressing warnings if there is another option

RoyStegeman avatar Apr 24 '24 17:04 RoyStegeman

This was not done in this PR, but I changed it since I don't think we should start getting in the habit of suppressing warnings if there is another option

Thanks @RoyStegeman, just to be sure I understand: removing the read "r" from with open avoids the pandas warnings?

comane avatar Apr 24 '24 17:04 comane

No, removing "r" is just because that's the default option anyway.

The warnings are fixed around line 900 in legacy_jets_utils.py.

A similar error, with similar solution, that I didn't fix still exists for ATLAS_2JET_7TEV_R06. I didn't fix it now because that filter.py is broken anyway.

RoyStegeman avatar Apr 24 '24 17:04 RoyStegeman

Since filter files will now share utilities, and we may need to change/fix those shared utilities at some point - do you think it's feasible to run all filters in the CI to test if the output remains unchanged?

RoyStegeman avatar Apr 24 '24 17:04 RoyStegeman

https://github.com/NNPDF/nnpdf/blob/eed00c75b4910606e967d1d0be644a45fd44b913/validphys2/src/validphys/process_options.py#L27

@scarlehoff what should be done in this PR?

RoyStegeman avatar Apr 24 '24 18:04 RoyStegeman

@scarlehoff what should be done in this PR?

Change it to pT2 I'd say.

scarlehoff avatar Apr 24 '24 20:04 scarlehoff

@comane I see you addressed the CI fails (as pointed out by @Radonirinaunimi), but could you rename p_T2 to pT2?

RoyStegeman avatar May 06 '24 13:05 RoyStegeman

@comane I see you addressed the CI fails (as pointed out by @Radonirinaunimi), but could you rename p_T2 to pT2?

Sure, I am not sure why though? Who wrote that comment there and why is p_T2 wrong? The tests seem to be checking for p_T2 at the moment, not for pT2

So maybe I can simply remove that comment?

comane avatar May 07 '24 16:05 comane

I assume @scarlehoff wrote it. Either way I agree with it for consistency with pT and pT_t

RoyStegeman avatar May 07 '24 16:05 RoyStegeman

Yes. I think it should be ok.

scarlehoff avatar Jun 07 '24 07:06 scarlehoff