nnpdf
nnpdf copied to clipboard
Polish legacy jet datasets
The scope of this PR is to (try to) polish the new implementation of the legacy jet datasets, namely, CMS_2JET_7TEV, CMS_1JET_8TEV, ATLAS_1JET_8TEV_R06, ATLAS_2JET_7TEV_R06.
What the PR does:
- Adds a
filter_utils.utils.pymodule. This module should collect the functions that are generalisable and could, in principle, be used for all datasets. - Adds a
filter_utils.legacy_jets_utils.pymodule that collects all of the utils functions for the filter files. Polishing these is going to be much harder given the nature of the raw data.
This was not done in this PR, but I changed it since I don't think we should start getting in the habit of suppressing warnings if there is another option
This was not done in this PR, but I changed it since I don't think we should start getting in the habit of suppressing warnings if there is another option
Thanks @RoyStegeman, just to be sure I understand: removing the read "r" from with open avoids the pandas warnings?
No, removing "r" is just because that's the default option anyway.
The warnings are fixed around line 900 in legacy_jets_utils.py.
A similar error, with similar solution, that I didn't fix still exists for ATLAS_2JET_7TEV_R06. I didn't fix it now because that filter.py is broken anyway.
Since filter files will now share utilities, and we may need to change/fix those shared utilities at some point - do you think it's feasible to run all filters in the CI to test if the output remains unchanged?
https://github.com/NNPDF/nnpdf/blob/eed00c75b4910606e967d1d0be644a45fd44b913/validphys2/src/validphys/process_options.py#L27
@scarlehoff what should be done in this PR?
@scarlehoff what should be done in this PR?
Change it to pT2 I'd say.
@comane I see you addressed the CI fails (as pointed out by @Radonirinaunimi), but could you rename p_T2 to pT2?
@comane I see you addressed the CI fails (as pointed out by @Radonirinaunimi), but could you rename p_T2 to pT2?
Sure, I am not sure why though? Who wrote that comment there and why is p_T2 wrong? The tests seem to be checking for p_T2 at the moment, not for pT2
So maybe I can simply remove that comment?
I assume @scarlehoff wrote it. Either way I agree with it for consistency with pT and pT_t
Yes. I think it should be ok.