pyhf
pyhf copied to clipboard
fix: Use qtilde as test stat for validation tests
Description
Resolves #1067
Follows after PR #1114
Set qtilde=True in validate_hypotest
https://github.com/scikit-hep/pyhf/blob/8a6ee36da4f566d8a37df01e20201098aa1f8a54/tests/test_validation.py#L572-L580
as changes in the validation process now make qtilde the correct test stat. See Issue #1067 for some historical notes and motivation.
Checklist Before Requesting Reviewer
- [x] Tests are passing
- [x] "WIP" removed from the title of the pull request
- [x] Selected an Assignee for the PR to be responsible for the log summary
Before Merging
For the PR Assignees:
- [x] Summarize commit messages into a comprehensive review of the PR
* Use qtilde as the test statistic for validate_hypotest in validation tests
- Change from PR #442
- Eliminates warnings on test stat choice
* Correct expected result for 1bin_lumi test
Codecov Report
Merging #1068 (ca82036) into master (8c90450) will not change coverage. The diff coverage is
n/a.
@@ Coverage Diff @@
## master #1068 +/- ##
=======================================
Coverage 98.10% 98.10%
=======================================
Files 64 64
Lines 4226 4226
Branches 587 587
=======================================
Hits 4146 4146
Misses 46 46
Partials 34 34
| Flag | Coverage Δ | |
|---|---|---|
| contrib | 25.41% <ø> (ø) |
|
| doctest | 61.19% <ø> (ø) |
|
| unittests | 96.42% <ø> (ø) |
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact),ø = not affected,? = missing dataPowered by Codecov. Last update 8c90450...ca82036. Read the comment docs.
Should we maybe have this test in addition to the old test instead of replacing it?
Should we maybe have this test in addition to the old test instead of replacing it?
We probably need the old test, but change the bounds to allow the POI to go negative so as to not raise warnings during the test itself.
Should we maybe have this test in addition to the old test instead of replacing it?
We probably need the old test, but change the bounds to allow the POI to go negative so as to not raise warnings during the test itself.
Is this actually necessary? There wasn't anything in particular related to the validation that required the POI to be negative.
Is this actually necessary? There wasn't anything in particular related to the validation that required the POI to be negative.
I think the concern that we don't cover qmu in validation is a fair point.
@kratsg @lukasheinrich it has been a long time so can you help remind me of the plan? Was the idea to make a new expected_result_1bin_lumi fixture and model to support q having lower bounds that are negative?
@kratsg @lukasheinrich it has been a long time so can you help remind me of the plan? Was the idea to make a new
expected_result_1bin_lumifixture and model to supportqhaving lower bounds that are negative?
Basically just have a fixture similar to one of them, but using the qtilde test stat instead of the q one -- with the values, etc... we need to refactor all of this later, but for now, just copy/paste and make the test_validation even longer.
Basically just have a fixture similar to one of them, but using the qtilde test stat instead of the q one -- with the values, etc... we need to refactor all of this later, but for now, just copy/paste and make the test_validation even longer.
This will require more than that given the setup is all name based
https://github.com/scikit-hep/pyhf/blob/5c05e6ea90157576bc3c3b2277d6bc9861ed6e7a/tests/test_validation.py#L755-L759
but we can't reuse the spec from qtilde for qmu as the bounds will necessarily need to be changed. As the name has to be the same for the source and the spec this is going to require a rename of all source and spec fixtures to be able to differentiate between qtilde and q.
@kratsg Or is there a better way to adjust the bounds like
https://github.com/scikit-hep/pyhf/blob/5c05e6ea90157576bc3c3b2277d6bc9861ed6e7a/tests/test_validation.py#L170
without renaming everything and creating a new spec?
without renaming everything and creating a new spec?
no, hence why the need for refactor later. #1241.
Oh, I misunderstood. The names don't need to change. Only need to copy out the spec for one of the existing ones, add "qmu" to the name instead -- and change the bounds to [-10, 10] so that the rest don't need to get renamed, and are switched to qtilde test stat.
Only need to copy out the spec for one of the existing ones, add "qmu" to the name instead -- and change the bounds to
[-10, 10]so that the rest don't need to get renamed, and are switched to qtilde test stat.
Hm, so maybe I'm misunderstanding the original request, but you only want to test a single spec with q and qtilde and not all? If so, then probably just do 1bin_shapesys_mu1 so that we'd go from
https://github.com/scikit-hep/pyhf/blob/5c05e6ea90157576bc3c3b2277d6bc9861ed6e7a/tests/test_validation.py#L743-L745
to
'1bin_shapesys_qtilde_mu1',
'1bin_shapesys_q_mu1',
'1bin_shapesys_q0_mu1',
'1bin_shapesys_q0_mu1_toys',
?
In the last commit that I had made here I was thinking that we needed to test all the specs with both qtilde and q, but if we just want to test and validate one then that's much easier.