pyhf icon indicating copy to clipboard operation
pyhf copied to clipboard

fix: Use qtilde as test stat for validation tests

Open matthewfeickert opened this issue 5 years ago • 11 comments

Description

Resolves #1067

Follows after PR #1114

Set qtilde=True in validate_hypotest

https://github.com/scikit-hep/pyhf/blob/8a6ee36da4f566d8a37df01e20201098aa1f8a54/tests/test_validation.py#L572-L580

as changes in the validation process now make qtilde the correct test stat. See Issue #1067 for some historical notes and motivation.

Checklist Before Requesting Reviewer

  • [x] Tests are passing
  • [x] "WIP" removed from the title of the pull request
  • [x] Selected an Assignee for the PR to be responsible for the log summary

Before Merging

For the PR Assignees:

  • [x] Summarize commit messages into a comprehensive review of the PR
* Use qtilde as the test statistic for validate_hypotest in validation tests
   - Change from PR #442 
   - Eliminates warnings on test stat choice
* Correct expected result for 1bin_lumi test

matthewfeickert avatar Sep 16 '20 21:09 matthewfeickert

Codecov Report

Merging #1068 (ca82036) into master (8c90450) will not change coverage. The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #1068   +/-   ##
=======================================
  Coverage   98.10%   98.10%           
=======================================
  Files          64       64           
  Lines        4226     4226           
  Branches      587      587           
=======================================
  Hits         4146     4146           
  Misses         46       46           
  Partials       34       34           
Flag Coverage Δ
contrib 25.41% <ø> (ø)
doctest 61.19% <ø> (ø)
unittests 96.42% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.


Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 8c90450...ca82036. Read the comment docs.

codecov[bot] avatar Sep 16 '20 21:09 codecov[bot]

Should we maybe have this test in addition to the old test instead of replacing it?

lukasheinrich avatar Sep 17 '20 13:09 lukasheinrich

Should we maybe have this test in addition to the old test instead of replacing it?

We probably need the old test, but change the bounds to allow the POI to go negative so as to not raise warnings during the test itself.

kratsg avatar Sep 17 '20 14:09 kratsg

Should we maybe have this test in addition to the old test instead of replacing it?

We probably need the old test, but change the bounds to allow the POI to go negative so as to not raise warnings during the test itself.

Is this actually necessary? There wasn't anything in particular related to the validation that required the POI to be negative.

matthewfeickert avatar Sep 17 '20 14:09 matthewfeickert

Is this actually necessary? There wasn't anything in particular related to the validation that required the POI to be negative.

I think the concern that we don't cover qmu in validation is a fair point.

kratsg avatar Sep 17 '20 15:09 kratsg

@kratsg @lukasheinrich it has been a long time so can you help remind me of the plan? Was the idea to make a new expected_result_1bin_lumi fixture and model to support q having lower bounds that are negative?

matthewfeickert avatar Feb 08 '21 19:02 matthewfeickert

@kratsg @lukasheinrich it has been a long time so can you help remind me of the plan? Was the idea to make a new expected_result_1bin_lumi fixture and model to support q having lower bounds that are negative?

Basically just have a fixture similar to one of them, but using the qtilde test stat instead of the q one -- with the values, etc... we need to refactor all of this later, but for now, just copy/paste and make the test_validation even longer.

kratsg avatar Feb 08 '21 21:02 kratsg

Basically just have a fixture similar to one of them, but using the qtilde test stat instead of the q one -- with the values, etc... we need to refactor all of this later, but for now, just copy/paste and make the test_validation even longer.

This will require more than that given the setup is all name based

https://github.com/scikit-hep/pyhf/blob/5c05e6ea90157576bc3c3b2277d6bc9861ed6e7a/tests/test_validation.py#L755-L759

but we can't reuse the spec from qtilde for qmu as the bounds will necessarily need to be changed. As the name has to be the same for the source and the spec this is going to require a rename of all source and spec fixtures to be able to differentiate between qtilde and q.

@kratsg Or is there a better way to adjust the bounds like

https://github.com/scikit-hep/pyhf/blob/5c05e6ea90157576bc3c3b2277d6bc9861ed6e7a/tests/test_validation.py#L170

without renaming everything and creating a new spec?

matthewfeickert avatar Feb 08 '21 23:02 matthewfeickert

without renaming everything and creating a new spec?

no, hence why the need for refactor later. #1241.

kratsg avatar Feb 09 '21 01:02 kratsg

Oh, I misunderstood. The names don't need to change. Only need to copy out the spec for one of the existing ones, add "qmu" to the name instead -- and change the bounds to [-10, 10] so that the rest don't need to get renamed, and are switched to qtilde test stat.

kratsg avatar Feb 09 '21 01:02 kratsg

Only need to copy out the spec for one of the existing ones, add "qmu" to the name instead -- and change the bounds to [-10, 10] so that the rest don't need to get renamed, and are switched to qtilde test stat.

Hm, so maybe I'm misunderstanding the original request, but you only want to test a single spec with q and qtilde and not all? If so, then probably just do 1bin_shapesys_mu1 so that we'd go from

https://github.com/scikit-hep/pyhf/blob/5c05e6ea90157576bc3c3b2277d6bc9861ed6e7a/tests/test_validation.py#L743-L745

to

         '1bin_shapesys_qtilde_mu1', 
         '1bin_shapesys_q_mu1', 
         '1bin_shapesys_q0_mu1', 
         '1bin_shapesys_q0_mu1_toys', 

?

In the last commit that I had made here I was thinking that we needed to test all the specs with both qtilde and q, but if we just want to test and validate one then that's much easier.

matthewfeickert avatar Feb 09 '21 03:02 matthewfeickert