sherpa
sherpa copied to clipboard
chi2xspecvar does not match XSPEC for bg-subtracted data if src or bg has 0 counts in a channel
I took the 3c273.pi
and 3c273_bg.pi
files from our test suite and edited them so they both had BACKSCAL=1.0, EXPOSURE=1.0, and had no grouping channels. I then compared the errors calculated by XSPEC 12.9.1 and Sherpa in CIAO 4.9 (essentially the same as the current master branch @aa98d19553d1e62d3c65ba5abf33562e5e7dbe4c) with (in Sherpa)
set_analysis('channels')
subtract()
set_stat('chi2xspecvar')
The only channels where the error values were different had the following pattern (there were multiple channels with some of these combination of source and background coutns):
nsrc | nbgnd | Var(sherpa) | Var(xspec) |
---|---|---|---|
5 | 0 | 6.0 | 5.0 |
7 | 0 | 8.0 | 7.0 |
9 | 0 | 10.0 | 9.0 |
4 | 0 | 5.0 | 4.0 |
0 | 1 | 2.0 | 1.0 |
0 | 0 | 2.0 | 1.0 |
0 | 2 | 3.0 | 2.0 |
3 | 0 | 4.0 | 3.0 |
2 | 0 | 3.0 | 2.0 |
1 | 0 | 2.0 | 1.0 |
6 | 0 | 7.0 | 6.0 |
10 | 0 | 11.0 | 10.0 |
8 | 0 | 9.0 | 8.0 |
nsrc
and nbgnd
are the number of counts in the source and background channel, and Var(x)
is the variance calculated by x. We can see that
Var(sherpa) = 1 + Var(xspec)
so it looks like XSPEC is "dropping" the 0-count channel when calculating the errors (apart from when both channels are 0, when the variance is set to 1.0).
At some level all bets are off when you have 0 counts using Gaussian statistics - XSPEC provides this warning:
***Warning: Chi-square may not be valid due to bins with zero variance
in spectrum number(s): 1
but do we want to agree with what XSPEC does here when using chi2xspecvar
?
EDIT I've come to realize that we really should make sure people don't use chi2xspecvar
when fitting the model to the background (rather than subtracting it), as we don't have a way to replicate what XSPEC does in this case, because XSPEC doesn't actually do this case. But that's a separate issue - #1543
Yes. I realized that this is the change to the chi2 statistics that have been implemented in XSPEC at some point and there was no corresponding update in Sherpa. The original idea was that chi2xspecvar
will correspond to the chi2
in XSPEC. We either remove this statistics or update to match XSPEC. I vote for removing it, as I think chi2datavar
should correspond to the current XSPEC chi2
, but I did not check this yet.
Unfortunately chi2datavar does not match XSPEC either. For the dataset I used here, if I ignore the background then I get the same result as XSPEC with chi2xspecvar, but a rather different statistic is returned with chi2datavar.
On Thu, Apr 20, 2017 at 12:53 PM Aneta Siemiginowska < [email protected]> wrote:
Yes. I realized that this is the change to the chi2 statistics that have been implemented in XSPEC at some point and there was no corresponding update in Sherpa. The original idea was that chi2xspecvar will correspond to the chi2 in XSPEC. We either remove this statistics or update to match XSPEC. I vote for removing it, as I think chi2datavar should correspond to the current XSPEC chi2, but I did not check this yet.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/sherpa/sherpa/issues/356#issuecomment-295811663, or mute the thread https://github.com/notifications/unsubscribe-auth/AANtfqtyQkhvg5sJl7cSXNsw0uH_tdWlks5rx411gaJpZM4NCPh6 .
So we are going to keep the chi2xspecvar but make the following two changes right?
- "drop", ie exclude the accumulation of the stat if yraw[ii] == 0
- remove the setting of error to 1 if yraw[ii] == 0 (step 1 guarantees that division by 0 is avoided)
I've added a test in #768 which shows the problem in this ticket - the behavior with source and/or background counts in a bin being 0.
In the following tables, the rows following bgnd
refer to the error on the point [after backgroud subtraction] calculated by that method (xspec
means XSPEC 12.10.1m, the others are Sherpa statistic names). The Sherpa values are calculated using CIAO 4.12 (so should match master branch at the time of writing).
It does look like chi2datavar
matches XSPEC apart from when both source and background values are 0 (and ignoring floating-point differences). I am pretty sure that #474 still gets an error of 0 for chi2datavar
and chi2xspecvar
for the first channel (ie when source=counts=0) - not 100% sure since it's awakward to rebase this PR onto the current master branch so I could have made a mistake in checking this.
source exposure = 1, background exposure = 1
channel | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|
source | 0 | 0 | 0 | 1 | 3 | 1 | 1 | 3 | 3 |
bgnd | 0 | 1 | 3 | 0 | 0 | 1 | 3 | 1 | 3 |
xspec | 1 | 1 | sqrt(3) | 1 | sqrt(3) | sqrt(2) | 2 | 2 | sqrt(6) |
chi2datavar | 0 | 1 | sqrt(3) | 1 | sqrt(3) | sqrt(2) | 2 | 2 | sqrt(6) |
chi2xspecvar | sqrt(2) | sqrt(2) | 2 | sqrt(2) | 2 | sqrt(2) | 2 | 2 | sqrt(6) |
chi2gehrels | 2.63895843 | 2.97956408 | 3.47922896 | 2.97956408 | 3.47922896 | 3.28504226 | 3.74416007 | 3.74416007 | 4.15282635 |
source exposure = 1, background exposure = 10
channel | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|
source | 0 | 0 | 0 | 1 | 3 | 1 | 1 | 3 | 3 |
bgnd | 0 | 1 | 3 | 0 | 0 | 1 | 3 | 1 | 3 |
xspec | 0.1 | 0.1 | 0.173205078 | 1 | 1.73205078 | 1.0049876 | 1.01488912 | 1.73493516 | 1.74068952 |
chi2datavar | 0 | 0.1 | 0.17320508 | 1 | 1.73205081 | 1.00498756 | 1.01488916 | 1.73493516 | 1.74068952 |
chi2xspecvar | 1.00498756 | 1.00498756 | 1.01488916 | 1.00498756 | 1.73493516 | 1.00498756 | 1.01488916 | 1.73493516 | 1.74068952 |
chi2gehrels | 1.87533232 | 1.8804277 | 1.88898932 | 2.33035873 | 2.94241463 | 2.33446114 | 2.3413631 | 2.94566476 | 2.95113761 |
source exposure = 1, background exposure = 0.1
channel | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|
source | 0 | 0 | 0 | 1 | 3 | 1 | 1 | 3 | 3 |
bgnd | 0 | 1 | 3 | 0 | 0 | 1 | 3 | 1 | 3 |
xspec | 1 | 10 | 17.320507 | 1 | 1.73205078 | 10.0498753 | 17.3493519 | 10.1488914 | 17.4068947 |
chi2datavar | 0 | 10 | 17.32050808 | 1 | 1.73205081 | 10.04987562 | 17.34935157 | 10.14889157 | 17.40689519 |
chi2xspecvar | 10.04987562 | 10.04987562 | 17.34935157 | 10.04987562 | 10.14889157 | 10.04987562 | 17.34935157 | 10.14889157 | 17.40689519 |
chi2gehrels | 18.75332321 | 23.30358732 | 29.4241463 | 18.80427696 | 18.88989317 | 23.34461142 | 29.45664757 | 23.41363095 | 29.51137608 |
(this doesn't answer Dan's questions for what to do if the data value is negative).
Looking at the tables @DougBurke just put up, it looks like the requirements for chi2xspecvar
is to be similar to chi2datavar
except when src
and bgnd
are both 0 in which case the stat shall be 1/background_exposure
. Right?
I thin what @anetasie was suggesting - but we'll have to wait until after the deadline for confirmation - is that chi2datavar
should behave like XSPEC and we should get rid of chi2xspecvar
.
If we were to go this route then I think we'd actually want to make chi2xspecvar
just be a synonym for chi2datavar
and deprecate it (so that users who use it don't have to change their code/scripts immediately), although perhaps given that the exsiting chi2xspecvar
results are so different maybe there's a point in saying "hey, you need to be careful here".
Yes, I agree with @DougBurke and to make chi2datavar
behave consistently with xspec, so chi2xspecvar
becomes a synonym of chi2datavar
.
@dtnguyen2 is the update to chi2datavar
so easy?
Updating class Chi2DataVar(Chi2)
will not be easy, but not impossible. Currently, Chi2DataVar
inherits the method calc_stat
from the base class Chi2
which (I think) does not pass in the background data. However, the class WStat
has code to extract/pass the background data so one has to do some gymnastics to make sure not to duplicate code etc...
This was my concern: that the current set up doesn't make this change easy. I wonder if we can add a method to the chi-square classes to combine source and background counts to generate an error. In most cases this would just be normal variance, but then the chi2datavar case could over-ride the behavior.
Doug
On Wed, Apr 8, 2020 at 4:02 PM dtnguyen2 [email protected] wrote:
Updating class Chi2DataVar(Chi2) will not be easy, but not impossible. Currently, Chi2DataVar inherits the method calc_stat from the base class Chi2 which (I think) does not pass in the background data. However, the class WStat has code to extract/pass the background data so one has to do some gymnastics to make sure not to duplicate code etc...
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/sherpa/sherpa/issues/356#issuecomment-611164402, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABW27R5UKXDP2SR63HUB4DRLTJ3ZANCNFSM4DII7B5A .
I wonder if we can add a method to the chi-square classes to combine source and background counts to generate an error. In most cases this would just be normal variance, but then the chi2datavar case could over-ride the behavior.
I note the polarimetry data has a similar requirement - you need more than just the data and the model to calculate the correct statistic (in the polarimetry case: I, Q, and U spectra)
@hamogu - so, for the xspecvar case we have
$e^j_i = f(d^j_i, b^j_i)$
where subscript indicates the channel number and superscript j indicates the j'th dataset (with $e$ being the error and $d$ and $b$ being the source and background counts and $f$ is "a function"). Am I right in thinking that in the polarimetry case you have something like
$e_i = g(d^I, d^Q, d^U, b^I, b^Q, b^U)$
That is, the error for the ith bin is some combination of multiple datasets (here labelled with superscripts $I$, $Q$, $U$), so the three datasets get fitted "as one". Or do you still have three "separate" data sets that get fit somewhat independently?