stingray icon indicating copy to clipboard operation
stingray copied to clipboard

Stingray Z^2 Search is not Chi-squared Distributed for count rates less than 50/3

Open gabrielb09 opened this issue 3 years ago • 3 comments

I was working on a notebook to explain some of the functionality of the $Z^2$ search to some students of mine when I came across this issue. I wanted to include a quick plot to show that the $Z^2$ search was distributed like $\chi^2$ for pure noise but discovered quite the opposite.

I generated $10^5$ photons uniformly across $10^4$ seconds of simulated observation. I then searched between $10^{-2} -10^1$ Hz with a spacing of $10^{-3}$ Hz. The resulting $Z^2$ amplitudes should have been distributed like $\chi^2$ with 2 degrees of freedom.

Dis_compare.pdf Left: Stingray $Z^2$ amplitude distribution Right: Exact $Z^2$ amplitude distribution

This is the code I used to generate the figure (I know its not exactly a minimum example but I thought it might be useful in the event that I made some error)

import numpy as np
import matplotlib.pyplot as plt
from stingray.pulse.search import z_n_search
from scipy.stats import chi2

rng = np.random.default_rng()

times = np.sort(rng.uniform(0., 1e4, int(1e5)))
freqs = np.arange(1e-2, 1e1, 1e-3)

f, z_sting = z_n_search(times, freqs, nbin=100, nharm=1)

phase = (times[None,:]*freqs[:,None])%1.
z_exact = (2/len(times))*(np.sum(np.cos(2*np.pi*phase), axis = 1)**2 + np.sum(np.sin(2*np.pi*phase), axis = 1)**2)

x = np.linspace(np.min(zs), np.max(zs), 100)
pdf = chi2.pdf(x, 2)

fig, axs = plt.subplots(1,2, figsize = (10,5))

axs[0].hist(z_sting, bins = 100, density = True)
axs[0].plot(x, chi2.pdf(x, 2))

axs[1].hist(z_exact, bins = 100, density = True)
axs[1].plot(x, chi2.pdf(x, 2))

I continued to investigate the problem and found that adjusting with the frequencies I searched over or the number of bins in the stingray $Z^2$ test had no impact on the distribution. Adjusting the simulated count-rate, however, could cause the stingray $Z^2$ test to "converge" to a $\chi^2$ distribution. This wasn't a smooth process but rather a complete switch at a count rate of 16.666 counts per second, anything I generated with rate $\leq$ 16.666 cnts/s was not $\chi^2$. 16.667 cnts/s looks $\chi^2$ while 16.666 does not.

I will leave you with some notes about my configuration. I am running Stingray version 1.0 on Mac Os 12.3.1 (M1 Arm processor) I do not have pyfftw installed

gabrielb09 avatar Jul 25 '22 23:07 gabrielb09

Hi @gabrielb09, I think I know what is going on, but what you say about the count rate is WEIRD. z_n_search has a keyword, segment_size. This was meant to make the average of Z searches over small intervals. In practice, it's non-standard and people are very confused by it. segment_size should be set to np.inf by default, and indeed, if you set it to anything larger than 10000 your example works. However, the thing about the count rate does not fit in this description. Are you fixing the count rate and calculating the number of events, or calculating it from the length of the observation and the number of events? In the second case, the segment_size description works, otherwise we need to find another explanation.

matteobachetti avatar Jul 26 '22 14:07 matteobachetti

Hey @matteobachetti thank you for getting back so quickly.

I fix the number of counts and the count rate so the observation length is left to vary.

rate = 1.666
N = int(1e5)
times = np.sort(rng.uniform(0., N/rate, N))

Since my total number of counts was always $10^5$ my 'magical' rate of 16.666 corresponds to an observation of ~6000 seconds.

If I change my total counts the rate does in fact change such that the length is always ~6000 seconds (i.e. for a total of $10^4$ counts I get a 'magical' rate of 1.666). This would mean that the break occurs for observations longer than 6000 seconds. I just checked and, in my version of stingray (1.0), the default segment size in the Z^2 search is only 5000 seconds. Im a little puzzled as to why the break in statistics would occur at observation length of ~6000 seconds rather than 5000 but this does seem to be the source of this behavior.

gabrielb09 avatar Jul 26 '22 15:07 gabrielb09

In stingray/pulse/search.py, line 38, we do this:

        if len(ts) < 1 or ts[-1] - ts[0] < 0.2 * segment_size:
            continue

If you have more than one interval, the last interval is only considered when it's long at least, 20% of a segment. Here's where your 6000 comes from :). Below 6000, only one interval is considered. Above 6000, you have two intervals. Anyways, I don't think it makes sense having that keyword anymore. Feel free to set it to np.inf. In future versions of the code it will be removed altogether, or better documented!

matteobachetti avatar Jul 26 '22 16:07 matteobachetti

Should be solved in #670. Thanks again @gabrielb09 for reporting!

matteobachetti avatar Oct 11 '22 12:10 matteobachetti