stingray icon indicating copy to clipboard operation
stingray copied to clipboard

GTIs and Rebinning

Open pier-astro opened this issue 5 months ago • 0 comments

Description of the Issue

When working with light curves that have very fragmented GTIs (e.g. NuSTAR), rebin() method on a Lightcurve object can be overly restrictive and discard significant amounts of data. Of course this behavior is required when doing power spectra and most timing analysis, but in some cases like further GTI filtering for spectra extraction, it might decimate too much of the light curve.

Current behavior: Any new bin with GTI segment shorter than the new bin size is completely excluded from the rebinned light curve.

Inconsistency: Anyhow it seems ther is a discrepancy in how stingray handles GTIs. Correct me if I am wrong but from the source code and a rapid test it seems that make_lightcurve() creates continuous time grids with complete coverage, while rebin() completely discards short GTI segments. This means events.to_lc(dt=100).rebin(1000) can give drastically different results than events.to_lc(dt=1000), even though it should be possible to make them equivalent. Furthermore, if the light curve bin from the events has partial coverage this is not taken into account when building the light curve the first time. Am I right?

Demonstration

The code below creates a simple example with fragmented GTIs and shows how data loss increases as the rebinning factor grows, eventually leading to complete failure when all GTI segments become shorter than the new bin size.

import numpy as np
import stingray

dt = 100
# Create a simple dummy light curve
time = np.arange(dt/2, 10000-dt/2, 100)  # 0 to 10000 seconds, 100s bins
counts = np.random.poisson(50, len(time))  # Poisson counts around 50

# Create fragmented GTIs - realistic for X-ray observations
gti = np.array([
    [0, 1400],
    [1600, 1700],
    [1900, 2000],
    [2100, 3900],
    [4000, 4900],
    [5000, 5900],
    [6000, 7100],
    [8500, 9900]
])

# Filter time and counts to only include data within GTI segments
mask = np.zeros(len(time), dtype=bool)
for gti_start, gti_end in gti:
    mask |= (time - dt/2 >= gti_start) & (time + dt/2 <= gti_end)

time = time[mask]
counts = counts[mask]

print("=== GTI REBINNING DEMONSTRATION ===")
print(f"Original light curve: {len(time)} bins, {time[1]-time[0]} seconds each")
print(f"Total time span: {time[-1] - time[0] + dt} seconds")
print(f"Original GTI segments: {len(gti)}")
for i, g in enumerate(gti):
    print(f"  GTI {i+1}: {g[1]-g[0]} seconds")

# Create Stingray light curve
lc = stingray.Lightcurve(time, counts, gti=gti, dt=dt)
print(f"\nStingray light curve created: {len(lc.time)} bins")

# Try rebinning to different sizes
print("\n=== REBINNING TESTS ===")

rebin_factors = [5, 10, 20]  # 500s, 1000s, 2000s bins
for factor in rebin_factors:
    new_dt = factor * 100
    try:
        lc_rebinned = lc.rebin(new_dt)
        expected_bins = len(time) // factor
        actual_bins = len(lc_rebinned.time)
        
        print(f"\nRebin to {new_dt}s bins (factor {factor}):")
        print(f"  Expected bins: ~{expected_bins}")
        print(f"  Actual bins: {actual_bins}")
        print(f"  Data loss: {expected_bins - actual_bins} bins ({100*(expected_bins-actual_bins)/expected_bins:.1f}%)")
        
        # Show which GTIs survived rebinning (from the rebinned light curve)
        print(f"  GTIs that survived rebinning:")
        for i, g in enumerate(lc_rebinned.gti):
            orig_duration = gti[i][1] - gti[i][0] if i < len(gti) else "N/A"
            print(f"    GTI {i+1}: {g[1]-g[0]:.0f}s (original: {orig_duration}s)")
            
        # Show which original GTIs were discarded
        discarded = []
        for i, orig_g in enumerate(lc.gti):
            orig_duration = orig_g[1] - orig_g[0]
            if orig_duration < new_dt:
                discarded.append(f"GTI {i+1} ({orig_duration}s)")
        
        if discarded:
            print(f"  Discarded GTIs (shorter than {new_dt}s): {', '.join(discarded)}")
        else:
            print(f"  No GTIs discarded")
            
    except ValueError as e:
        print(f"\nRebin to {new_dt}s bins: FAILED - {e}")



import matplotlib.pyplot as plt

plt.figure(figsize=(12, 4))

# Plot original data
t0 = time[0]  # Reference time
plt.errorbar(time - t0, counts, xerr=dt/2, fmt='o', markersize=3, alpha=0.7, color='blue', label='Original data (100s bins)')

# Plot GTIs as background spans
for i, g in enumerate(lc.gti):
    plt.axvspan(g[0] - t0, g[1] - t0, alpha=0.3, color='green', 
                label='GTI' if i == 0 else "")

# Plot rebinned data for different factors
colors = ['red', 'orange', 'purple']
markers = ['s', '^', 'D']

for i, factor in enumerate(rebin_factors):
    new_dt = factor * 100
    try:
        lc_rebinned = lc.rebin(new_dt)
        if len(lc_rebinned.time) > 0:
            plt.errorbar(lc_rebinned.time - t0, lc_rebinned.counts, 
                        yerr=lc_rebinned.counts_err,
                        xerr=lc_rebinned.dt/2,  # Show bin width
                        fmt=markers[i], markersize=8, linewidth=1,
                        color=colors[i], 
                        label=f'Rebinned {new_dt}s ({len(lc_rebinned.time)} bins)')
    except ValueError:
        # Show that rebinning failed
        plt.text(5000, 30 - i*5, f'Rebinning to {new_dt}s FAILED', 
                color=colors[i], fontweight='bold', fontsize=12)

plt.xlabel('Time (s)')
plt.ylabel('Counts')
plt.title('GTI Rebinning Problem Demonstration')
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
=== GTI REBINNING DEMONSTRATION ===
Original light curve: 77 bins, 100.0 seconds each
Total time span: 9900.0 seconds
Original GTI segments: 8
  GTI 1: 1400 seconds
  GTI 2: 100 seconds
  GTI 3: 100 seconds
  GTI 4: 1800 seconds
  GTI 5: 900 seconds
  GTI 6: 900 seconds
  GTI 7: 1100 seconds
  GTI 8: 1400 seconds

Stingray light curve created: 77 bins

=== REBINNING TESTS ===

Rebin to 500s bins (factor 5):
  Expected bins: ~15
  Actual bins: 11
  Data loss: 4 bins (26.7%)
  GTIs that survived rebinning:
    GTI 1: 1400s (original: 1400s)
    GTI 2: 1800s (original: 100s)
    GTI 3: 900s (original: 100s)
    GTI 4: 900s (original: 1800s)
    GTI 5: 1100s (original: 900s)
    GTI 6: 1400s (original: 900s)
  Discarded GTIs (shorter than 500s): GTI 2 (100s), GTI 3 (100s)

Rebin to 1000s bins (factor 10):
  Expected bins: ~7
  Actual bins: 4
  Data loss: 3 bins (42.9%)
  GTIs that survived rebinning:
    GTI 1: 1400s (original: 1400s)
    GTI 2: 1800s (original: 100s)
    GTI 3: 1100s (original: 100s)
    GTI 4: 1400s (original: 1800s)
  Discarded GTIs (shorter than 1000s): GTI 2 (100s), GTI 3 (100s), GTI 5 (900s), GTI 6 (900s)

Rebin to 2000s bins: FAILED - No valid GTIs after rebin.
Image

Your suggestions for the Fix

Possible Solution: Add a parameter similar to fracexp_limit for rebinning Lightcurve objects. With a default value of 1.0, the output would remain the same as current behavior (safe), with GTIs redefined accordingly. When set to <1.0, it would allow building new bins with fractional GTI coverage, excluding only bins whose exposure falls below this threshold. In these cases (i.e., when threshold < 1.0), the GTIs could remain those of the original light curve.

The same criterion could be applied to events.to_lc() for consistency. (Btw in this case apply_gtis() already fixes the problem of neglecting the partial coverage of the bin, but then we fall back on the "issue" itself. With very short GTIs gaps, we throw away the whole bin.)

What do you think? Does this make sense, or am I missing something?

-Pierpaolo

pier-astro avatar Jul 21 '25 17:07 pier-astro