`ts.pair_coalescence_rates` can give large or infinite rates
Here's an example giving a maximum value of infinity in the returned matrix:
```python
import msprime
import numpy as np

L = 10e6  # 10 Mb
params = {"sequence_length": L, "population_size": 1e4, "recombination_rate": 1e-8, "random_seed": 6}
sweep_model = msprime.SweepGenicSelection(
    position=L / 2, start_frequency=0.0001, end_frequency=0.9999, s=0.25, dt=1e-6
)
ts = msprime.sim_ancestry(20, model=[sweep_model, msprime.StandardCoalescent()], **params)

# time intervals need to go from 0 -> infinity
time_intervals = np.logspace(0, np.log10(ts.max_time), 20)
time_intervals = np.concatenate(([0], time_intervals, [np.inf]))
genome_intervals = np.linspace(0, L, 21)
rates = ts.pair_coalescence_rates(time_windows=time_intervals, windows=genome_intervals)
np.nanmax(rates)  # gives np.inf
```
If you swap the `random_seed` to 3, you get a maximum rate of 134217728.0, which also seems wrong. These large rates are all in the last time slice, so presumably we are hitting a division-by-zero or division-by-epsilon error.
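A quick diagnostic sketch (reusing `rates` from the snippet above; the 1.0 cutoff is just an arbitrary value far above any plausible pairwise rate for this population size) confirms that the offending entries all fall in the final time window:

```python
import numpy as np

# Flag infinite or implausibly large rate estimates.
bad = np.isinf(rates) | (rates > 1.0)
# Print the (genome window, time window) indices of the flagged entries:
# for this example they all have the last time-window index.
print(np.argwhere(bad))
```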
As discussed previously with @nspope. I've assigned you to this: I hope that's OK.
The issue is that the last time bin starts at the oldest node in the tree sequence. In this case the last bin contains a single node, and the corresponding rate estimator is effectively 1 / (E[pair time | events > epoch start] - epoch start), which blows up because only floating-point error keeps the denominator away from zero. This should be easy enough to catch in practice.
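To make the blow-up concrete, here is a minimal numerical sketch in plain numpy (not the library internals; the node time is made up): when the last epoch starts exactly at the oldest node, the conditional mean pair coalescence time in that epoch equals the epoch start up to floating-point error, so the estimator's denominator is zero or epsilon-sized.

```python
import numpy as np

epoch_start = 26342.8710030926           # hypothetical oldest node time
pair_times = np.array([epoch_start])     # the only coalescence event in the last epoch
denom = pair_times.mean() - epoch_start  # exactly 0.0 here; in the library only FP error keeps it nonzero
rate = np.inf if denom <= 0 else 1.0 / denom
print(rate)                              # inf, or an absurdly large finite value
```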
Changing `time_intervals = np.logspace(0, np.log10(ts.max_time), 20)` to `time_intervals = np.logspace(0, np.log10(ts.max_time + 1), 20)` gives something much more reasonable. Starting the last epoch at a fixed time just before the final node gives rate estimates that are bogus (e.g. totally dependent on the choice of fixed timepoint), but at least numerically stable.
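For completeness, here is that workaround applied to the example above (reusing `ts` and `genome_intervals` from the original snippet):

```python
# Shift the top of the log-spaced grid just past the oldest node, so the last
# finite breakpoint lies above ts.max_time rather than exactly at it.
time_intervals = np.logspace(0, np.log10(ts.max_time + 1), 20)
time_intervals = np.concatenate(([0], time_intervals, [np.inf]))
rates = ts.pair_coalescence_rates(time_windows=time_intervals, windows=genome_intervals)
print(np.nanmax(rates))  # now finite and of a much more plausible magnitude
```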