msprime
msprime copied to clipboard
segmentation fault when trying to run msprime.log_arg_likelihood()
import tskit
import msprime
import random
random.seed(6)
nreps = 20
ne = [10, 1e2, 1e3, 1e4, 3e4, 5e4, 7e4, 1e5]
for i in range(nreps):
ifile = "slim_msp_rep_" + str(i+1) + ".trees"
ts = tskit.load(ifile)
for j in range(len(ne)):
print(msprime.log_arg_likelihood(ts, recombination_rate = 0, Ne = ne[j]))
This code results in a Segmentation fault more often than not. The tree sequences are from a single-locus SLiM model without recombination, followed by recapitation of the deep past with msprime. I use a random seed of 6 with SLiM and msprime. I'm not sure how to reproduce the error as the code runs fine for the same tree sequence on occasion.
Thanks for the bug report @savitakartik.
It looks like this will fail for a specific tree sequence file that you have stored - could you narrow it down to one file that definitely does result in a segfault, at least some of the time, please?
The file shouldn't be too big, so hopefully you can just attach it here.
This is speculation without seeing the details, but since part of the tree comes from SLiM I'm guessing this will have something to do with simultaneous mergers. I wouldn't be surprised if those could cause the way the likelihood calculation traverses nodes and edges to go haywire. The likelihood is only really valid for tree sequences which are output by the Hudson coalescent with recombination algorithm; for anything else we should expect things to break. Even if the code runs, the output won't be very meaningful.
That's true - still, we shouldn't segfault, so I'd like to get to the bottom of this.
For example, this tree sequence file segfaults for Ne=10, 1000, 10000 slim_msp_rep_7.zip
Hmm, I can't reproduce. I've run this code:
import msprime
import tskit
import tqdm
ts = tskit.load("slim_msp_rep_7.trees")
print(ts)
for _ in tqdm.tqdm(range(1000)):
ll = msprime.log_arg_likelihood(ts, recombination_rate = 0, Ne=1000)
print(ll)
Are you running this on the BMRC cluster? What version of msprime/Python etc?
Sorry, wrong file - could you try this file but without the tqdm module, although I don't know why it would make a difference. I get a segmentation fault with:
import msprime
import tskit
ts = tskit.load("slim_rep_7.trees")
print(ts)
for i in range(1000):
ll = msprime.log_arg_likelihood(ts, recombination_rate = 0, Ne=1000)
print(i, ll)
but when I import tqdm (e.g. your code above), it works fine.
I'm using Python 3.10.5, msprime 1.2.0, tskit 0.5.2 on the BMRC cluster. slim_rep_7.zip
Great, just reproduced:
$ gdb python3
(gdb) run segfault.py
Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
0x00007ffff73d5274 in msp_log_likelihood_arg (ts=ts@entry=0x7fffffffd590, r=0, Ne=1000,
r_lik=r_lik@entry=0x7fffffffd588) at lib/likelihood.c:179
179 edge = last_parent_edge[edges->child[edge]];
(gdb) bt
#0 0x00007ffff73d5274 in msp_log_likelihood_arg (ts=ts@entry=0x7fffffffd590, r=0, Ne=1000,
r_lik=r_lik@entry=0x7fffffffd588) at lib/likelihood.c:179
#1 0x00007ffff73be4db in msprime_log_likelihood_arg (self=<optimised out>,
args=<optimised out>, kwds=<optimised out>) at msprime/_msprimemodule.c:3198
#2 0x00000000005f6929 in PyCFunction_Call ()
#3 0x00000000005f74f6 in _PyObject_MakeTpCall ()
#4 0x000000000057164c in _PyEval_EvalFrameDefault ()
#5 0x0000000000569dba in _PyEval_EvalCodeWithName ()
#6 0x00000000005f6eb3 in _PyFunction_Vectorcall ()
#7 0x000000000056cc1f in _PyEval_EvalFrameDefault ()
#8 0x0000000000569dba in _PyEval_EvalCodeWithName ()
#9 0x00000000006902a7 in PyEval_EvalCode ()
#10 0x000000000067f951 in ?? ()
#11 0x000000000067f9cf in ?? ()
#12 0x000000000067fa71 in ?? ()
#13 0x0000000000681b97 in PyRun_SimpleFileExFlags ()
#14 0x00000000006b9d32 in Py_RunMain ()
#15 0x00000000006ba0bd in Py_BytesMain ()
#16 0x00007ffff7dd7083 in __libc_start_main (main=0x4efd60 <main>, argc=2,
argv=0x7fffffffdea8, init=<optimised out>, fini=<optimised out>,
rtld_fini=<optimised out>, stack_end=0x7fffffffde98) at ../csu/libc-start.c:308
#17 0x00000000005fc5fe in _start ()
so the segfault happens here
Digging in:
(gdb) frame 0
#0 0x00007ffff73d5274 in msp_log_likelihood_arg (ts=ts@entry=0x7fffffffd590, r=0, Ne=1000, r_lik=r_lik@entry=0x7fffffffd588) at lib/likelihood.c:179
179 edge = last_parent_edge[edges->child[edge]];
(gdb) print edge
$1 = 36521
(gdb) print edges->child[edge]
$2 = 246130
(gdb) print last_parent_edge[246130]
Cannot access memory at address 0x2853448
so, looks like we've run over in the last_parent_edge array. But hmm, there's only 36K nodes, so perhaps the child[edge] above was unset?
(gdb) print nodes->num_rows
$3 = 36522
$4 = 36522
(gdb) print edges->num_rows
$5 = 36521
(gdb) print edge
$6 = 36521
Ah, no, we've overrun the edge table. How did edge
become equal to the number of rows in the table? Because we increment edge
here and assume that it corresponds to a valid index.
So, we just need to check this, I think. It probably can't happen if we have binary trees.
@JereKoskela - should we fix this corner case and return a value, or should we check for binary trees in the input and return an error when the trees don't conform to our expectations?
I think it would be cleanest to return an error when the tree isn't binary, since any number we return won't really mean much. Is there an easy, existing way to check that @jeromekelleher, or do you just mean adding a check to when we increment edge
?
Something like this in the Python code is probably best:
for tree in ts.trees():
if np.any(tree.num_children_array > 2):
raise ValueError(...)
Although should probably check for unary nodes etc as well.
@jeromekelleher, any idea why python3 -m pytest
would insist that Tree
object has no attribute num_children_array
?
Running git submodule update --init --recursive
, then python -m pip install -r requirements/development.txt
and make
in the project root all complete without errors, and I'm parallel with the latest version of the main msprime repo.
@JereKoskela what version of tskit are you using? Pip won't upgrade it unless asked and the dev requirements might need a pin.
python -m pip install --upgrade tskit
Upgrading tskit did the job. I had naively assumed reinstalling requirements would do that. Thanks @benjeffery!
Sounds like we need a pin. Will raise an issue!
Reopening this until we update the CHANGELOG