tskit
tskit copied to clipboard
add watterson's theta
this would be simple, as watterson's theta is
ts.segregating_sites()/sum([1/i for i in range(1, ts.num_samples])
PR welcome!
Should it be implemented as TreeSequence.wattersons_theta()
?
def wattersons_theta(self):
return(self.segregating_sites() / np.sum([1/i for i in np.arange(1, self.num_samples)]))
Should there be additional arguments sample_sets=None
and windows=None
like TreeSequence.diversity()
?
Probably, I'm not sure without looking things up @szhan. Here's some other implementations to check against:
- https://scikit-allel.readthedocs.io/en/stable/stats/diversity.html#allel.watterson_theta
- http://molpopgen.github.io/pylibseq/docs/_build/html/pages/summstats.html#libsequence.thetaw
There's a slight caveat here: the denominator only holds if all samples have the same birth time (even when all of the other assumptions are met).
Upon some further reflection: this caveat is no worse than applying the Tajma's D stat to a sample set of heterogeneous ages. So feel free to ignore me!
This should go in the same place as ts.segregating_sites
, and have all the args that segregating_sites( )
does - it's just a python-only wrapper around segregating_sites
. I think that's actually in trees.py
as it's a method of a TreeSequence; stats.py
is an old way of doing things.
Your proposal is good - the main thing is writing some tests - just see how Tajima's D or some other python-only statistic is tested.
I think that the tests for ts.Wattersons_theta()
should be similar to the tests implemented for ts.segregating_sites()
or another statistic that has the site, node, and branch modes.