tskit
tskit copied to clipboard
Feat : add reciprocal monophyly function to Tree class
Description
This PR adds a method to the Tree class in the Python API : is_recip_monophylectic().
It tests the current tree for reciprocal monophyly, a population genomics phenomenon as described here.
The key algorithm can be summarized like this : if each population's MRCA's children are the same as the all the population's leaves, all populations are reciprocally monophyletic.
PR Checklist:
- [ ] Tests that fully cover new/changed functionality (tried, but I was faced with
ModuleNotFoundError: No module named '_tskit'when importingtskitfrom my cloned fork) - [x] Documentation including tutorial content if appropriate. (code comments done, should I add to the official docs ?)
Also, this is only implemented in Python, not C, so someone needs to do that as I do not know C.
Codecov Report
Attention: Patch coverage is 6.66667% with 14 lines in your changes missing coverage. Please review.
Project coverage is 90.45%. Comparing base (
beafeba) to head (3bafdbf). Report is 150 commits behind head on main.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| python/tskit/trees.py | 6.66% | 14 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #2974 +/- ##
==========================================
+ Coverage 89.62% 90.45% +0.82%
==========================================
Files 29 26 -3
Lines 30170 21619 -8551
Branches 5867 4372 -1495
==========================================
- Hits 27041 19555 -7486
+ Misses 1790 1160 -630
+ Partials 1339 904 -435
| Flag | Coverage Δ | |
|---|---|---|
| c-tests | 86.20% <ø> (ø) |
|
| lwt-tests | ? |
|
| python-c-tests | ? |
|
| python-tests | 98.82% <6.66%> (-0.20%) |
:arrow_down: |
Flags with carried forward coverage won't be shown. Click here to find out more.
| Files with missing lines | Coverage Δ | |
|---|---|---|
| python/tskit/trees.py | 98.20% <6.66%> (-0.59%) |
:arrow_down: |
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
ModuleNotFoundError: No module named '_tskit' have you run make in the python directory?
See the docs here for details on building python extension module: https://tskit.dev/tskit/docs/stable/development.html#id1
Here's a quick sketch of another approach to doing this @Silloky:
import msprime
demography = msprime.Demography()
demography.add_population(name="A", initial_size=1_000)
demography.add_population(name="B", initial_size=1_000)
demography.add_population(name="C", initial_size=10_000)
demography.add_population_split(time=10000, derived=["A", "B"], ancestral="C")
ts = msprime.sim_ancestry(samples={"A": 2, "B": 3}, demography=demography, random_seed=12)
print(ts)
print(ts.draw_text())
for pop_id in range(ts.num_populations):
pop_samples = ts.samples(population=pop_id)
if len(pop_samples) == 0:
continue
for tree in ts.trees(tracked_samples=pop_samples):
mrca = tree.mrca(*pop_samples) # Inefficient
monophyletic = len(pop_samples) == tree.num_tracked_samples(mrca)
print(pop_id, mrca, monophyletic)
Here, we're computing a true/false value for whether a population (or more generally any sample subset) is monophyletic in each tree. We do this using the num_tracked_samples, which is efficiently computed for each node in the C layer. The only inefficient bit here is how we're computing the MRCA of all the samples (but in most cases would be fine).
I think a good API for this would be a TreeSequence method that looked something like this:
def is_monophyletic(self, samples):
"""
Returns a boolean array of length ts.num_trees which is True if the corresponding tree is monophyletic
for the given set of samples. That is, if ``u`` is the MRCA of all the specified samples in a given tree, no
other samples are present in the subtree rooted at u.
"""
# Code that looks like the above.
How does this look to you? If the API matches your needs, then we can go ahead with implementation and testing, and then think about how we can do the MRCA more efficiently (if we need to)?
Hi, thank you very much for your answer. Indeed, this looks more promising and definitely more efficient. I don't really have time now to look into it sadly... but I will try to implement your suggestions in August. Have a nice day and a nice summer !
Hi @Silloky! Are you still planning to work on this?
Hi @benjeffery, Yes, I am ! I do realise it's been a while, but I've been quite busy... I plan to implement it in the next few weeks.
Closing this due to inactivity - please feel free to reopen if you'd like to pick it back up @Silloky
Sure, haven't had times with all my exams... I'll work on this next summer.