msprime
                                
                                 msprime copied to clipboard
                                
                                    msprime copied to clipboard
                            
                            
                            
                        add tree fixation using newick string and TreeSequence object
This PR will add an option (--ce-from-nwk) to provide a tree in newick format that will be respected during simulation (hudson model).
The aim is to make sure that the given tree is present and that regular simulation (gene conversion and recombination) is still possible. The main idea is that each segment has an origin set (in the beginning only the id of the leaf), which is passed through and merged with the origins of the other segment at each coalescence event. When a regular event occurs, the algorithm checks if the selected segment is needed later, if so, the event is ignored.
Changes:
The following structural changes will be made:
General
- Add a newick parser that creates a list of coalescence events (Tuple[float, set, set]). Each tuple represents the time and two lineages to coalesce.
Segment class
- Add an "origin" set that keeps track of which segments this is created from.
Population class
- Add a method get_emerged_from_lineagethat returns the indices of the lineages that have emerged from a given one.
Simulator class
- Adapt alloc_segmentandcopy_segmentto use theoriginattribute.
- Adapt hudson_simulate.- Add an if clause to check if the time has exceeded the next fixed coalescent event (self.coalescent_events[0][0] < self.t)
- Reset the time to the coalescence event time with a possible epsilon as no two events can happen at the same time.
 
- Add an if clause to check if the time has exceeded the next fixed coalescent event (
- Add is_blocked_ancestormethod to check if a lineage id is used in a later fixed coalescence event.- If the lineage is required, but another lineage with identical origin exists. the lineage is not considered to be blocked.
 
- Adapt common_ancestor_event- If it's a fixed coalescence event, select all the lineages that emerged from lineage_a and lineage_b, and merge one at random (for each lineage a and lineage b).
- If not, proceed regularly by selecting two random ones.
 
Example:
The command
py algorithms.py --recombination-rate 0 --gene-conversion-rate 0 1 --sequence-length 10000 10 out --discrete --ce-events "((((A:0.005, B:0.005):0.01, (C:0.005, D:0.005):0.01):0.1,((E:0.005, F:0.005):0.01, (G:0.005, H:0.005):0.01):0.01):0.01, (I:0.005, J:0.005):0.01)" 
will produce following tree:
| ts.draw_text() | tskit_arg_visualizer | 
|---|---|
Event though the first coalescence events (at Node 10-14) should happen at the same time (0.005), the events are slightly postponed as no two events can happen at the same time.
Todo / Open Tasks
- Add tests
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 91.30%. Comparing base (
0b7e04d) to head (9fd4538).
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2276   +/-   ##
=======================================
  Coverage   91.30%   91.30%           
=======================================
  Files          20       20           
  Lines       11873    11873           
  Branches     2421     2421           
=======================================
  Hits        10841    10841           
  Misses        563      563           
  Partials      469      469           
| Flag | Coverage Δ | |
|---|---|---|
| C | 91.30% <ø> (ø) | |
| python | 98.71% <ø> (ø) | 
Flags with carried forward coverage won't be shown. Click here to find out more.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
This looks very cool @not-a-feature, thanks! I'll need a bit of time to digest, and I think it would probably help to have a call about it. @fbaumdicker - I assume you're involved here somewhere?
Yes, I am. @not-a-feature and I are incorporating a) mutation models representing gene presence-absence evolution, b) fixed clonal backbone phylogenies (this PR), and c) simulating gene transfer instead of conversion. A call would be great. Let's find a time in Slack.
As discussed loading the common ancestor events from a TreeSequence file is now supported.
Currently it is only tested on well defined trees without gene conversion or recombination.
Either use the --ce-from-ts option to provide a file path or Simulator(..., coalescent_events_ts)  and provide a TreeSequnce object directly.
It raises an error if it is used in combination with  --ce-from-nwk.
I was thinking more about providing this information as part of the initial state, rather than as something extrinsic. So, we provide the "backbone" of the simulation pre-done as some nodes and edges, and (importantly) these edges are not subsequently altered. Is this possible, or does that break the model?