msprime
msprime copied to clipboard
add tree fixation using newick string and TreeSequence object
This PR will add an option (--ce-from-nwk
) to provide a tree in newick format that will be respected during simulation (hudson model).
The aim is to make sure that the given tree is present and that regular simulation (gene conversion and recombination) is still possible. The main idea is that each segment has an origin set (in the beginning only the id of the leaf), which is passed through and merged with the origins of the other segment at each coalescence event. When a regular event occurs, the algorithm checks if the selected segment is needed later, if so, the event is ignored.
Changes:
The following structural changes will be made:
General
- Add a newick parser that creates a list of coalescence events (
Tuple[float, set, set]
). Each tuple represents the time and two lineages to coalesce.
Segment class
- Add an "origin" set that keeps track of which segments this is created from.
Population class
- Add a method
get_emerged_from_lineage
that returns the indices of the lineages that have emerged from a given one.
Simulator class
- Adapt
alloc_segment
andcopy_segment
to use theorigin
attribute. - Adapt
hudson_simulate
.- Add an if clause to check if the time has exceeded the next fixed coalescent event (
self.coalescent_events[0][0] < self.t
) - Reset the time to the coalescence event time with a possible epsilon as no two events can happen at the same time.
- Add an if clause to check if the time has exceeded the next fixed coalescent event (
- Add
is_blocked_ancestor
method to check if a lineage id is used in a later fixed coalescence event.- If the lineage is required, but another lineage with identical origin exists. the lineage is not considered to be blocked.
- Adapt
common_ancestor_event
- If it's a fixed coalescence event, select all the lineages that emerged from lineage_a and lineage_b, and merge one at random (for each lineage a and lineage b).
- If not, proceed regularly by selecting two random ones.
Example:
The command
py algorithms.py --recombination-rate 0 --gene-conversion-rate 0 1 --sequence-length 10000 10 out --discrete --ce-events "((((A:0.005, B:0.005):0.01, (C:0.005, D:0.005):0.01):0.1,((E:0.005, F:0.005):0.01, (G:0.005, H:0.005):0.01):0.01):0.01, (I:0.005, J:0.005):0.01)"
will produce following tree:
ts.draw_text() | tskit_arg_visualizer |
---|---|
Event though the first coalescence events (at Node 10-14) should happen at the same time (0.005), the events are slightly postponed as no two events can happen at the same time.
Todo / Open Tasks
- Add tests