msprime icon indicating copy to clipboard operation
msprime copied to clipboard

add tree fixation using newick string and TreeSequence object

Open not-a-feature opened this issue 2 months ago • 5 comments

This PR will add an option (--ce-from-nwk) to provide a tree in newick format that will be respected during simulation (hudson model).

The aim is to make sure that the given tree is present and that regular simulation (gene conversion and recombination) is still possible. The main idea is that each segment has an origin set (in the beginning only the id of the leaf), which is passed through and merged with the origins of the other segment at each coalescence event. When a regular event occurs, the algorithm checks if the selected segment is needed later, if so, the event is ignored.

Changes:

The following structural changes will be made:

General

  • Add a newick parser that creates a list of coalescence events (Tuple[float, set, set]). Each tuple represents the time and two lineages to coalesce.

Segment class

  • Add an "origin" set that keeps track of which segments this is created from.

Population class

  • Add a method get_emerged_from_lineage that returns the indices of the lineages that have emerged from a given one.

Simulator class

  • Adapt alloc_segment and copy_segment to use the origin attribute.
  • Adapt hudson_simulate.
    • Add an if clause to check if the time has exceeded the next fixed coalescent event (self.coalescent_events[0][0] < self.t)
    • Reset the time to the coalescence event time with a possible epsilon as no two events can happen at the same time.
  • Add is_blocked_ancestor method to check if a lineage id is used in a later fixed coalescence event.
    • If the lineage is required, but another lineage with identical origin exists. the lineage is not considered to be blocked.
  • Adapt common_ancestor_event
    • If it's a fixed coalescence event, select all the lineages that emerged from lineage_a and lineage_b, and merge one at random (for each lineage a and lineage b).
    • If not, proceed regularly by selecting two random ones.

Example:

The command

py algorithms.py --recombination-rate 0 --gene-conversion-rate 0 1 --sequence-length 10000 10 out --discrete --ce-events "((((A:0.005, B:0.005):0.01, (C:0.005, D:0.005):0.01):0.1,((E:0.005, F:0.005):0.01, (G:0.005, H:0.005):0.01):0.01):0.01, (I:0.005, J:0.005):0.01)" 

will produce following tree:

ts.draw_text() tskit_arg_visualizer

Event though the first coalescence events (at Node 10-14) should happen at the same time (0.005), the events are slightly postponed as no two events can happen at the same time.

Todo / Open Tasks

  • Add tests

not-a-feature avatar Apr 13 '24 15:04 not-a-feature