
Clarify Migrations

Open awohns opened this issue 4 years ago • 4 comments

I think the documentation of migrations could use a bit more detail here: migration table section of the docs. Specifically, the docs read:

"Migrations are performed by individual ancestors, but most likely not by an individual whose genome is tracked as a node (as in a discrete-deme model they are unlikely to be both a migrant and a most recent common ancestor). So, tskit records when a segment of ancestry has moved between populations."

and

"The node column records the ID of the node that was associated with the ancestry segment in question at the time of the migration event."

I think a clearer way of explaining this is that migrations chart how nodes are associated with a population. Specifically, the migration node refers to the child node of the edge where the migration occurred. A migration from source population x to destination population y at a given time, node, and left/right coordinates therefore means that, on the edge (or edges) identified by that node and those coordinates, ancestors younger than the migration time belong to population x and older ancestors belong to population y (at least until an intervening migration occurs). Furthermore, all older ancestors of the edge's parent node that exist between the left and right coordinates also belong to population y (until they are affected by an older migration), and all descendants of this edge belong to population x over the left/right coordinates (until they are affected by a more recent migration).

Here's an example of when this is important: if you wanted to know which tracts of ancestry in modern samples are the result of a historic migration (note that these do not necessarily correspond to haplotypes, since they don't depend on variant sites), you would look at the migration node and the marginal trees existing between the left and right coordinates, and then find the relevant leaf nodes. In the absence of intervening migrations, this gives the "ancestry segments" carried by samples which are the result of the migration. Note that these left/right segments do not always correspond to the breakpoints between edges, which I found surprising at first.
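The leaf-finding step described above can be sketched in plain Python. This is only an illustration under stated assumptions: the Edge and Migration named tuples are stand-ins that mirror the left/right/parent/child and left/right/node/source/dest/time columns of the edge and migration tables, the helper segments_below is hypothetical (it is not part of tskit), the tiny topology is made up, and intervening migrations are ignored.

```python
from collections import namedtuple

# Plain stand-ins for rows of the edge and migration tables (not tskit objects).
Edge = namedtuple("Edge", "left right parent child")
Migration = namedtuple("Migration", "left right node source dest time")


def segments_below(migration, edges, samples):
    """Map each sample to the intervals it inherits through the migrating lineage.

    Hypothetical helper: ignores any more recent migrations below the node.
    """
    found = {}
    # Start at the migration node and walk down towards the leaves.
    stack = [(migration.node, migration.left, migration.right)]
    while stack:
        node, left, right = stack.pop()
        if node in samples:
            found.setdefault(node, []).append((left, right))
        for e in edges:
            if e.parent == node:
                # Only the overlap of the child edge with the migrated segment counts.
                lo, hi = max(left, e.left), min(right, e.right)
                if lo < hi:
                    stack.append((e.child, lo, hi))
    return {s: sorted(iv) for s, iv in found.items()}


# Made-up example: sample 1 descends from node 2 on [0, 5) and from node 3 on [5, 10).
edges = [
    Edge(0, 10, 2, 0),
    Edge(0, 5, 2, 1),
    Edge(5, 10, 3, 1),
    Edge(0, 10, 4, 2),
    Edge(0, 10, 4, 3),
]
samples = {0, 1}
# The lineage above node 2 moved from population 0 to 1 (backwards in time) on [2, 7).
migration = Migration(left=2, right=7, node=2, source=0, dest=1, time=2.0)
tracts = segments_below(migration, edges, samples)
print(tracts)
```

In this toy case sample 0 inherits the whole migrated segment [2, 7) while sample 1 inherits only [2, 5), and neither interval starts or ends at an edge breakpoint of the edge above node 2, which matches the observation that migration segments need not line up with edges.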

If I have all that right, then I think we should clarify a few things: (1) that migrations explain how and why ancestral nodes have a population, (2) that the migration node is the child node of the edge where the migration occurred, and (3) that, barring multiple migrations on an edge, the child node of the edge belongs to the source population and the parent node belongs to the destination population.
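Points (2) and (3) can be illustrated with a small sketch that follows one lineage upwards from a child node and applies its migration records in time order. The Migration named tuple mirrors the columns of the migration table, but the helper population_at and the example records are hypothetical and assume the records listed for a node all lie on the edge(s) above that node.

```python
from collections import namedtuple

# Stand-in for a row of the migration table (not a tskit object).
Migration = namedtuple("Migration", "left right node source dest time")


def population_at(start_pop, migrations, node, position, time):
    """Population of the lineage above `node` at `position`, `time` units ago.

    Hypothetical helper: starts in the child node's population and switches to
    each migration's destination as the lineage is followed back in time.
    """
    pop = start_pop
    for m in sorted(migrations, key=lambda m: m.time):
        if m.node == node and m.left <= position < m.right and m.time <= time:
            pop = m.dest
    return pop


# Two made-up migrations on the lineage above node 2, which starts in population 0.
migrations = [
    Migration(left=2, right=7, node=2, source=0, dest=1, time=2.0),
    Migration(left=2, right=4, node=2, source=1, dest=0, time=2.5),
]

for position, time in [(3, 1.0), (3, 2.2), (3, 3.0), (5, 3.0), (8, 3.0)]:
    print(position, time, population_at(0, migrations, 2, position, time))
```

Below the first migration the lineage is still in the source population 0. Between times 2.0 and 2.5 it is in population 1 on [2, 7). After the second migration, position 3 is back in population 0 while position 5 stays in population 1, which is the "barring multiple migrations on an edge" caveat in point (3).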

If others agree, I'll make a PR to document this a bit more clearly.

awohns, Jan 30 '21 03:01

I agree with this! (Except, maybe you meant "migrations chart how edges are associated with populations"?)

petrelharp, Jan 30 '21 03:01

Great! Yes, you're right. I was focused on nodes because they actually have a population field, but that just confuses things, because it's really the edges that count. For instance, I was just looking at an example where a node is the parent of multiple edges, so the "birth population" of that node depends on the oldest migration event on any one of these edges. I would guess (but haven't checked) that it's impossible for there to be conflicts for a given node in this regard, i.e. that for part of the span where a node exists the oldest migration on a child edge has destination pop a, and for another part it has destination pop b. In that case I don't know how you would assign a population to the node, but I suppose that can't happen.
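One way to test that guess on a small hand-built example is sketched below. It computes, for a single edge, the set of populations the lineage is in when it reaches the parent node; a consistent record should give exactly one. The named tuples mirror the edge and migration table columns, the helper implied_parent_populations is hypothetical (not tskit API), and it assumes every listed migration lies within the time span of the edge.

```python
from collections import namedtuple

# Stand-ins for rows of the edge and migration tables (not tskit objects).
Edge = namedtuple("Edge", "left right parent child")
Migration = namedtuple("Migration", "left right node source dest time")


def implied_parent_populations(edge, child_pop, migrations):
    """Populations in which the lineage along `edge` arrives at `edge.parent`.

    Hypothetical helper: the population is piecewise constant between migration
    boundaries, so checking the left end of each piece is enough.
    """
    relevant = sorted(
        (m for m in migrations
         if m.node == edge.child and m.left < edge.right and m.right > edge.left),
        key=lambda m: m.time,
    )
    points = {edge.left}
    for m in relevant:
        points.update(x for x in (m.left, m.right) if edge.left <= x < edge.right)
    pops = set()
    for x in points:
        pop = child_pop
        for m in relevant:
            if m.left <= x < m.right:
                pop = m.dest  # the oldest covering migration wins
        pops.add(pop)
    return pops


edge = Edge(left=0, right=10, parent=4, child=2)

# Two segments migrate at different times but both end up in population 1.
consistent = [
    Migration(0, 4, 2, 0, 1, 2.0),
    Migration(4, 10, 2, 0, 1, 2.6),
]
# Only part of the edge migrates, so the parent would need two populations.
conflicting = [Migration(2, 7, 2, 0, 1, 2.0)]

print(implied_parent_populations(edge, 0, consistent))
print(implied_parent_populations(edge, 0, conflicting))
```

The first call yields a single population, which can then be compared with the parent node's population field. The second call yields two populations, which is the kind of conflict described above; the made-up record that produces it was written by hand and is not something a simulator is claimed to emit.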

awohns, Jan 30 '21 04:01

Right, and: nodes have populations, which should agree with the migration records.

I just tried to rephrase things but basically started retyping what you have above. =)

petrelharp, Jan 30 '21 04:01

SGTM

jeromekelleher, Jan 30 '21 13:01

See https://github.com/tskit-dev/tskit/pull/3348#issuecomment-3612174722

I regard the record_migrations option in msprime as a legacy feature that has been superseded by the additional nodes feature, which provides the same information in the form of nodes that explicitly mark migrations, plus edges, and is much simpler.

hyanwong, Dec 04 '25 14:12