tskit icon indicating copy to clipboard operation
tskit copied to clipboard

Mutation's "inherited" state

Open jeromekelleher opened this issue 3 years ago • 10 comments
trafficstars

Getting the state that a mutation inherited is fiddly right now, and has been annoying me quite bit recently. We should add a method to the Python object at least to get it, but it would also be nice to have access to it in numpy for reasoning efficiently about reversions and so on.

Does inherited_state feel like the right term here?

The numpy stuff would depend on us making some decisions about vectorised access to the string columns which I guess I should open a seperate issue on.

tables = ts.tables
assert np.all(tables.mutations.derived_state_offset == np.arange(ts.num_mutations + 1))
derived_state = tables.mutations.derived_state.view("S1")
assert np.all(tables.sites.ancestral_state_offset == np.arange(ts.num_sites + 1))
ancestral_state = tables.sites.ancestral_state.view("S1")
# Compute the "inherited state" for each mutation
inherited_state = ancestral_state[ts.mutations_site]
mutations_with_parent = ts.mutations_parent != -1
parent = ts.mutations_parent[mutations_with_parent]
assert np.all(parent >= 0)
inherited_state[mutations_with_parent] = derived_state[parent]

# Check
for mut in ts.mutations():
    state0 = ts.site(mut.site).ancestral_state
    if mut.parent != -1:
        state0 = ts.mutation(mut.parent).derived_state
    assert state0 == inherited_state[mut.id].decode()

jeromekelleher avatar Nov 08 '22 20:11 jeromekelleher

inherited_state sounds good to me, only other thing I could come up with parent_state but I think that could get confused with the parent node state.

benjeffery avatar Nov 08 '22 23:11 benjeffery

I have wanted this a number of times too, so +1 from me. The name seems reasonable.

hyanwong avatar Nov 15 '22 22:11 hyanwong

Presumably we would need to make sure that compute_mutation_parents() has been run before calculating this.

hyanwong avatar Mar 27 '23 14:03 hyanwong

Yes, it's a bug really that we don't fully validate the mutation parents on tree sequence creation.

jeromekelleher avatar Mar 27 '23 20:03 jeromekelleher

Closing for inactivity and labelling "future", please re-open if you plan to work on this.

benjeffery avatar Jun 09 '25 14:06 benjeffery

I definitely want to implement this once the mutation parents bug has been fixed, so I'm assigning it to me, and reopening. I'll at least implement the non-array version.

hyanwong avatar Jun 09 '25 17:06 hyanwong

Note this is straightforwardly added once #3228 as a numpy array mutations_inherited_state, computed like this:

 def compute_inherited_state(ts):
    derived_state = ts.mutations_derived_state
    ancestral_state = ts.sites_ancestral_state
    inherited_state = ancestral_state[ts.mutations_site]
    mutations_with_parent = ts.mutations_parent != -1
    parent = ts.mutations_parent[mutations_with_parent]
    inherited_state[mutations_with_parent] = derived_state[parent]
    return inherited_state

as this is a derived array, I guess it's debateable whether we should cache it. I think probably not, so we just do something like

@property
def mutations_inherited_state(self):
     return compute_inherited_state(self)

jeromekelleher avatar Jun 25 '25 14:06 jeromekelleher

I assume I'm not thinking about this properly, butt I'm not totally sure I follow how that works for a bi-allelic site? If there is only 1 mutation at a site, isn't the parent -1?

molpopgen avatar Jun 25 '25 15:06 molpopgen

Yep - and their inherited state is the site's ancestral state. This is really for complicated stuff like SARS-CoV-2 where we get up to 30K mutations per site.

jeromekelleher avatar Jun 25 '25 16:06 jeromekelleher

Oh -- I get it, it is the state inherited by the node where a mutation first lands.

molpopgen avatar Jun 25 '25 22:06 molpopgen

Now that https://github.com/tskit-dev/tskit/pull/3212#event-19538220537 and https://github.com/tskit-dev/tskit/pull/3228 are merged, we can finally add this, I think. It would be really useful.

I think we should probably have ts.mutation(N).inherited_state() too, should we?

hyanwong avatar Sep 06 '25 09:09 hyanwong

Yes - but let's get some other projects shipped first

jeromekelleher avatar Sep 06 '25 13:09 jeromekelleher