Clarify mutation order in Site() object

Open hyanwong opened this issue 1 year ago • 5 comments

This is useful to know without having to dive into the order requirements docs. I'm often looking this up to find the inherited state at a node.

hyanwong avatar Dec 05 '24 14:12 hyanwong

Codecov Report

:white_check_mark: All modified and coverable lines are covered by tests.
:white_check_mark: Project coverage is 89.80%. Comparing base (a2a3401) to head (71a34ee).
:warning: Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #3067   +/-   ##
=======================================
  Coverage   89.80%   89.80%           
=======================================
  Files          29       29           
  Lines       31026    31026           
  Branches     5679     5679           
=======================================
  Hits        27863    27863           
  Misses       1777     1777           
  Partials     1386     1386           
Flag Coverage Δ
c-tests 86.85% <ø> (ø)
lwt-tests 80.38% <ø> (ø)
python-c-tests 87.05% <ø> (ø)
python-tests 98.84% <ø> (ø)
python-tests-no-jit 33.60% <ø> (ø)
python-tests-numpy1 50.18% <ø> (ø)

Files with missing lines Coverage Δ
python/tskit/trees.py 98.88% <ø> (ø)

codecov[bot] avatar Dec 05 '24 14:12 codecov[bot]

Note that there are loopholes in the mutation data requirements, which means it's possible for this not to be true (which is a bug, but not one we can easily resolve now).

jeromekelleher avatar Dec 05 '24 15:12 jeromekelleher

Right, I know that the sorting process doesn't necessarily enforce parent/child correctness (and this is noted in the docs, and in https://github.com/tskit-dev/tskit/issues/2732) but they also say:

when there are multiple mutations per site, mutations should be ordered by decreasing time, if known, and parent mutations must occur before their children. Violations of these sorting requirements are detected at load time.

I guess violations of mutation parent order are not (yet?) detected at load time, so the doc wording should be changed to point out this bug?

Edit - I see this is part of https://github.com/tskit-dev/tskit/issues/2757#issuecomment-1557651165

hyanwong avatar Dec 05 '24 16:12 hyanwong

It's a bit tricky. Maybe you could put in a link to the definitions instead of explaining, so at least it's all in one place?

jeromekelleher avatar Dec 05 '24 20:12 jeromekelleher

Maybe we just leave this open until it's (eventually) fixed? It does my head in trying to figure out from the rather involved mutation sorting requirements that the most recent mutations for a site are (should be) at the end of the list. I feel that just needs to be stated simply somewhere, for the non-technical reader.
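
For illustration only, here's a rough pure-Python sketch of why that ordering matters when looking up the inherited state at a node. It deliberately doesn't use tskit: the `Mutation` tuple, the `parent` dict and `state_at_node` are hypothetical stand-ins, and it assumes the list of mutations at the site really is sorted older-first (which, as noted above, may not always hold).

```python
from typing import Dict, List, NamedTuple, Optional


class Mutation(NamedTuple):
    # Hypothetical stand-in for a mutation at a single site (not the tskit class).
    node: int            # the mutation sits on the branch above this node
    derived_state: str


def state_at_node(
    ancestral_state: str,
    mutations: List[Mutation],   # assumed sorted older-first
    parent: Dict[int, int],      # child -> parent; roots are absent
    node: int,
) -> str:
    """Return the allelic state inherited by `node` at this site."""
    u: Optional[int] = node
    while u is not None:
        # Scan from the end of the list, i.e. youngest mutation first, so that
        # if several mutations sit above the same node the most recent one wins.
        for mut in reversed(mutations):
            if mut.node == u:
                return mut.derived_state
        u = parent.get(u)  # move towards the root; None once we pass it
    return ancestral_state


# Toy tree: 4 is the root, 3 and 2 are its children, 0 and 1 are children of 3.
parent = {0: 3, 1: 3, 2: 4, 3: 4}
# Older-first: one mutation above node 3, then two stacked above node 0.
mutations = [Mutation(3, "T"), Mutation(0, "G"), Mutation(0, "C")]

for n in range(5):
    print(n, state_at_node("A", mutations, parent, n))
```

If the list were not ordered older-first, the reversed scan could pick up the older of the two mutations above node 0, which is exactly why a plain statement of the ordering would be handy.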

hyanwong avatar Dec 05 '24 20:12 hyanwong

I think with the enforcement of canonical mutation ordering, this is now true, and this minor doc change ("older mutations will be listed before younger ones at this site") is finally correct and can be merged. Is that right, @benjeffery?

hyanwong avatar Oct 30 '25 13:10 hyanwong