tsinfer icon indicating copy to clipboard operation
tsinfer copied to clipboard

Add a sample_id column to ancestors

Open hyanwong opened this issue 4 years ago • 5 comments

Starts to address #604. Also changes the sampledata format to 3.1. We don't actually do anything with the stored IDs yet, though!

hyanwong avatar Nov 15 '21 13:11 hyanwong

Codecov Report

Merging #607 (78d15c1) into main (69bb8c7) will not change coverage. The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main     #607   +/-   ##
=======================================
  Coverage   93.11%   93.11%           
=======================================
  Files          17       17           
  Lines        5155     5155           
  Branches      959      959           
=======================================
  Hits         4800     4800           
  Misses        235      235           
  Partials      120      120           
Flag Coverage Δ
C 93.11% <100.00%> (ø)
python 96.33% <100.00%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
tsinfer/formats.py 98.44% <100.00%> (ø)

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

codecov[bot] avatar Nov 15 '21 14:11 codecov[bot]

While we in tsinfer land, do you think we could merge this too @jeromekelleher ? What's the deal with the format spec ID changes - do we make point changes or make a full integer change?

hyanwong avatar Sep 22 '22 12:09 hyanwong

We don't want to invalidate all existing AncestorData files - needs to take into account this column potentially being missing and needs to test for that (== complicated).

Is there a direct downstream need for this?

jeromekelleher avatar Sep 22 '22 12:09 jeromekelleher

If we are making backwards-incompatible changes, I think this is a relatively simple addition to make, and will enable matching to actual samples, which is useful e.g. for viral trees. Personally I think it worth including in https://github.com/tskit-dev/tsinfer/milestone/3 but happy to chat about it further.

hyanwong avatar Oct 05 '22 12:10 hyanwong

After discussion we decided to look at this after investigating the option of adding a samples argument to match_ancestors, which includes the required samples at the appropriate time.

jeromekelleher avatar Oct 05 '22 13:10 jeromekelleher