
time windows in statistics

Open petrelharp opened this issue 1 year ago • 8 comments

Here @tforest and I are starting in on adding time windows to statistics. We're starting with what was sketched out in #683, and will explain things in more detail here when we're farther along (ignore this for now).

petrelharp avatar May 09 '24 23:05 petrelharp

Codecov Report

Attention: Patch coverage is 88.57143% with 12 lines in your changes missing coverage. Please review.

Project coverage is 89.83%. Comparing base (16de381) to head (0d48891). Report is 25 commits behind head on main.

Files with missing lines Patch % Lines
python/tskit/trees.py 75.00% 5 Missing and 5 partials :warning:
c/tskit/trees.c 96.00% 0 Missing and 2 partials :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2948      +/-   ##
==========================================
- Coverage   89.85%   89.83%   -0.03%     
==========================================
  Files          29       29              
  Lines       32128    32222      +94     
  Branches     5763     5784      +21     
==========================================
+ Hits        28868    28946      +78     
- Misses       1859     1868       +9     
- Partials     1401     1408       +7     
Flag Coverage Δ
c-tests 86.71% <96.07%> (+0.01%) :arrow_up:
lwt-tests 80.78% <ø> (ø)
python-c-tests 89.06% <100.00%> (+<0.01%) :arrow_up:
python-tests 98.80% <75.00%> (-0.18%) :arrow_down:


Files with missing lines Coverage Δ
c/tskit/core.c 95.83% <100.00%> (ø)
python/_tskitmodule.c 89.06% <100.00%> (+<0.01%) :arrow_up:
c/tskit/trees.c 90.70% <96.00%> (+0.02%) :arrow_up:
python/tskit/trees.py 98.24% <75.00%> (-0.57%) :arrow_down:

... and 1 file with indirect coverage changes

codecov[bot] avatar May 09 '24 23:05 codecov[bot]

Note: it is not clear how to do this for site statistics, since the site stat is of the form $$\sum_a f(w_a)$$ where the sum is over alleles, and $w_a$ is the weight of all samples with allele $a$; however, it is mutations that have times, not alleles.

The proposal will probably be to compute a site stat that sums over mutations, not alleles, but we'll start with branch stats only for now.
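For branch statistics, one natural reading of a time window is that each branch contributes only the part of its length that overlaps the window. The sketch below illustrates that idea only; it is an assumption about the approach, not the code in this PR, and the function name `branch_overlap` and the half-open `[start, end)` convention are hypothetical.

```python
import math


def branch_overlap(child_time, parent_time, window_start, window_end):
    """Length of the branch [child_time, parent_time) that falls inside
    the time window [window_start, window_end). Hypothetical helper."""
    lo = max(child_time, window_start)
    hi = min(parent_time, window_end)
    # A branch entirely outside the window contributes nothing.
    return max(0.0, hi - lo)


# A branch spanning times 1.0 to 5.0, split across three time windows.
# The last window is open-ended, so its upper breakpoint is infinity.
breaks = [0.0, 2.0, 4.0, math.inf]
pieces = [
    branch_overlap(1.0, 5.0, start, end)
    for start, end in zip(breaks[:-1], breaks[1:])
]
```

When the windows cover all of time, the pieces add up to the full branch length, so summing a time-windowed branch statistic over its time windows would recover the unwindowed value under this reading.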

petrelharp avatar May 17 '24 21:05 petrelharp

Next step:

  • do the AFS first, since it's less tangled up

Also maybe:

  • allow ts.decapitate() to take inf as an argument (which would do nothing)?

petrelharp avatar May 17 '24 22:05 petrelharp

A small nudge here that I mentioned to @petrelharp in passing: it would be great to have an expectation from theory as to what time-stratified quantities like the SFS should be under the (standard, neutral) coalescent.

andrewkern avatar May 17 '24 23:05 andrewkern

Some thoughts after working on time windows.

After these edits, the output of, say, the AFS is still a 2D array when using either windows or time_windows individually. However, when using windows and time_windows at the same time, the output is a 3D array with the following shape: [num_windows][num_time_windows][sample_size]. When windows or time_windows is None, the associated dimension is dropped accordingly. As there are now two types of windows, it becomes ambiguous that the historical "windows" parameter refers specifically to genomic windows. We did not rename it for now, though, as that would break existing behavior.
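A minimal sketch of the dimension-dropping rule, assuming both `windows` and `time_windows` are given as lists of breakpoints (so `k + 1` breakpoints define `k` windows). The helper `afs_output_shape` is hypothetical and only computes the expected shape; it does not call tskit.

```python
def afs_output_shape(sample_size, windows=None, time_windows=None):
    """Expected output shape under the rule described above.
    Hypothetical helper; breakpoints-as-lists is an assumption."""
    shape = []
    if windows is not None:
        shape.append(len(windows) - 1)       # num_windows
    if time_windows is not None:
        shape.append(len(time_windows) - 1)  # num_time_windows
    shape.append(sample_size)
    return tuple(shape)


genomic = [0, 500, 1000]                 # two genomic windows
times = [0, 10, 100, float("inf")]       # three time windows

both = afs_output_shape(8, windows=genomic, time_windows=times)
genomic_only = afs_output_shape(8, windows=genomic)
time_only = afs_output_shape(8, time_windows=times)
neither = afs_output_shape(8)
```

Note that `genomic_only` and `time_only` are both 2D, which is exactly the ambiguity raised above: the shape alone does not say which kind of window the first axis indexes.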

Some ideas:

  • Add new benchmarks for summary stats to see if the implemented features are optimized both in terms of computational space and time complexity.
  • Add some plots for summary stats to observe how time windows impact them.

tforest avatar Jul 15 '24 22:07 tforest

A note on the potential confusion between windows and time_windows: often one endpoint of the time_windows will be inf, so if we make sure we produce an informative error when the genomic windows aren't finite, we'll help people avoid the mistake.
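A sketch of what such a check could look like. The function name `check_genomic_windows` and the error wording are hypothetical, not part of tskit; the only point is that a non-finite breakpoint in `windows` is a strong hint that the caller meant `time_windows`.

```python
import math


def check_genomic_windows(windows):
    """Reject non-finite genomic breakpoints with a pointed message.
    Hypothetical validation helper, not the tskit implementation."""
    breakpoints = [float(w) for w in windows]
    if not all(math.isfinite(w) for w in breakpoints):
        raise ValueError(
            "Genomic 'windows' must be finite; breakpoints containing "
            "inf or nan look like time windows. Did you mean to pass "
            "them as 'time_windows'?"
        )
    return breakpoints


ok = check_genomic_windows([0, 50, 100])
```

Passing `[0, 10, math.inf]` to this helper raises a `ValueError` whose message names `time_windows`, which is the mistake the check is meant to catch.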

petrelharp avatar Jul 16 '24 22:07 petrelharp

I've added this work to the next release milestone. Hoping to get a release out in a week or two, if that is too ambitious for this let me know.

benjeffery avatar Sep 23 '24 10:09 benjeffery

Probably too ambitious, but we might have something in by then.

petrelharp avatar Sep 23 '24 19:09 petrelharp