
`mutation_mask` argument to site-mode statistics

Open · petrelharp opened this issue 3 years ago · 1 comment

Over in stdpopsim we want to compute frequency spectra for only a certain subset of mutations (for instance, the non-neutral ones). These are interleaved with the neutral ones (consider synonymous versus nonsynonymous mutations). To make this easier we could provide a `mutation_mask` argument that applies only to statistics with `mode="site"`. It would be a boolean vector with one entry per mutation, and only the mutations marked `True` would be used. The mask would not affect the denominator when `span_normalize=True`, which would still be the full span of each window.
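A minimal sketch of the proposed semantics, written in plain Python rather than against tskit, since no `mutation_mask` argument exists yet. The function `masked_site_afs` and its inputs are hypothetical: each mutation is assumed to come with a precomputed count of samples carrying its derived allele, and a single window is assumed to cover the whole sequence. Only masked-in mutations add to the spectrum, while the span-normalizing denominator stays the full sequence length.

```python
from typing import List, Sequence


def masked_site_afs(
    derived_counts: Sequence[int],
    mutation_mask: Sequence[bool],
    num_samples: int,
    sequence_length: float,
    span_normalize: bool = True,
) -> List[float]:
    """Hypothetical sketch of an unfolded site-mode AFS restricted by a mutation mask.

    derived_counts[j] is assumed to be the number of samples carrying the
    derived allele of mutation j; mutation_mask has one entry per mutation.
    """
    if len(mutation_mask) != len(derived_counts):
        raise ValueError("mutation_mask must have one entry per mutation")
    afs = [0.0] * (num_samples + 1)
    for count, keep in zip(derived_counts, mutation_mask):
        if keep:  # only masked-in mutations contribute to the numerator
            afs[count] += 1
    if span_normalize:
        # The denominator is the full span, regardless of how many mutations are masked out.
        afs = [x / sequence_length for x in afs]
    return afs


# Four mutations in 4 samples; the second (say, a synonymous one) is masked out.
print(masked_site_afs([1, 2, 1, 3], [True, False, True, True], 4, 100.0))
# prints [0.0, 0.02, 0.0, 0.01, 0.0]
```

Dividing by the unmasked span keeps the result comparable to an unmasked spectrum: it is a per-unit-length rate of the selected mutation class, not a rate over only the masked-in positions.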

(Initially I thought this would be `site_mask`, but for cases with more than one mutation at a site, `mutation_mask` is better.)
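To illustrate why a per-site mask is not enough: in a tree sequence each mutation points at one site (the `site` column of the mutation table), so several mutations can share a site. The helpers below are hypothetical and use made-up plain-Python inputs. They collapse a mutation mask to a site mask and report the sites where that collapse would keep some mutations that should have been excluded.

```python
from typing import List, Sequence


def site_mask_from_mutations(
    mutation_site: Sequence[int], mutation_mask: Sequence[bool], num_sites: int
) -> List[bool]:
    """Hypothetical helper: keep a site if any of its mutations is kept."""
    site_mask = [False] * num_sites
    for site, keep in zip(mutation_site, mutation_mask):
        if keep:
            site_mask[site] = True
    return site_mask


def mixed_sites(
    mutation_site: Sequence[int], mutation_mask: Sequence[bool], num_sites: int
) -> List[int]:
    """Sites whose mutations are partly kept and partly excluded by the mask."""
    has_kept = [False] * num_sites
    has_dropped = [False] * num_sites
    for site, keep in zip(mutation_site, mutation_mask):
        if keep:
            has_kept[site] = True
        else:
            has_dropped[site] = True
    return [s for s in range(num_sites) if has_kept[s] and has_dropped[s]]


# Site 0 carries a nonsynonymous (kept) and a synonymous (dropped) mutation.
mutation_site = [0, 0, 1]
mutation_mask = [True, False, False]
print(site_mask_from_mutations(mutation_site, mutation_mask, 2))  # prints [True, False]
print(mixed_sites(mutation_site, mutation_mask, 2))  # prints [0]
```

Any site reported by `mixed_sites` is one where a site-level mask must either keep the synonymous mutation or drop the nonsynonymous one, which is the case the per-mutation mask avoids.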

petrelharp, Feb 17 '22 04:02

SGTM, although I fear it would need considerable replumbing, as the C API has no allowance for this sort of thing.

jeromekelleher, Feb 17 '22 13:02