
allele frequency function

Open petrelharp opened this issue 5 years ago • 18 comments

We should implement the TreeSequence.allele_frequencies(sample_sets) function, which returns a numpy array of non-ancestral allele frequencies, with one row per site and one column per sample set.

Here's an implementation:

import numpy as np

def allele_frequencies(ts, sample_sets=None):
    # Default to a single sample set containing all samples.
    if sample_sets is None:
        sample_sets = [ts.samples()]
    # Size of each sample set, used to turn counts into frequencies.
    n = np.array([len(x) for x in sample_sets])
    def f(x):
        return x / n
    return ts.sample_count_stat(
        sample_sets, f, len(sample_sets), windows='sites', polarised=True,
        mode='site', strict=False, span_normalise=False)

Edit: originally this omitted span_normalise=False.
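
For cross-checking, here is a rough, dependency-free sketch of what I'd expect the output to look like, computed directly from a genotype matrix instead of through the stats machinery. The name reference_allele_frequencies and the toy data are made up for illustration. It assumes that genotypes[site][sample] is 0 for the ancestral allele and nonzero for any derived allele, that there is no missing data, and that sample sets hold column indices into the matrix (not node IDs).

```python
def reference_allele_frequencies(genotypes, sample_sets):
    """Hypothetical reference: per-site non-ancestral allele frequencies.

    genotypes[site][sample] is 0 for the ancestral allele and nonzero
    for any derived allele (assumed: no missing data).
    Returns a list with one row per site and one column per sample set.
    """
    freqs = []
    for site_genotypes in genotypes:
        row = []
        for sample_set in sample_sets:
            # Count samples in this set carrying any non-ancestral allele.
            derived = sum(1 for j in sample_set if site_genotypes[j] != 0)
            row.append(derived / len(sample_set))
        freqs.append(row)
    return freqs


# Toy data: 3 sites, 4 samples, two sample sets of size 2.
genotypes = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 2],
]
sample_sets = [[0, 1], [2, 3]]
result = reference_allele_frequencies(genotypes, sample_sets)
print(result)
```

On a real tree sequence the same comparison could presumably be made against ts.genotype_matrix(), after mapping sample node IDs to column indices. I have not checked how the two behave with missing data or with several mutations at one site.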

petrelharp, Mar 30 '20 17:03