treetime
Mugration p-value
Description
I'm tackling the issue of sampling bias in mugration and was wondering whether a p-value might be useful here. If I knew the probability of an event happening by chance (given the data), it might guide interpretation.
Disclaimer: I am not a statistician, so if I'm way off, or this is already described, please let me know!
Theory
Given n states s1, s2, ..., sn with observed frequencies f1, f2, ..., fn, what is the probability of observing a transition from sj to sk by chance?
Working Example
What is the probability of observing a mugration event between Russia and Germany by chance? In this example, this probability/p-value is 0.14 and it's up to the user to decide whether that is too high.
import itertools

states = ["Russia", "Lithuania", "Estonia", "Germany"]
frequencies = [4, 1, 1, 2]

# Expand each state into one entry per observation
observations = []
for s, f in zip(states, frequencies):
    observations += [s] * f
# ['Russia', 'Russia', 'Russia', 'Russia', 'Lithuania', 'Estonia', 'Germany', 'Germany']

# All ordered pairs of distinct observations (8 * 7 = 56 pairs)
transitions = list(itertools.permutations(observations, 2))
transitions_uniq = set(transitions)
# I'm uncertain if "staying in place" should be considered a transition?
# (pairs such as ("Russia", "Russia") are currently counted in the denominator)

target = ("Russia", "Germany")
pvalue = transitions.count(target) / len(transitions)
# Results in a p-value of 0.14 (8 / 56)
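As a cross-check, the enumeration above has a closed form: with N total observations, the chance of drawing the ordered pair (sj, sk) without replacement is fj * fk / (N * (N - 1)) for j != k. A minimal sketch follows; the function name `chance_transition_probability` is made up for illustration and is not part of treetime.

```python
from fractions import Fraction


def chance_transition_probability(frequencies, source, target):
    """Probability of drawing (source, target) as an ordered pair of two
    distinct observations, sampled without replacement.

    `frequencies` maps each state to its observed count.
    """
    total = sum(frequencies.values())
    if source == target:
        # "Staying in place": both draws come from the same state.
        favourable = frequencies[source] * (frequencies[source] - 1)
    else:
        favourable = frequencies[source] * frequencies[target]
    return Fraction(favourable, total * (total - 1))


frequencies = {"Russia": 4, "Lithuania": 1, "Estonia": 1, "Germany": 2}
p = chance_transition_probability(frequencies, "Russia", "Germany")
print(p, float(p))  # 1/7, about 0.14
```

This avoids building the full list of pairs, which grows quadratically with the number of observations. Note that it only reflects state frequencies at the tips; it knows nothing about the tree itself.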
I guess one thing that one could test is whether particular transitions happen more frequently than expected under a flat transition matrix. But the probabilistic interpretation of mugration models is subtle and depends first and foremost on sampling and the assumption of reversibility.
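One possible reading of the "flat transition matrix" test, sketched under strong assumptions: treat every inferred transition as an independent draw in which all ordered pairs of distinct states are equally likely, then ask how surprising the observed count for one pair is using a one-sided binomial tail. The function name and the transition counts below are hypothetical placeholders, not treetime output, and the sketch ignores tree structure, branch lengths, and sampling bias.

```python
from math import comb


def flat_matrix_tail_probability(states, counts, pair):
    """One-sided binomial tail P(X >= observed) for a single transition.

    Assumes every ordered pair of distinct states is equally likely
    (a flat transition matrix) and that transitions are independent.
    `counts` maps (source, target) tuples to numbers of inferred transitions.
    """
    n_pairs = len(states) * (len(states) - 1)
    p_flat = 1 / n_pairs
    total = sum(counts.values())
    observed = counts.get(pair, 0)
    return sum(
        comb(total, k) * p_flat**k * (1 - p_flat) ** (total - k)
        for k in range(observed, total + 1)
    )


states = ["Russia", "Lithuania", "Estonia", "Germany"]
# Hypothetical counts, invented purely to exercise the function.
hypothetical_counts = {
    ("Russia", "Germany"): 3,
    ("Russia", "Lithuania"): 1,
    ("Germany", "Estonia"): 1,
}
tail = flat_matrix_tail_probability(states, hypothetical_counts, ("Russia", "Germany"))
print(f"P(X >= observed | flat matrix) = {tail:.4f}")
```

A small tail probability would only say that the transition is over-represented relative to a uniform expectation. It would not separate a real signal from uneven sampling across locations, which is the caveat raised above.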