msprime
Validation of microsat mutation models
We need some reproducible statistical validation of the microsat mutation models in verification.py. Do we have analytical results we can compare to, or other simulators we can check against?
Any thoughts @petrelharp @andrewkern?
Slatkin's results from this old paper would be perfect for testing the SMM
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1206343/
IIRC there's some good validation for the other mutation models that should be easy to copy and extend.
how would you two feel about using this script as an external software check for verification?
https://onlinelibrary.wiley.com/doi/10.1111/j.1471-8286.2006.01286.x
looking at the other mutation model verifications, it looks like you are using seq-gen and pyvolve, and this could perhaps serve a similar purpose?
I'll not weigh in on the issue of including a Perl script, but you could at least do it like this. (However, I agree that comparing to an external tool would be a much better check...)
let's see what Jerome says. the tests you point to above are a lot like what's already going on in test_mutations.py, I think
External validation is the way to go here. The language the external tool is written in shouldn't make much difference, although it depends on how obnoxious the dependencies are (i.e., I'd expect to be able to run it without having to install anything onto a standard Debian system).
Not that we shouldn't add the other tests too, if that's straightforward, but an external validation gives a lot of reassurance.
okay so #2085 has the first verification, which is a test of the variance of a sample under the SMM. that's basically the only analytical result I've been able to find so far, although there are derived quantities, e.g. Fst or divergence, which make use of it.
Further, that ms2ms.pl script only simulates under the SMM, so while we could use it to look at variances under non-equilibrium models, I think we are all set if we hit the expectations from the analytic result above?
Basically I'm looking for ideas on what to add to #2085
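For reference, the pairwise form of that analytic result — that under the (unbounded) SMM the expected squared size difference between two sampled alleles is E[(X_i − X_j)²] = θ, with θ = 4Nμ and time in units of 2N generations — can be checked with a small standalone Monte Carlo sketch. This is just an illustration of the expectation; the function name is hypothetical and it does not call msprime itself:

```python
import numpy as np


def pairwise_sq_diff_smm(theta, reps, seed=0):
    """Monte Carlo estimate of E[(X_i - X_j)^2] for two alleles under the SMM.

    With time in units of 2N generations, the pairwise coalescence time is
    Exp(1), and the number of mutations separating the two lineages is
    Poisson(theta * t) with theta = 4 * N * mu.
    """
    rng = np.random.default_rng(seed)
    t = rng.exponential(size=reps)  # pairwise coalescence times
    n = rng.poisson(theta * t)      # mutations separating the pair
    # Each mutation steps the size difference by +/-1 with equal probability;
    # with b ~ Binomial(n, 1/2) "up" steps, the net displacement is 2b - n.
    b = rng.binomial(n, 0.5)
    s = 2 * b - n
    return (s.astype(float) ** 2).mean()


est = pairwise_sq_diff_smm(theta=2.0, reps=100_000)
# analytic expectation: E[(X_i - X_j)^2] = theta
```

The within-sample variance expectation of θ/2 used in #2085 follows from this pairwise identity, so the two checks are probing the same result.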
We need to make a decision about this before tagging 1.3.0, as microsats are new for this release. @GertjanBisschop, can you take a look and see what you think? Do we have enough validation for microsats?
happy to help @jeromekelleher @GertjanBisschop
I had a little look around, and there are very few tools out there doing microsats in a way that is straightforward to check against what msprime produces and that are rigorously tested themselves. The ms2ms.pl script is also only tested against the expectation of the variance of the copy number, so that is identical to what is already being done here.
I don't think the tests that @petrelharp mentioned are already in place, but I could have missed that. Here you would test the observed number of transitions for each possible allele against the expected number, given a tree sequence.
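A per-allele transition test could look something like the sketch below. The transition matrix here is a hypothetical SMM with reflecting boundaries (msprime's actual boundary convention may differ), and multinomial draws stand in for the mutation transitions tallied from a simulated tree sequence:

```python
import numpy as np


def smm_transition_matrix(lo, hi):
    # Hypothetical SMM: each mutation steps the allele +/-1 with equal
    # probability, reflecting at the allele-range boundaries.
    k = hi - lo + 1
    P = np.zeros((k, k))
    for i in range(k):
        if i == 0:
            P[i, i + 1] = 1.0
        elif i == k - 1:
            P[i, i - 1] = 1.0
        else:
            P[i, i - 1] = P[i, i + 1] = 0.5
    return P


rng = np.random.default_rng(7)
P = smm_transition_matrix(1, 10)
n_per_allele = 5000
# Multinomial draws standing in for the observed derived-allele counts per
# ancestral allele; in verification.py these would come from the mutations
# on a simulated tree sequence.
counts = np.array([rng.multinomial(n_per_allele, P[a]) for a in range(P.shape[0])])
expected = n_per_allele * P
# Crude per-ancestral-allele goodness-of-fit statistic (zero-expectation
# cells contribute nothing, since their observed counts are zero too).
chi2 = ((counts - expected) ** 2 / np.where(expected > 0, expected, 1.0)).sum(axis=1)
```

In verification.py one would replace the multinomial draws with the transition counts observed in the simulation and compare the per-allele statistics against their reference distribution (or via a QQ plot, as the other mutation model verifications do).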
Your call @GertjanBisschop, I'm happy with whatever you decide
Addressed in #2241.