
Validation of microsat mutation models

Open jeromekelleher opened this issue 2 years ago • 8 comments

We need some reproducible statistical validation of the microsat mutation models in verification.py. Do we have analytical results we can compare against, or other simulators we can compare to?

Any thoughts @petrelharp @andrewkern?

jeromekelleher avatar Jun 20 '22 08:06 jeromekelleher

Slatkin's results from this old paper would be perfect for testing the SMM https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1206343/
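One classical SMM result that lends itself to a quick Monte Carlo sanity check is the Ohta–Kimura equilibrium homozygosity, F = 1 / sqrt(1 + 2θ) with θ = 4Nμ. This is a sketch under simplifying assumptions (unbounded allele range, strict ±1 steps, neutral pairwise coalescent), independent of msprime itself:

```python
import numpy as np

# Sketch: Monte Carlo check of the Ohta-Kimura equilibrium homozygosity
# under the strict stepwise mutation model, F = 1 / sqrt(1 + 2*theta),
# theta = 4*N*mu. Assumptions: unbounded allele range, pairwise
# coalescence time T ~ Exp(mean 2N), Poisson mutations on both lineages.
rng = np.random.default_rng(1)
N, mu, reps = 1_000, 1e-3, 200_000
theta = 4 * N * mu

T = rng.exponential(2 * N, size=reps)   # pairwise coalescence times
K = rng.poisson(2 * mu * T)             # mutations separating the two alleles
D = 2 * rng.binomial(K, 0.5) - K        # net difference in repeat count
F_sim = np.mean(D == 0)                 # probability of identity in state
F_exp = 1 / np.sqrt(1 + 2 * theta)
print(F_sim, F_exp)                     # both close to 1/3 for these parameters
```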

andrewkern avatar Jun 20 '22 16:06 andrewkern

IIRC there's some good validation for the other mutation models that should be easy to copy and extend.

petrelharp avatar Jun 21 '22 00:06 petrelharp

how would you two feel about using this script as an external software check for verification?

https://onlinelibrary.wiley.com/doi/10.1111/j.1471-8286.2006.01286.x

looking at the other mutation model verifications, it seems you are using seq-gen and pyvolve; this script could perhaps serve a similar purpose?

andrewkern avatar Jun 26 '22 05:06 andrewkern

I'll not weigh in on the issue of including a perl script, but you could at least do something like this. (However, I agree that comparing to an external tool would be a much better check...)

petrelharp avatar Jun 26 '22 05:06 petrelharp

let's see what Jerome says. the tests you point to above are a lot like what's already going on in test_mutations.py, i think

andrewkern avatar Jun 26 '22 15:06 andrewkern

External validation is the way to go here. The language the external tool is written in shouldn't make much difference, although it depends on how obnoxious the dependencies are (i.e., I'd expect to be able to run it without having to install anything onto a standard Debian system).

jeromekelleher avatar Jun 27 '22 08:06 jeromekelleher

Not that we shouldn't add the other tests too, if that's straightforward, but an external validation gives a lot of reassurance.

jeromekelleher avatar Jun 27 '22 08:06 jeromekelleher

okay so #2085 has the first verification, which is a test of the variance of a sample under the SMM. that's basically the only analytical result i've been able to find so far, although there are derived quantities, e.g. Fst or divergence, that make use of it.

Further, that ms2ms.pl script only simulates under the SMM, so while we could use it to look at variances under non-equilibrium models, I think we are all set if we match the expectations from the analytical result above?
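That expectation can itself be verified end-to-end with a tiny Monte Carlo, independent of msprime. A sketch under simplifying assumptions (strict ±1 SMM, no allele bounds, neutral pairwise coalescent): for two alleles the expected squared difference in repeat count is θ, so the expected sample variance of a pair, (x1 - x2)² / 2, is θ/2:

```python
import numpy as np

# Sketch (not msprime's implementation): under the strict SMM the
# difference in repeat count between two alleles is a sum of K
# independent +/-1 steps, with K ~ Poisson over the tree path of total
# length 2*T, so E[(x1 - x2)^2] = E[K] = theta and the expected sample
# variance of the pair is theta / 2.
rng = np.random.default_rng(2)
N, mu, reps = 1_000, 1e-4, 200_000
theta = 4 * N * mu                      # 0.4 for these parameters

T = rng.exponential(2 * N, size=reps)   # pairwise coalescence times
K = rng.poisson(2 * mu * T)             # mutations separating the pair
D = 2 * rng.binomial(K, 0.5) - K        # net repeat-count difference
msd = np.mean(D.astype(float) ** 2)
print(msd, theta)                       # mean squared difference ~ theta
print(msd / 2, theta / 2)               # pairwise sample variance ~ theta / 2
```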

Basically I'm looking for ideas on what to add to #2085

andrewkern avatar Jul 01 '22 23:07 andrewkern

We need to make a decision about this before tagging 1.3.0, as microsats are new for this release. @GertjanBisschop, can you take a look and see what you think? Do we have enough validation for microsats?

jeromekelleher avatar Dec 06 '23 10:12 jeromekelleher

happy to help @jeromekelleher @GertjanBisschop

andrewkern avatar Dec 06 '23 14:12 andrewkern

I had a little look around, and there are very few tools out there that do microsats in a way that is straightforward to check against what msprime produces and that are rigorously tested themselves. The ms2ms.pl script is also only tested against the expected variance of the copy number, so that is identical to what is already being done here.

I don't think the tests that @petrelharp mentioned are already in place, but I could have missed that. Here you would test the observed number of transitions for each possible allele against the expected number, given a tree sequence.
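That kind of check can be prototyped without msprime. A hedged sketch (the allele range, transition matrix, and reflecting boundary behaviour here are illustrative assumptions, not msprime's actual microsat implementation): build a strict-SMM transition matrix, draw transition outcomes for each ancestral allele, and compare observed counts against their multinomial expectation. In the real test the observed counts would come from the mutations on the simulated tree sequence instead of `rng.multinomial`:

```python
import numpy as np

# Illustrative strict-SMM transition matrix on repeat counts lo..hi.
# Boundary handling (reflecting) is an assumption for this sketch.
def smm_matrix(lo, hi):
    n = hi - lo + 1
    P = np.zeros((n, n))
    for i in range(n):
        if i == 0:
            P[i, i + 1] = 1.0               # forced step up at lower bound
        elif i == n - 1:
            P[i, i - 1] = 1.0               # forced step down at upper bound
        else:
            P[i, i - 1] = P[i, i + 1] = 0.5  # +/-1 with equal probability
    return P

rng = np.random.default_rng(42)
P = smm_matrix(9, 15)
n_mut = 10_000
for i, row in enumerate(P):
    observed = rng.multinomial(n_mut, row)   # stand-in for tree-sequence counts
    expected = n_mut * row
    # crude goodness-of-fit: every count within a few binomial SDs
    sd = np.sqrt(n_mut * row * (1 - row))
    assert np.all(np.abs(observed - expected) <= 5 * np.maximum(sd, 1))
```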

GertjanBisschop avatar Dec 07 '23 09:12 GertjanBisschop

Your call @GertjanBisschop, I'm happy with whatever you decide

jeromekelleher avatar Dec 07 '23 11:12 jeromekelleher

Addressed in #2241.

GertjanBisschop avatar Dec 12 '23 22:12 GertjanBisschop