
Validation of microsat mutation models

Open jeromekelleher opened this issue 2 years ago • 8 comments

We need some reproducible statistical validation of the microsat mutation models in verification.py. Do we have analytical results we can compare against, or other simulators we can compare to?

Any thoughts @petrelharp @andrewkern?

jeromekelleher avatar Jun 20 '22 08:06 jeromekelleher

Slatkin's results from this old paper would be perfect for testing the SMM https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1206343/
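One classical SMM result that lends itself to a quick Monte Carlo sanity check is the Ohta–Kimura equilibrium homozygosity, F = 1 / sqrt(1 + 2θ) with θ = 4Nμ. This is a sketch under simplifying assumptions (unbounded allele range, strict ±1 steps, neutral pairwise coalescent), independent of msprime itself:

```python
import numpy as np

# Sketch: Monte Carlo check of the Ohta-Kimura equilibrium homozygosity
# under the strict stepwise mutation model, F = 1 / sqrt(1 + 2*theta),
# theta = 4*N*mu. Assumptions: unbounded allele range, pairwise
# coalescence time T ~ Exp(mean 2N), Poisson mutations on both lineages.
rng = np.random.default_rng(1)
N, mu, reps = 1_000, 1e-3, 200_000
theta = 4 * N * mu

T = rng.exponential(2 * N, size=reps)   # pairwise coalescence times
K = rng.poisson(2 * mu * T)             # mutations separating the two alleles
D = 2 * rng.binomial(K, 0.5) - K        # net difference in repeat count
F_sim = np.mean(D == 0)                 # probability of identity in state
F_exp = 1 / np.sqrt(1 + 2 * theta)
print(F_sim, F_exp)                     # both close to 1/3 for these parameters
```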

andrewkern avatar Jun 20 '22 16:06 andrewkern

IIRC there's some good validation for the other mutation models that should be easy to copy and extend.

petrelharp avatar Jun 21 '22 00:06 petrelharp

how would you two feel about using this script as an external software check for verification?

https://onlinelibrary.wiley.com/doi/10.1111/j.1471-8286.2006.01286.x

looking at the other mutation model verifications, it seems you are using seq-gen and pyvolve; this script could perhaps serve a similar purpose?

andrewkern avatar Jun 26 '22 05:06 andrewkern

I'll not weigh in on the issue of including a perl script, but you could at least do something like this. (However, I agree that comparing to an external tool would be a much better check...)

petrelharp avatar Jun 26 '22 05:06 petrelharp

let's see what Jerome says. the tests you point to above are a lot like what's already going on in test_mutations.py, i think

andrewkern avatar Jun 26 '22 15:06 andrewkern

External validation is the way to go here. The language the external tool is written in shouldn't make much difference, although it depends on how obnoxious the dependencies are (i.e., I'd expect to be able to run it without having to install anything onto a standard Debian system).

jeromekelleher avatar Jun 27 '22 08:06 jeromekelleher

Not that we shouldn't add the other tests too, if that's straightforward, but an external validation gives a lot of reassurance.

jeromekelleher avatar Jun 27 '22 08:06 jeromekelleher

okay so #2085 has the first verification, which is a test of the variance of a sample under the SMM. that's basically the only analytical result i've been able to find so far, although there are derived quantities, e.g. Fst or divergence, that make use of it.

Further, that ms2ms.pl script only simulates under the SMM, so while we could use it to look at variances under non-equilibrium models, I think we are all set if we match the expectations from the analytical result above?
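That expectation can itself be verified end-to-end with a tiny Monte Carlo, independent of msprime. A sketch under simplifying assumptions (strict ±1 SMM, no allele bounds, neutral pairwise coalescent): for two alleles the expected squared difference in repeat count is θ, so the expected sample variance of a pair, (x1 - x2)² / 2, is θ/2:

```python
import numpy as np

# Sketch (not msprime's implementation): under the strict SMM the
# difference in repeat count between two alleles is a sum of K
# independent +/-1 steps, with K ~ Poisson over the tree path of total
# length 2*T, so E[(x1 - x2)^2] = E[K] = theta and the expected sample
# variance of the pair is theta / 2.
rng = np.random.default_rng(2)
N, mu, reps = 1_000, 1e-4, 200_000
theta = 4 * N * mu                      # 0.4 for these parameters

T = rng.exponential(2 * N, size=reps)   # pairwise coalescence times
K = rng.poisson(2 * mu * T)             # mutations separating the pair
D = 2 * rng.binomial(K, 0.5) - K        # net repeat-count difference
msd = np.mean(D.astype(float) ** 2)
print(msd, theta)                       # mean squared difference ~ theta
print(msd / 2, theta / 2)               # pairwise sample variance ~ theta / 2
```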

Basically I'm looking for ideas on what to add to #2085

andrewkern avatar Jul 01 '22 23:07 andrewkern

We need to make a decision about this before tagging 1.3.0, as microsats are new for this release. @GertjanBisschop, can you take a look and see what you think? Do we have enough validation for microsats?

jeromekelleher avatar Dec 06 '23 10:12 jeromekelleher

happy to help @jeromekelleher @GertjanBisschop

andrewkern avatar Dec 06 '23 14:12 andrewkern

I had a little look around, and there are very few tools out there that do microsats in a way that is straightforward to check against what msprime produces and that are rigorously tested themselves. The ms2ms.pl script is also only tested against the expected variance of the copy number, so that is identical to what is already being done here.

I don't think the tests that @petrelharp mentioned are already in place, but I could have missed that. Here you would test the observed number of transitions for each possible allele against the expected number, given a tree sequence.
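That kind of check can be prototyped without msprime. A hedged sketch (the allele range, transition matrix, and reflecting boundary behaviour here are illustrative assumptions, not msprime's actual microsat implementation): build a strict-SMM transition matrix, draw transition outcomes for each ancestral allele, and compare observed counts against their multinomial expectation. In the real test the observed counts would come from the mutations on the simulated tree sequence instead of `rng.multinomial`:

```python
import numpy as np

# Illustrative strict-SMM transition matrix on repeat counts lo..hi.
# Boundary handling (reflecting) is an assumption for this sketch.
def smm_matrix(lo, hi):
    n = hi - lo + 1
    P = np.zeros((n, n))
    for i in range(n):
        if i == 0:
            P[i, i + 1] = 1.0               # forced step up at lower bound
        elif i == n - 1:
            P[i, i - 1] = 1.0               # forced step down at upper bound
        else:
            P[i, i - 1] = P[i, i + 1] = 0.5  # +/-1 with equal probability
    return P

rng = np.random.default_rng(42)
P = smm_matrix(9, 15)
n_mut = 10_000
for i, row in enumerate(P):
    observed = rng.multinomial(n_mut, row)   # stand-in for tree-sequence counts
    expected = n_mut * row
    # crude goodness-of-fit: every count within a few binomial SDs
    sd = np.sqrt(n_mut * row * (1 - row))
    assert np.all(np.abs(observed - expected) <= 5 * np.maximum(sd, 1))
```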

GertjanBisschop avatar Dec 07 '23 09:12 GertjanBisschop

Your call @GertjanBisschop, I'm happy with whatever you decide

jeromekelleher avatar Dec 07 '23 11:12 jeromekelleher

Addressed in #2241.

GertjanBisschop avatar Dec 12 '23 22:12 GertjanBisschop