msprime

use reference sequence when adding mutation

Status: Open. petrelharp opened this issue 2 years ago • 3 comments

Now that we (sometimes) have reference sequences, sim_mutations should use them when they are present.

Recall that a mutation generator has two pieces: first, a way to choose the root state, and second, a way to choose derived states.
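To make the two pieces concrete, here is a minimal, self-contained sketch. The parameter names loosely follow msprime's `MatrixMutationModel` (`alleles`, `root_distribution`, `transition_matrix`), but the class itself is purely illustrative and is not msprime's implementation (which lives in C).

```python
import random


class TwoPieceMutationModel:
    """Toy mutation model split into a root-state piece and a derived-state piece.

    Illustrative only; not msprime's actual code.
    """

    def __init__(self, alleles, root_distribution, transition_matrix):
        n = len(alleles)
        if len(root_distribution) != n or any(len(row) != n for row in transition_matrix):
            raise ValueError("root_distribution and transition_matrix must match alleles")
        self.alleles = list(alleles)
        self.root_distribution = list(root_distribution)
        self.transition_matrix = [list(row) for row in transition_matrix]

    def choose_root_state(self, rng):
        # Piece 1: draw the ancestral (root) allele from the root distribution.
        return rng.choices(self.alleles, weights=self.root_distribution)[0]

    def choose_derived_state(self, parent_allele, rng):
        # Piece 2: draw the new allele from the transition-matrix row of the parent.
        row = self.transition_matrix[self.alleles.index(parent_allele)]
        return rng.choices(self.alleles, weights=row)[0]


# Jukes-Cantor-style model: uniform root, equal rates to every other base.
jc = TwoPieceMutationModel(
    "ACGT",
    [0.25] * 4,
    [[0 if i == j else 1 / 3 for j in range(4)] for i in range(4)],
)
rng = random.Random(42)
root = jc.choose_root_state(rng)
derived = jc.choose_derived_state(root, rng)
```

Because the diagonal of this transition matrix is zero, `derived` always differs from `root`; the proposal in this issue only touches the first piece.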

The most obvious way this would work: in mutation_model_choose_root_state we check whether there is a reference sequence and, if so, grab the allele from it. (If the allele doesn't make sense for the mutation model, we'll get an error when we try to apply a mutation.)
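A rough Python sketch of that logic follows. The function names are hypothetical stand-ins for the C-level `mutation_model_choose_root_state` and its derived-state counterpart, and the mapping from a (float) site position to `reference[int(position)]` is an assumption made for illustration.

```python
import random


def choose_root_state(position, alleles, root_distribution, rng, reference=None):
    """Hypothetical sketch: use the reference base if present, else sample."""
    if reference is not None:
        # Assumption: a site at float position x maps to reference[int(x)].
        return reference[int(position)]
    return rng.choices(alleles, weights=root_distribution)[0]


def choose_derived_state(parent_allele, alleles, transition_matrix, rng):
    """An allele unknown to the model is only detected here, when mutating."""
    if parent_allele not in alleles:
        raise ValueError(f"allele {parent_allele!r} is not in the mutation model")
    row = transition_matrix[alleles.index(parent_allele)]
    return rng.choices(alleles, weights=row)[0]


alleles = ["A", "C", "G", "T"]
uniform = [0.25] * 4
jukes_cantor = [[0 if i == j else 1 / 3 for j in range(4)] for i in range(4)]
rng = random.Random(1)
reference = "ACGTN"

root = choose_root_state(2.7, alleles, uniform, rng, reference=reference)  # "G"
derived = choose_derived_state(root, alleles, jukes_cantor, rng)  # A, C or T

# An "N" in the reference is accepted as a root state, but fails on mutation.
bad_root = choose_root_state(4.0, alleles, uniform, rng, reference=reference)
try:
    choose_derived_state(bad_root, alleles, jukes_cantor, rng)
except ValueError as err:
    print(err)  # allele 'N' is not in the mutation model
```

This keeps the root-state step trivial and defers all validation to the point where a mutation is applied, as suggested above.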

I don't think we need to be more complicated than this?

petrelharp, Feb 25 '22 13:02

Sounds good to me. I guess this is assuming that the reference sequence is the ancestral state though, which isn't the case for most species (I'd imagine).

We could definitely add an option though (reference_as_ancestral?), for cases where this makes sense.

jeromekelleher, Feb 28 '22 10:02

> Sounds good to me. I guess this is assuming that the reference sequence is the ancestral state though, which isn't the case for most species (I'd imagine).
>
> We could definitely add an option though (reference_as_ancestral?), for cases where this makes sense.

It's still going to be a better guess than random (e.g., it reflects local base composition), so I think using it would make sense as a default?
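The option and the "better than random" argument can be illustrated with a toy example. Here `reference_as_ancestral` is only the option name floated in this thread, not an existing msprime parameter, and the GC-rich reference is made up purely for illustration.

```python
import random


def root_states(positions, reference=None, reference_as_ancestral=True, rng=None):
    """Choose a root allele for each site position (hypothetical helper).

    With reference_as_ancestral=True (the proposed default) and a reference
    present, use the reference base; otherwise draw uniformly from ACGT.
    """
    rng = rng if rng is not None else random.Random()
    if reference is not None and reference_as_ancestral:
        # Assumption: a site at float position x maps to reference[int(x)].
        return [reference[int(x)] for x in positions]
    return [rng.choice("ACGT") for _ in positions]


def gc_fraction(alleles):
    """Fraction of alleles that are G or C."""
    return sum(a in "GC" for a in alleles) / len(alleles)


# Toy, GC-rich reference (80% GC).
reference = "GCGCGCGCAT" * 100
positions = [i + 0.5 for i in range(len(reference))]

from_reference = root_states(positions, reference)
from_random = root_states(
    positions, reference, reference_as_ancestral=False, rng=random.Random(3)
)
print(gc_fraction(from_reference))  # 0.8, matching the local composition
print(gc_fraction(from_random))     # close to 0.5
```

Even if the reference is not truly ancestral, root states taken from it inherit its local base composition, whereas uniform draws wash that signal out.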

petrelharp, Feb 28 '22 13:02

Sure. You probably won't have a reference sequence unless you go out of your way to add one anyway, so SGTM.

jeromekelleher, Feb 28 '22 14:02