msprime
use reference sequence when adding mutation
Now we have reference sequences (sometimes), and so sim_mutations should use this, when it is present.
Recall that a mutation generator has two pieces: first, a way to choose the root state, and then a way to choose derived states.
The most obvious way this would work would be that in mutation_model_choose_root_state we check if there is a reference sequence, and if so, grab the allele at that position. (If the allele doesn't make sense with the mutation model, we'll get an error when we try to apply a mutation.)
I don't think we need to be more complicated than this?
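To make the proposal concrete, here is a rough Python sketch of the two pieces. It is not msprime's implementation (the real logic lives in C in mutation_model_choose_root_state); the function names, arguments, and the plain-string reference are all assumptions made up for illustration.

```python
import random

def choose_root_state(position, alleles, root_distribution, reference=None, rng=random):
    """Pick the root (ancestral) allele for a site at `position` (hypothetical helper)."""
    if reference is not None:
        # Reference present: grab the allele directly, with no validation here.
        return reference[position]
    # No reference: draw from the model's root distribution as before.
    return rng.choices(alleles, weights=root_distribution, k=1)[0]

def choose_derived_state(parent, alleles, transition_matrix, rng=random):
    """Pick a derived allele given the parent allele (hypothetical helper)."""
    if parent not in alleles:
        # This is where a reference allele that the model doesn't know about
        # (e.g. "N") would surface as an error.
        raise ValueError(f"allele {parent!r} is not in the mutation model")
    row = transition_matrix[alleles.index(parent)]
    return rng.choices(alleles, weights=row, k=1)[0]

# A Jukes-Cantor-like model, purely as example data.
ALLELES = ["A", "C", "G", "T"]
ROOT_DISTRIBUTION = [0.25, 0.25, 0.25, 0.25]
TRANSITION_MATRIX = [
    [0 if i == j else 1 / 3 for j in range(4)] for i in range(4)
]
```

With this shape, choosing the root state never fails; an incompatible reference allele only raises once a mutation is actually placed at that site, which matches the behaviour described above.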
Sounds good to me. I guess this is assuming that the reference sequence is the ancestral state though, which isn't the case for most species (I'd imagine).
We could definitely add an option though (reference_as_ancestral?), for cases where this makes sense.
It's still going to be a better guess (e.g. reflecting local base composition) than random, so I think using it would make sense as a default?
Sure. You probably won't have a reference sequence unless you go out of your way to add one anyway, so SGTM.
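For reference, the default discussed here could look roughly like the following. This is only a sketch: reference_as_ancestral is the option name floated in this thread and does not exist in msprime as far as this thread shows, and the helper name and signature are hypothetical.

```python
import random

def root_state(position, alleles, reference=None, reference_as_ancestral=True, rng=random):
    """Hypothetical helper: pick a root state, preferring the reference by default."""
    if reference is not None and reference_as_ancestral:
        # Default: treat the reference allele as the ancestral state, since it is
        # likely a better guess (local base composition) than a random draw.
        return reference[position]
    # Opt-out, or no reference sequence at all: fall back to a uniform random draw.
    return rng.choice(alleles)
```

Since the reference only exists when someone has explicitly attached one, defaulting the flag to on changes nothing for simulations without a reference.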