
multiallelic sites within the VCF

Open LuisGFdez opened this issue 4 months ago • 1 comments

Hello,

I am simulating STRs, and at some sites I get more than nine allelic states, which I cannot write out as VCF. I'm therefore wondering how to handle these multiallelic sites in the VCF, or whether working directly with the tree sequence would be the simplest approach. In that case, I assume the genotypes at each site would follow the order of the individuals sampled in the ancestry simulation.

To provide context, my objective is to simulate STRs over millions of years across different species of great apes, exploring various evolutionary scenarios to understand the selection processes contributing to differences in STR composition among these species.

Additionally, I've observed that mutation sites occur at every position along my chromosomes. I'm not sure how efficient it is to simulate STRs on a chromosome of length 1e9 if I essentially end up with 1e9 mutation sites. Is there a way to restrict the sites at which mutations can occur?

LuisGFdez, Feb 15 '24 20:02