Dsuite
Mixed haploid diploid datasets
Hi, I've been going over the formulas for the various statistics calculated by Dsuite. Evidently they are all tailored towards biallelic sites. I am therefore wondering whether I am violating any assumptions by running these statistics on a species-level dataset that mixes haploid and diploid samples (there is no way around it: we are sampling different life history states in species that alternate generations). At least two of the species for which we have haploid data appear to be playing a role in hybridization patterns, so I am keen to include them.
In the VCF file, the haploid species are genotyped as haploid. Dsuite gives no warnings and appears to calculate everything appropriately, and the results appear to make biological sense (i.e., elevated D and f4-ratio values reflecting edges in a network that show shared genetic information at odds with ILS). So how is the haploid data treated, particularly in calculations that appear to explicitly demand biallelic sites (such as the f4-ratio)?
I would really appreciate any insight on this before reading too much into the results.
Trev
Hi Trev
I would have to see a little of your VCF and the SETS.txt file to be sure how this is processed.
I think that all should be fine as long as haploid and diploid individuals are not mixed within the "Species" or populations specified in the SETS.txt file. That is, each species/population should be composed either entirely of haploid individuals or entirely of diploid individuals.
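If it helps, the condition above can be checked before running Dsuite. The following is only a rough sketch, not part of Dsuite: the function names `sample_ploidy` and `mixed_ploidy_sets` are made up for illustration, and it assumes an uncompressed, tab-separated VCF with a GT field, and a SETS.txt with one individual per line followed by its species/population name. Ploidy is inferred from the number of alleles in each GT call, and fully missing calls are skipped.

```python
from collections import defaultdict


def sample_ploidy(vcf_lines):
    """Return {sample: set of ploidies observed in its non-missing GT calls}."""
    samples = []
    ploidy = defaultdict(set)
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue
        gt_idx = fmt.index("GT")
        for name, call in zip(samples, fields[9:]):
            parts = call.split(":")
            if gt_idx >= len(parts):
                continue
            # Treat phased (|) and unphased (/) separators alike.
            alleles = parts[gt_idx].replace("|", "/").split("/")
            if all(a == "." for a in alleles):
                continue  # missing call: ploidy cannot be trusted
            ploidy[name].add(len(alleles))
    return ploidy


def mixed_ploidy_sets(vcf_lines, sets_lines):
    """Return {set name: sorted ploidies} for sets with more than one ploidy."""
    ploidy = sample_ploidy(vcf_lines)
    by_set = defaultdict(set)
    for line in sets_lines:
        parts = line.split()
        if len(parts) < 2:
            continue
        sample, group = parts[0], parts[1]
        by_set[group] |= ploidy.get(sample, set())
    return {g: sorted(p) for g, p in by_set.items() if len(p) > 1}


if __name__ == "__main__":
    # Toy data: Sp1 mixes a haploid and a diploid individual, Sp2 does not.
    vcf = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\thapA\tdipB\tdipC",
        "chr1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t1\t0/1\t0|0",
        "chr1\t200\t.\tC\tT\t.\tPASS\t.\tGT\t.\t./.\t1/1",
    ]
    sets = ["hapA\tSp1", "dipB\tSp1", "dipC\tSp2"]
    print(mixed_ploidy_sets(vcf, sets))
```

On real data, the two arguments could be the open file handles of the VCF and SETS.txt files. An empty result would suggest that no species/population mixes haploid and diploid individuals.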
Hope this makes sense
Milan