systematic upward bias with `SNPRelate::snpgdsFst`
Hi Xiuwen,
I'm really impressed by the speed of SNPRelate really nice work !
I recently tested SNPRelate::snpgdsFst with method = "W&C84" and systematically get upward bias when tested against other software (e.g. GENODIVE, hierfsat).
I discarded these factors as potential cause for the bias:
- missing value (I work with RADseq data...) because I tested with and without imputation
- common markers between populations
- monomorphic markers
Not tested :
- the way
SNPRelate::snpgdsFsthandles uneven sampling size across populations
Would really like to get to the bottom of this, because your implementation is really the fastest in R!

Cheers Thierry
I will check my implementation of Fst. Please let me know the sample size in each population.
In the example above I used these:
POP_ID N
G100 10
G102 10
G103 10
G108 10
G109 10
G111 10
G118 10
G122 8
But do get similar bias with more than > 30 samples/pop Thanks for looking into this Thierry
Any update regarding this issue ?
I have compared my implementation of W&C Fst with the implementations in plink_v1.9 and vcftools.
plink_v1.9 and vcftools return both "Weir and Cockerham mean Fst estimate" and "Weir and Cockerham weighted Fst estimate". SNPRelate returns "Weir and Cockerham weighted Fst estimate", which is the same as plink_v1.9 and vcftools.
It seems that "weighted Fst estimate" is a little higher than "mean Fst estimate".
Thanks Xiuwen, I'll make some tests on my end with different RADseq data and will update results. Thierry
The latest SNPRelate on GitHub also provides MeanFst, and it will be available in the BioC release (in April).
see the help document snpgdsFst().