systematic upward bias with `SNPRelate::snpgdsFst`

Open thierrygosselin opened this issue 9 years ago • 6 comments

Hi Xiuwen,

I'm really impressed by the speed of SNPRelate. Really nice work!

I recently tested SNPRelate::snpgdsFst with method = "W&C84" and systematically get an upward bias compared with other software (e.g. GENODIVE, hierfstat).

I ruled out these factors as potential causes of the bias:

  • missing values (I work with RADseq data...), because I tested with and without imputation
  • common markers between populations
  • monomorphic markers

Not tested:

  • the way SNPRelate::snpgdsFst handles uneven sample sizes across populations

I would really like to get to the bottom of this, because your implementation is really the fastest in R!

[screenshot 2016-11-28 09 16 35]

Cheers, Thierry

thierrygosselin, Nov 28 '16 14:11

I will check my implementation of Fst. Please let me know the sample size in each population.

zhengxwen, Nov 28 '16 21:11

In the example above I used these:

POP_ID   N 
G100    10
G102    10
G103    10
G108    10
G109    10
G111    10
G118    10
G122     8
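
For context on the untested point about uneven sample sizes: in Weir and Cockerham (1984), unequal sample sizes enter the estimator through the term n_c = (N - sum(n_i^2) / N) / (r - 1), where r is the number of populations and N is the total sample size. The sketch below only evaluates that textbook formula for the sizes listed above. The function name `wc84_nc` is hypothetical, and this is not a description of how snpgdsFst is implemented internally.

```python
def wc84_nc(sizes):
    """Weir & Cockerham (1984) n_c for a list of per-population sample sizes."""
    r = len(sizes)                      # number of populations
    total = sum(sizes)                  # total number of sampled individuals
    sum_sq = sum(n * n for n in sizes)  # sum of squared sample sizes
    return (total - sum_sq / total) / (r - 1)


# Sample sizes from the table above (G100 ... G122)
sizes = [10, 10, 10, 10, 10, 10, 10, 8]

n_bar = sum(sizes) / len(sizes)  # plain average sample size
n_c = wc84_nc(sizes)             # W&C84 size term; equals n_bar only for equal sizes

print(f"mean sample size: {n_bar:.4f}")
print(f"n_c:              {n_c:.4f}")
```

With one population of 8 among seven of 10, n_c stays very close to the mean sample size, so a large discrepancy between programs would probably need another explanation.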

But I do get a similar bias with more than 30 samples/pop. Thanks for looking into this, Thierry

thierrygosselin, Dec 01 '16 22:12

Any update regarding this issue?

thierrygosselin, Jan 10 '17 15:01

I have compared my implementation of W&C Fst with the implementations in plink_v1.9 and vcftools.

plink_v1.9 and vcftools both return a "Weir and Cockerham mean Fst estimate" and a "Weir and Cockerham weighted Fst estimate". SNPRelate returns the "Weir and Cockerham weighted Fst estimate", and its value is the same as the one from plink_v1.9 and vcftools.

It seems that "weighted Fst estimate" is a little higher than "mean Fst estimate".
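
The gap between the two summaries can be illustrated with a toy calculation. The sketch below assumes the usual definitions: the weighted estimate is a ratio of sums of per-SNP numerators and denominators, and the mean estimate is the average of per-SNP ratios. The per-SNP values are made-up numbers, not output from SNPRelate, plink, or vcftools, and the function names are hypothetical.

```python
def weighted_fst(numerators, denominators):
    """Ratio of sums: sum of per-SNP numerators over sum of per-SNP denominators."""
    return sum(numerators) / sum(denominators)


def mean_fst(numerators, denominators):
    """Average of ratios: mean of the per-SNP Fst values."""
    ratios = [a / d for a, d in zip(numerators, denominators)]
    return sum(ratios) / len(ratios)


# Hypothetical per-SNP components (between-population variance, total variance)
num = [0.001, 0.002, 0.030]
den = [0.050, 0.100, 0.300]

w = weighted_fst(num, den)
m = mean_fst(num, den)

print(f"weighted Fst: {w:.4f}")
print(f"mean Fst:     {m:.4f}")
```

In this toy data the SNP with the largest denominator also has the largest per-SNP ratio, so the weighted value exceeds the mean. The ordering is not guaranteed in general; it depends on how per-SNP Fst relates to per-SNP variability.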

zhengxwen, Feb 01 '17 01:02

Thanks Xiuwen, I'll run some tests on my end with different RADseq data and will post the results. Thierry

thierrygosselin, Feb 24 '17 02:02

The latest SNPRelate on GitHub also provides MeanFst, and it will be available in the BioC release (in April). See the help documentation for snpgdsFst().

zhengxwen, Feb 25 '17 00:02