
F4-statistics from unlinked SNPs of SNP array

Open ghost opened this issue 4 years ago • 4 comments

I have a question about the usage of this tool: can it be used to calculate F4-statistics on my data, which consist of unlinked SNPs from a SNP array? Will the simulations by fastsimcoal2 (as run by F4.py) be affected by this?

ghost avatar Sep 19 '21 15:09 ghost

Hi, I guess with SNP array data, D- or F4-statistics could be influenced by how the SNPs were originally selected for the array. I assume this was not done randomly, but with variability in the species/population in mind? Besides that, yes, the F4 tool can calculate the F4 statistic from such data. But the statistic itself shouldn't be any different if you calculate it with a tool like Dsuite, and the latter would be much faster. The F4 tool might only report a different p-value, because the p-value is what the simulations are used for. In your case, I would probably just run Dsuite now, and if you worry that the p-value might be affected by the jackknifing method, then you could also run the F4 tool.
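To make the distinction between the statistic and its significance concrete, here is a minimal sketch of an F4 estimate with a delete-one-block jackknife. It is not taken from Dsuite or F4.py; the function names `f4` and `block_jackknife_z`, the input layout (one tuple of four population allele frequencies per SNP), and the use of equal-sized blocks of consecutive SNPs are all assumptions made for illustration.

```python
import math


def f4(freqs):
    """Mean of (pA - pB) * (pC - pD) over SNPs.

    freqs: list of (pA, pB, pC, pD) allele-frequency tuples (assumed layout).
    """
    return sum((a - b) * (c - d) for a, b, c, d in freqs) / len(freqs)


def block_jackknife_z(freqs, n_blocks):
    """Delete-one-block jackknife over equal-sized blocks of consecutive SNPs.

    Returns (f4 estimate, standard error, Z-score). SNPs that do not fill a
    complete block at the end are dropped (a simplification of this sketch).
    """
    size = len(freqs) // n_blocks
    if n_blocks < 2 or size == 0:
        raise ValueError("need at least 2 blocks with at least 1 SNP each")
    used = freqs[: size * n_blocks]

    # Recompute f4 with each block left out in turn.
    estimates = []
    for i in range(n_blocks):
        kept = used[: i * size] + used[(i + 1) * size:]
        estimates.append(f4(kept))

    mean_est = sum(estimates) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum(
        (e - mean_est) ** 2 for e in estimates
    )
    se = math.sqrt(variance)
    full = f4(used)
    z = full / se if se > 0 else float("nan")
    return full, se, z
```

Changing `n_blocks` leaves the F4 estimate essentially the same but can change the standard error and therefore the Z-score, which is the sensitivity that a simulation-based p-value avoids.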

mmatschiner avatar Sep 19 '21 16:09 mmatschiner

Thank you for the prompt response, even on a Sunday (I really appreciate it). Apart from the SNP array, I also have WGS SNPs, so there I could use random SNPs situated at relatively distant places along the genome. And yes, indeed, I was worried about how using different jackknife blocks affects the Z-values; that is why I wanted to use this tool! My only worry was whether the simulation parameters used by fastsimcoal2 (in F4.py) are specific to RAD-seq (for example, SNPs generated in blocks).

ghost avatar Sep 19 '21 16:09 ghost

With WGS-SNPs, I probably would not worry too much about linkage in the calculation of D or F4. You could of course try with or without thinning the dataset, but I wouldn't expect much difference (unless a gigantic inversion region has a large influence or similar). But the F4 tool is definitely applicable to data other than RAD-seq data. In that sense, SNP array data should be fine.
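For the thinning comparison, a minimal sketch of distance-based thinning could look like the following. The function name `thin_snps`, the `(chromosome, position)` input layout, and the 1000 bp threshold in the usage example are assumptions for illustration, and the input is assumed to be sorted by position within each chromosome.

```python
def thin_snps(snps, min_distance):
    """Keep a SNP only if it lies at least min_distance bp after the
    previously kept SNP on the same chromosome.

    snps: list of (chromosome, position) tuples, sorted by position within
    each chromosome (assumed layout, not a format required by any tool).
    """
    kept = []
    last_kept_pos = {}  # chromosome -> position of the last SNP kept there
    for chrom, pos in snps:
        if chrom not in last_kept_pos or pos - last_kept_pos[chrom] >= min_distance:
            kept.append((chrom, pos))
            last_kept_pos[chrom] = pos
    return kept


example = [
    ("chr1", 100), ("chr1", 150), ("chr1", 1200),
    ("chr2", 50), ("chr2", 900), ("chr2", 1100),
]
thinned = thin_snps(example, 1000)
```

Computing D or F4 once on the full set and once on the thinned set then shows directly whether linkage makes a noticeable difference.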

mmatschiner avatar Sep 19 '21 17:09 mmatschiner

Okay, thank you again for the prompt response! I will try this tool as well as Dsuite on both my WGS and SNP array data.

ghost avatar Sep 19 '21 17:09 ghost