How to understand the statistics of the screen output?

Open · biozzq opened this issue 10 months ago · 1 comment

Dear @lczech,

I have four pool-sequencing libraries. I have finished SNP calling using bwa+GATK, and a total of 9.8 M SNPs (after filtering for depth and missing rate) were identified. I found that grenedalf can work directly with both BAM and VCF input, so I first calculated FST using the BAM files as input. At the end of the run, the following statistics were reported. How should I interpret each statistic, and which number best represents the number of SNPs? If 28568428 is the number of SNPs, it differs substantially from 9.8 M.

Sample filter summary (summed up across all samples):
Passed:               4036079496
Empty (after counts): 32041114
Above max coverage:   263818
Total filter summary (after applying all sample filters):
Passed:                  28568428
Below min coverage:      51188538
Not SNP:                 937243195

Finished 2023-10-02 23:48:24

When using the VCF as input, the statistics look much more reasonable. It seems that 9795814 is the number of SNPs after filtering by minor allele count > 0.

Processed 39 chromosomes with 9795814 (non-filtered) positions in 47245 windows.
Total filter summary (after applying all sample filters):
Passed:                  9795814
Not SNP:                 120324

Finished 2023-10-03 10:29:20
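
To put the two runs side by side, the categories in each filter summary above can be added up. The following is a minimal sketch, assuming every summary line has the form `Label: count` under a header line ending in a colon. The function `parse_filter_summary` is a hypothetical helper and not part of grenedalf. It only tallies the numbers printed above and says nothing about how grenedalf defines each category.

```python
import re

# A count line looks like "Passed:               4036079496".
COUNT_LINE = re.compile(r"^(.+?):\s+(\d+)\s*$")

TOTAL = "Total filter summary (after applying all sample filters)"
SAMPLE = "Sample filter summary (summed up across all samples)"

BAM_LOG = """\
Sample filter summary (summed up across all samples):
Passed:               4036079496
Empty (after counts): 32041114
Above max coverage:   263818
Total filter summary (after applying all sample filters):
Passed:                  28568428
Below min coverage:      51188538
Not SNP:                 937243195
"""

VCF_LOG = """\
Processed 39 chromosomes with 9795814 (non-filtered) positions in 47245 windows.
Total filter summary (after applying all sample filters):
Passed:                  9795814
Not SNP:                 120324
"""


def parse_filter_summary(log):
    """Hypothetical helper: map each section header to {label: count}."""
    sections = {}
    current = None
    for raw in log.splitlines():
        line = raw.strip()
        match = COUNT_LINE.match(line)
        if match and current is not None:
            sections[current][match.group(1)] = int(match.group(2))
        elif line.endswith(":"):
            # A line ending in a colon with no count starts a new section.
            current = line[:-1]
            sections[current] = {}
    return sections


bam = parse_filter_summary(BAM_LOG)
vcf = parse_filter_summary(VCF_LOG)

for name, parsed in (("BAM", bam), ("VCF", vcf)):
    total = sum(parsed[TOTAL].values())
    passed = parsed[TOTAL]["Passed"]
    print(f"{name}: {passed} passed out of {total} tallied ({passed / total:.2%})")
```

With the numbers above, the total filter categories add up to 1017000161 positions for the BAM run and 9916138 for the VCF run. This fits the BAM input covering far more positions than the VCF has records, but that reading is an assumption and not something the output itself states.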

Sincerely, Zhuqing Zheng

biozzq, Oct 03 '23 04:10