nanopolish

concordance in variant calling using down-sampled bam files

Open herrroaa opened this issue 5 years ago • 5 comments

I called variants from the main BAM file, then did the same thing after down-sampling the main file with samtools (-s 0.1, -s 0.2, -s 0.3, up to -s 0.9). I then looked at how consistent the SNPs called in each down-sampled file are with those called in the main BAM file (which has 100% of the sequence reads).

samtools view --threads 24 -s 0.1 -b $file -o $file.out
nanopolish variants -t 24 -p 2 -w chr9:84273123-84368634 -r BC01.fastq -b BC01.bam -g NCBI.GRCh38.fa -o polish.vcf
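The down-sampling step above can be scripted so that every fraction is produced the same way. The following is a minimal sketch that only builds and prints the `samtools view` command lines. The output naming scheme (`<bam>.s0.1.bam`) and the `seed` parameter are assumptions for illustration, not something stated in this issue.

```python
import shlex


def downsample_commands(bam, fractions, threads=24, seed=0):
    """Build one `samtools view -s` command per sampling fraction."""
    commands = []
    for frac in fractions:
        # samtools reads -s as SEED.FRACTION: the integer part seeds the
        # sampler, the part after the decimal point is the fraction kept.
        subsample = f"{seed + frac:.1f}"
        out = f"{bam}.s{frac:.1f}.bam"  # hypothetical naming scheme
        commands.append([
            "samtools", "view", "--threads", str(threads),
            "-s", subsample, "-b", bam, "-o", out,
        ])
    return commands


fractions = [i / 10 for i in range(1, 10)]  # 0.1 through 0.9
for cmd in downsample_commands("BC01.bam", fractions):
    # Printed for review; pass each list to subprocess.run(cmd, check=True)
    # to actually execute it.
    print(shlex.join(cmd))
```

Each resulting BAM would presumably also need to be indexed (`samtools index`) before it is used with a region-restricted variant calling run.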

The concordance percentages for the -s 0.9 to -s 0.1 files were 89.58, 95.56, 95.45, 95.45, 95.45, 97.73, 91.11, 93.33, 97.73. As you can see, the concordance percentage increases with the number of reads, with two exceptions at -s 0.7 and -s 0.8.
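The post does not say how the concordance percentage was computed. One plausible definition, assumed here, is the share of SNPs called in the main file that are also called (same position, REF and ALT) in the down-sampled file. The sketch below implements that definition on VCF text; the records in the example are made up for illustration and are not taken from the attached files.

```python
def snp_keys(vcf_text):
    """Return the set of (chrom, pos, ref, alt) for single-base records."""
    keys = set()
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        # Keep SNPs only: REF and every ALT allele are one base long.
        if len(ref) == 1 and all(len(a) == 1 for a in alt.split(",")):
            keys.add((chrom, int(pos), ref, alt))
    return keys


def concordance(main_keys, sub_keys):
    """Percent of main-file SNPs that were also called in the subsample."""
    if not main_keys:
        return float("nan")
    return 100.0 * len(main_keys & sub_keys) / len(main_keys)


# Made-up records, for illustration only.
header = ["##fileformat=VCFv4.2", "#CHROM\tPOS\tID\tREF\tALT"]
main_vcf = "\n".join(header + [
    "chr9\t84273200\t.\tG\tA",
    "chr9\t84280000\t.\tC\tT",
    "chr9\t84300000\t.\tT\tC",
])
sub_vcf = "\n".join(header + [
    "chr9\t84273200\t.\tG\tA",
    "chr9\t84300000\t.\tT\tC",
])

main_keys, sub_keys = snp_keys(main_vcf), snp_keys(sub_vcf)
print(f"concordance: {concordance(main_keys, sub_keys):.2f}%")
print("called only in main:", sorted(main_keys - sub_keys))
```

Listing the records that appear only in the main set makes it easy to find discordant sites such as the one examined below. A different definition (for example, one that also penalises SNPs called only in the subsample) would give different percentages.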

I had a closer look at the SNPs called in these two files. Here is one example of a SNP that was called in the main file but not in the -s 0.7 file. The alternative allele exists in both files, with a frequency of 20.7% in the -s 0.7 file and 21.6% in the main file (see below). Apparently this SNP should also have been called in the -s 0.7 file.

Aligned reads at the SNP locus for -s 0.7

Type | Base | Count | % of Total | Mean Quality
(match) | G | 1632 | 75.5 | 14.1
(mismatch) | A | 447 | 20.7 | 6.9
(mismatch) | C | 3 | 0.1 | 9.3
(mismatch) | T | 11 | 0.5 | 9.3
(deletion) |   | 70 | 3.2 | ?
Total |   | 2163 | 100 | 12.5

Aligned reads at the SNP locus for the main file

Type | Base | Count | % of Total | Mean Quality
(match) | G | 2296 | 74.5 | 14.0
(mismatch) | A | 665 | 21.6 | 7.0
(mismatch) | C | 6 | 0.2 | 6.3
(mismatch) | T | 11 | 0.4 | 9.3
(deletion) |   | 104 | 3.4 | ?
Total |   | 3082 | 100 | 12.4
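The two alternative-allele fractions can be recomputed from the counts in the tables and compared against ordinary sampling noise. The sketch below uses a binomial standard error as a rough yardstick. That is an approximation: the -s 0.7 reads are drawn without replacement from the main file rather than sampled independently. It does not model how nanopolish itself decides on a call.

```python
import math


def alt_fraction(alt_count, total):
    """Fraction of aligned reads supporting the alternative allele."""
    return alt_count / total


def binomial_se(p, n):
    """Standard error of a proportion p observed in n reads."""
    return math.sqrt(p * (1 - p) / n)


# Counts taken from the two tables above (base A, and the Total row).
sub = alt_fraction(447, 2163)    # -s 0.7 file
main = alt_fraction(665, 3082)   # main file

# Noise expected when observing the main-file fraction in 2163 reads.
se = binomial_se(main, 2163)
diff = main - sub

print(f"-s 0.7: {sub:.1%}, main: {main:.1%}")
print(f"difference: {diff:.2%}, about {diff / se:.1f} standard errors")
```

Under these assumptions the two fractions differ by roughly one standard error, so the read-level evidence at this locus is essentially the same in both files. Both values also sit close to 20%, so any frequency-based cutoff near that value would be sensitive to this kind of sampling noise. Whether that explains the missing call is not established in this thread.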

Thanks T

herrroaa avatar Sep 19 '18 16:09 herrroaa

Can you send me the VCF record from the -s 0.8 file, where the SNP was called?

jts avatar Sep 24 '18 15:09 jts

Sorry for the late response, I was on vacation. Please find attached three VCF files for -s 0.7, -s 0.8 and the main file. These are the VCF files after filtering out all indels. Variant calling was done on September 7, 2018.

BC01_s0.7.vcf.Noindel.recode.vcf.txt BC01_s0.8.vcf.Noindel.recode.vcf.txt BC01_sorted.vcf.Noindel.recode.vcf.txt

herrroaa avatar Oct 08 '18 22:10 herrroaa

Did you have time to look into the files?

herrroaa avatar Oct 23 '18 15:10 herrroaa

@jts any update?

herrroaa avatar May 21 '19 04:05 herrroaa

Sorry I have not had time to work on this issue (or SNP calling in general).

jts avatar May 21 '19 09:05 jts