nanopolish
concordance in variant calling using down-sampled bam files
I called variants from the main BAM file, then did the same thing after down-sampling the main file using samtools (-s 0.1, -s 0.2, -s 0.3, up to -s 0.9). I then looked at how consistent the SNPs called in each of these down-sampled files are with those called from the main BAM file (which has 100% of the sequence reads).
samtools view --threads 24 -s 0.1 -b $file -o $file.out
nanopolish variants -t 24 -p 2 -w chr9:84273123-84368634 -r BC01.fastq -b BC01.bam -g NCBI.GRCh38.fa -o polish.vcf
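For anyone repeating the experiment, the nine down-sampling runs can be generated in one go. This is only a sketch: it reuses the samtools flags from the command above, while the function name `downsample_commands` and the output naming scheme (`BC01_s0.1.bam` and so on) are assumptions made for illustration, not what was actually used here.

```python
from pathlib import Path


def downsample_commands(bam, fractions, threads=24):
    """Build one `samtools view -s` argument list per sampling fraction."""
    bam = Path(bam)
    commands = []
    for frac in fractions:
        # Hypothetical output name, e.g. BC01_s0.1.bam next to the input file.
        out = bam.with_name(f"{bam.stem}_s{frac:.1f}.bam")
        commands.append([
            "samtools", "view", "--threads", str(threads),
            "-s", f"{frac:.1f}", "-b", str(bam), "-o", str(out),
        ])
    return commands


if __name__ == "__main__":
    fractions = [i / 10 for i in range(1, 10)]  # 0.1, 0.2, ..., 0.9
    for cmd in downsample_commands("BC01.bam", fractions):
        print(" ".join(cmd))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` to actually run samtools. Note that in `-s INT.FRAC` the integer part is the random seed, so all nine runs here use seed 0.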
The concordance percentages for the -s 0.1 to -s 0.9 files were 89.58, 95.56, 95.45, 95.45, 95.45, 97.73, 91.11, 93.33, 97.73. As you can see, the concordance percentage increases with the number of reads, with two exceptions: -s 0.7 and -s 0.8.
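The thread does not say how the concordance percentage was defined, so the following is only a sketch of one way to compute it. It compares SNP records by (chromosome, position, REF, ALT) and lets the denominator be the main call set, the down-sampled call set, or their union. The function names and the toy records are made up for illustration.

```python
def snp_keys(vcf_lines):
    """Return the set of (chrom, pos, ref, alt) for single-base records in VCF text."""
    keys = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):
            # Keep only single-base substitutions; indels are ignored.
            if len(ref) == 1 and len(allele) == 1 and allele != ".":
                keys.add((chrom, int(pos), ref, allele))
    return keys


def concordance(main_keys, sub_keys, denominator="main"):
    """Percent of shared SNPs; the choice of denominator is an assumption."""
    totals = {
        "main": len(main_keys),
        "sub": len(sub_keys),
        "union": len(main_keys | sub_keys),
    }
    total = totals[denominator]
    if total == 0:
        raise ValueError("call set used as denominator is empty")
    return 100.0 * len(main_keys & sub_keys) / total


if __name__ == "__main__":
    # Toy records, not taken from the attached files.
    main_vcf = ["chr9\t100\t.\tG\tA", "chr9\t200\t.\tC\tT",
                "chr9\t300\t.\tA\tG", "chr9\t400\t.\tT\tC"]
    sub_vcf = ["chr9\t100\t.\tG\tA", "chr9\t200\t.\tC\tT",
               "chr9\t300\t.\tA\tG", "chr9\t500\t.\tG\tC"]
    main, sub = snp_keys(main_vcf), snp_keys(sub_vcf)
    for choice in ("main", "sub", "union"):
        print(choice, concordance(main, sub, choice))
```

For real data, `snp_keys(open(path))` works on an uncompressed VCF file. Which denominator matches the percentages reported above is not known from the thread.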
I had a closer look at the SNPs called in these two files. Here is one example of a SNP that was called in the main file but not in the -s 0.7 file. The alternative allele exists in both files, with a frequency of 20.7% in the -s 0.7 file and 21.6% in the main file (see below). It seems this SNP should also have been called in the -s 0.7 file.
Aligned reads at the SNP locus for the -s 0.7 file
Type | Base | Count | % of Total | Mean Quality
(match) | G | 1632 | 75.5 | 14.1
(mismatch) | A | 447 | 20.7 | 6.9
(mismatch) | C | 3 | 0.1 | 9.3
(mismatch) | T | 11 | 0.5 | 9.3
(deletion) | | 70 | 3.2 | ?
Total | | 2163 | 100 | 12.5
Aligned reads at the SNP locus for the main file
Type | Base | Count | % of Total | Mean Quality
(match) | G | 2296 | 74.5 | 14.0
(mismatch) | A | 665 | 21.6 | 7.0
(mismatch) | C | 6 | 0.2 | 6.3
(mismatch) | T | 11 | 0.4 | 9.3
(deletion) | | 104 | 3.4 | ?
Total | | 3082 | 100 | 12.4
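As a quick arithmetic check on the two tables, the per-base percentages can be recomputed from the counts. This is a minimal sketch with a made-up helper name (`base_fractions`). It only verifies the pileup numbers and says nothing about how nanopolish itself scores the site.

```python
def base_fractions(counts):
    """Convert per-base read counts at one locus into percentages of the total."""
    total = sum(counts.values())
    return {base: 100.0 * n / total for base, n in counts.items()}


# Counts copied from the two tables above ("del" = deletion row).
s07_counts = {"G": 1632, "A": 447, "C": 3, "T": 11, "del": 70}
main_counts = {"G": 2296, "A": 665, "C": 6, "T": 11, "del": 104}

if __name__ == "__main__":
    for label, counts in (("-s 0.7", s07_counts), ("main", main_counts)):
        alt_pct = base_fractions(counts)["A"]
        print(f"{label}: total={sum(counts.values())}, A={alt_pct:.1f}%")
    # prints:
    # -s 0.7: total=2163, A=20.7%
    # main: total=3082, A=21.6%
```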
Thanks T
Can you send me the VCF record for the -s 0.8 file, where the SNP was called?
Sorry for the late response, I was on vacation.
Please find attached three VCF files for -s 0.7, -s 0.8 and the main file.
These are the VCF files after filtering out all indels. Variant calling was done on September 7, 2018.
BC01_s0.7.vcf.Noindel.recode.vcf.txt BC01_s0.8.vcf.Noindel.recode.vcf.txt BC01_sorted.vcf.Noindel.recode.vcf.txt
Did you have time to look into the files?
@jts any update?
Sorry I have not had time to work on this issue (or SNP calling in general).
On May 21, 2019, at 5:49 AM, herrroaa [email protected] wrote:
@jts any update?