Reference Sequence SNP Call
Hello, I ran parsnp on 17 whole genomes. I picked one of these genomes to use as the reference, but left its sequence file in the input directory as well. After running parsnp I looked at the VCF output, and SNPs appear to be called where the reference sequence is aligned against itself. Does this make sense? It is the exact same sequence on both sides, so why would there be SNPs?
Thank you!
Hi @hdesale2408,
Sorry for the delay in responding. You are right: there should not be SNPs between identical sequences in an alignment. There have been some noted (but rare) issues with Parsnp incorrectly parsing .gbk files. Did your input contain .gbk/.gb files by any chance?
Hi @hdesale2408,
This bug has been fixed in Parsnp 2.0. Please let me know if you continue to experience it!
The bug is still happening in versions >= 2.0:
##FILTER=<ID=IND,Description="Column contains indel">
##FILTER=<ID=N,Description="Column contains N">
##FILTER=<ID=LCB,Description="LCB smaller than 200bp">
##FILTER=<ID=CID,Description="SNP in aligned 100bp window with < 50% column % ID">
##FILTER=<ID=ALN,Description="SNP in aligned 100b window with > 20 indels">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT GCF_000146045.2_R64_genomic.fasta.ref PE-2.purified.assembly.fasta 2770Lv1.purified.assembly.fasta
NC_001136.10 3320 CGCTCAAACG.GAGGCCATGC G A 40 LCB NA GT 0 0 1
NC_001136.10 12147 AAGACATTTT.ACCCCGATAC A T,C 40 PASS NA GT 1 1 2
NC_001136.10 12186 AGCCATCATT.GAAGCCGCTC G T,C 40 PASS NA GT 1 2 2
NC_001136.10 12187 GCCATCATTG.AAGCCGCTCC A G,C 40 PASS NA GT 1 2 2
NC_001136.10 12188 CCATCATTGA.AGCCGCTCCG A C 40 PASS NA GT 0 0 1
NC_001136.10 12192 CATTGAAGCC.GCTCCGAATA G C,T 40 PASS NA GT 1 2 2
NC_001136.10 12195 TGAAGCCGCT.CCGAATAACA C T,A 40 PASS NA GT 1 2 2
NC_001136.10 12198 AGCCGCTCCG.AATAACAGAC A G 40 PASS NA GT 1 0 0
NC_001136.10 12204 TCCGAATAAC.AGACATTTAC A C 40 PASS NA GT 1 1 0
NC_001136.10 12222 ACGACGGCCT.ATTTTGTCTA A T 40 PASS NA GT 1 1 0
NC_001136.10 12258 AATAGTGGAG.AAGAAATTCA A G 40 PASS NA GT 1 1 0
NC_001136.10 12264 GGAGAAGAAA.TTCACTGTAC T A,G 40 PASS NA GT 1 2 2
NC_001136.10 12288 GACGATCGAA.GTCTCAAACC G A,C 40 PASS NA GT 1 2 2
NC_001136.10 12309 ATCAGTAAGC.CCAActgatt C A 40 PASS NA GT 0 0 1
NC_001136.10 12312 AGTAAGCCCA.Actgatttga A G 40 PASS NA GT 0 0 1
NC_001136.10 12417 AAGGACTTTT.GTTTTGACGG G T,A 40 PASS NA GT 1 2 2
GCF_000146045.2_R64_genomic.fasta.ref is the reference genome, so it is not supposed to have any SNPs (its GT column should always be 0).
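In case it helps with reproducing this, here is a minimal Python sketch for listing the rows where the reference column carries a non-zero genotype. It is not part of Parsnp. The function name `reference_nonzero_calls` is made up for illustration, and it assumes the layout shown in the excerpt above: whitespace-separated columns, a single integer per sample in the GT field, and a reference sample whose column name ends in `.ref`.

```python
def reference_nonzero_calls(vcf_lines, ref_suffix=".ref"):
    """Return (chrom, pos, ref, alt, gt) for rows where the reference sample's GT is not 0.

    Assumptions (taken from the excerpt above, not from Parsnp documentation):
    columns are whitespace-separated, GT is a single integer per sample,
    and exactly one sample column name ends with ref_suffix.
    """
    ref_idx = None
    hits = []
    for line in vcf_lines:
        line = line.strip()
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split()
        if fields[0] == "#CHROM":
            samples = fields[9:]  # sample columns follow the FORMAT column
            matches = [i for i, name in enumerate(samples) if name.endswith(ref_suffix)]
            if len(matches) != 1:
                raise ValueError("expected exactly one sample ending in %r" % ref_suffix)
            ref_idx = 9 + matches[0]
            continue
        if ref_idx is None:
            raise ValueError("data row seen before the #CHROM header")
        gt = fields[ref_idx]
        if gt != "0":
            hits.append((fields[0], int(fields[1]), fields[3], fields[4], gt))
    return hits


# Three lines copied from the VCF excerpt above.
excerpt = [
    "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT "
    "GCF_000146045.2_R64_genomic.fasta.ref PE-2.purified.assembly.fasta "
    "2770Lv1.purified.assembly.fasta",
    "NC_001136.10 3320 CGCTCAAACG.GAGGCCATGC G A 40 LCB NA GT 0 0 1",
    "NC_001136.10 12147 AAGACATTTT.ACCCCGATAC A T,C 40 PASS NA GT 1 1 2",
]

for hit in reference_nonzero_calls(excerpt):
    print(hit)
```

On these lines it reports position 12147 (reference GT of 1) and skips 3320 (reference GT of 0). To run it on a whole file, pass an open file handle in place of `excerpt`.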