
Reference Sequence SNP Call

Open hdesale2408 opened this issue 3 years ago • 3 comments

Hello, I ran parsnp on 17 whole genomes. I picked one of these genomes to use as the reference, but left its sequence file in the input directory. After running parsnp I looked at the VCF output, and it seems that SNPs are being called where the reference sequence is mapped against itself. Does this make sense? It should be the exact same sequence mapped against itself, so why would there be SNPs?

Thank you!

hdesale2408 avatar Jul 19 '22 18:07 hdesale2408

Hi @hdesale2408,

Sorry for the delay in responding. Yes, there should not be SNPs between the same sequence in an alignment. There have been some noted (but rare) issues w/ Parsnp incorrectly parsing .gbk files. Did your input contain .gbk/.gb files by any chance?

bkille avatar Nov 16 '23 17:11 bkille

Hi @hdesale2408,

This bug has been fixed in Parsnp 2.0. Please let me know if you continue to experience it!

bkille avatar Jan 05 '24 17:01 bkille

The bug is still happening in versions >= 2.0:

##FILTER=<ID=IND,Description="Column contains indel">
##FILTER=<ID=N,Description="Column contains N">
##FILTER=<ID=LCB,Description="LCB smaller than 200bp">
##FILTER=<ID=CID,Description="SNP in aligned 100bp window with < 50% column % ID">
##FILTER=<ID=ALN,Description="SNP in aligned 100b window with > 20 indels">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	GCF_000146045.2_R64_genomic.fasta.ref	PE-2.purified.assembly.fasta	2770Lv1.purified.assembly.fasta
NC_001136.10	3320	CGCTCAAACG.GAGGCCATGC	G	A	40	LCB	NA	GT	0	0	1
NC_001136.10	12147	AAGACATTTT.ACCCCGATAC	A	T,C	40	PASS	NA	GT	1	1	2
NC_001136.10	12186	AGCCATCATT.GAAGCCGCTC	G	T,C	40	PASS	NA	GT	1	2	2
NC_001136.10	12187	GCCATCATTG.AAGCCGCTCC	A	G,C	40	PASS	NA	GT	1	2	2
NC_001136.10	12188	CCATCATTGA.AGCCGCTCCG	A	C	40	PASS	NA	GT	0	0	1
NC_001136.10	12192	CATTGAAGCC.GCTCCGAATA	G	C,T	40	PASS	NA	GT	1	2	2
NC_001136.10	12195	TGAAGCCGCT.CCGAATAACA	C	T,A	40	PASS	NA	GT	1	2	2
NC_001136.10	12198	AGCCGCTCCG.AATAACAGAC	A	G	40	PASS	NA	GT	1	0	0
NC_001136.10	12204	TCCGAATAAC.AGACATTTAC	A	C	40	PASS	NA	GT	1	1	0
NC_001136.10	12222	ACGACGGCCT.ATTTTGTCTA	A	T	40	PASS	NA	GT	1	1	0
NC_001136.10	12258	AATAGTGGAG.AAGAAATTCA	A	G	40	PASS	NA	GT	1	1	0
NC_001136.10	12264	GGAGAAGAAA.TTCACTGTAC	T	A,G	40	PASS	NA	GT	1	2	2
NC_001136.10	12288	GACGATCGAA.GTCTCAAACC	G	A,C	40	PASS	NA	GT	1	2	2
NC_001136.10	12309	ATCAGTAAGC.CCAActgatt	C	A	40	PASS	NA	GT	0	0	1
NC_001136.10	12312	AGTAAGCCCA.Actgatttga	A	G	40	PASS	NA	GT	0	0	1
NC_001136.10	12417	AAGGACTTTT.GTTTTGACGG	G	T,A	40	PASS	NA	GT	1	2	2

GCF_000146045.2_R64_genomic.fasta.ref is the reference genome and was not supposed to have SNPs.
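
For anyone who wants to check their own output, here is a minimal sketch (not part of Parsnp) that lists the records where a given sample column carries a non-zero genotype. The function name `nonref_calls_in_sample` is made up for illustration, and it assumes the layout shown above: tab-separated columns, a `#CHROM` header line naming the samples, and `GT` as the first FORMAT field. The demo input is three records copied from the excerpt.

```python
def nonref_calls_in_sample(vcf_lines, sample_name):
    """Yield (chrom, pos, genotype) for records where sample_name has a non-zero GT.

    Assumes tab-separated VCF text with GT as the first per-sample field.
    """
    col = None
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            col = fields.index(sample_name)  # locate the sample's column
            continue
        if col is None:
            raise ValueError("no #CHROM header line before the first record")
        gt = fields[col].split(":")[0]
        if gt not in ("0", "."):
            yield fields[0], int(fields[1]), gt


# Demo on three records taken from the output above.
ref = "GCF_000146045.2_R64_genomic.fasta.ref"
header = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT",
          ref, "PE-2.purified.assembly.fasta", "2770Lv1.purified.assembly.fasta"]
records = [
    ["NC_001136.10", "3320", "CGCTCAAACG.GAGGCCATGC", "G", "A", "40", "LCB", "NA", "GT", "0", "0", "1"],
    ["NC_001136.10", "12147", "AAGACATTTT.ACCCCGATAC", "A", "T,C", "40", "PASS", "NA", "GT", "1", "1", "2"],
    ["NC_001136.10", "12188", "CCATCATTGA.AGCCGCTCCG", "A", "C", "40", "PASS", "NA", "GT", "0", "0", "1"],
]
demo_lines = ["\t".join(row) for row in [header] + records]

hits = list(nonref_calls_in_sample(demo_lines, ref))
for chrom, pos, gt in hits:
    print(chrom, pos, gt)
```

On a real file, pass an open file handle instead of `demo_lines`. Any row it prints for the `.ref` column is a position where the reference was assigned an ALT allele against itself.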

vinicius-santos-bmc avatar Dec 04 '24 16:12 vinicius-santos-bmc