snippy
snippy copied to clipboard
Deletions in core.full.aln
Hello,
I ran snippy-multi on a set of 24 samples. According to the log output and the snps.tab file for each sample, 13 of them had a deletion at the same site. However, in the core.full.aln file, only 1 has a "-" at that position while the other 12 are the same as the reference. Any idea why this might be happening?
- What does
snippy --version
say? - Did you use any unusual command line arguments?
Version 4.3.6. Only command line arguments being used with the snippy-multi command were the input reads, the reference file, and the number of CPUs
This could be a bug. Are you able to email me the *.tab, *.vcf files?
Hello, Could this behaviour come from the early bail-out on https://github.com/tseemann/snippy/blob/master/bin/snippy-core#L282
That would be consistent with the described behaviour if all samples (for my $id (@id) loop) have $alt == '-' but the patching of the full alignment (is on the row above) only updates the first and then skips to the next position.
I think I managed to create a small(-ish) test case to verify this behaviour. See the attached file with the data (public data off of ENA/EBI)
$ snippy --version
snippy 4.3.6
All three samples have the same first deletion at position 217597 (well more specifically the del is 217598). testdata.zip
$ grep 'TYPE=del' 148/snps.vcf | head -n1
CP003351.1 217597 . AT A 6320.52 . AB=0;AO=200;DP=200;QA=7080;QR=0;RO=0;TYPE=del GT:DP:RO:QR:AO:QA:GL 1/1:200:0:0:200:7080:-635.929,-60.206,0
$ grep 'TYPE=del' 149/snps.vcf | head -n1
CP003351.1 217597 . AT A 5224.22 . AB=0;AO=166;DP=166;QA=5862;QR=0;RO=0;TYPE=del GT:DP:RO:QR:AO:QA:GL 1/1:166:0:0:166:5862:-526.617,-49.971,0
$ grep 'TYPE=del' 198/snps.vcf | head -n1
CP003351.1 217597 . AT A 4223.79 . AB=0;AO=132;DP=133;QA=4757;QR=0;RO=0;TYPE=del GT:DP:RO:QR:AO:QA:GL 1/1:133:0:0:132:4757:-427.365,-39.736,0
$ snippy-core --ref CP003351.1.fa 148 149 198
$ samtools faidx core.full.aln
$ samtools faidx core.full.aln 148:217597-217598
>148:217597-217598
A-
$ samtools faidx core.full.aln 149:217597-217598
>149:217597-217598
AT
$ samtools faidx core.full.aln 198:217597-217598
>198:217597-217598
AT
I replicated your result with your test data on Snippy 4.6.0, and any idea to fix this? Thanks
@fredyr there is nothing attached to your comment, but i agree, it does look like there is a problem.
can you also check what is in snps.aligned.fa
at those positions for those 3 samples too?
Sure! (I was able to download the testdata.zip, from my comment, I'm not sure if it's only available for myself?). Here's the result:
$ samtools faidx 148/snps.aligned.fa
$ samtools faidx 149/snps.aligned.fa
$ samtools faidx 198/snps.aligned.fa
$ samtools faidx 148/snps.aligned.fa CP003351.1:217597-217598
>CP003351.1:217597-217598
AT
$ samtools faidx 149/snps.aligned.fa CP003351.1:217597-217598
>CP003351.1:217597-217598
AT
$ samtools faidx 198/snps.aligned.fa CP003351.1:217597-217598
>CP003351.1:217597-217598
AT
In any case, my previous comment about exiting the loop prematurely when writing the full alignment, seems to be the reason for this. For my own purposes I was able to work around it.
I confirm file is accessable
@fredyr sorry i didn't see the zip
link in the body - for some reason i was looking at the bottom as if it was an email attachment.
ah yes, i see the link to the early loop exit. That could be the problem.
@fredyr could please share your work-around as I suppose this bug would greatly affect the downstream recombination filtering step, which relies on gap characters, such as Gubbins. Many thanks
any update regarding this issue?