snippy icon indicating copy to clipboard operation
snippy copied to clipboard

Deletions in core.full.aln

Open pkmitchell opened this issue 4 years ago • 12 comments

Hello,

I ran snippy-multi on a set of 24 samples. According to the log output and the snps.tab file for each sample, 13 of them had a deletion at the same site. However, in the core.full.aln file, only 1 has a "-" at that position while the other 12 are the same as the reference. Any idea why this might be happening?

pkmitchell avatar Nov 12 '19 15:11 pkmitchell

  1. What does snippy --version say?
  2. Did you use any unusual command line arguments?

tseemann avatar Nov 13 '19 02:11 tseemann

Version 4.3.6. Only command line arguments being used with the snippy-multi command were the input reads, the reference file, and the number of CPUs

pkmitchell avatar Nov 13 '19 20:11 pkmitchell

This could be a bug. Are you able to email me the *.tab, *.vcf files?

tseemann avatar Nov 13 '19 20:11 tseemann

Hello, Could this behaviour come from the early bail-out on https://github.com/tseemann/snippy/blob/master/bin/snippy-core#L282

That would be consistent with the described behaviour if all samples (for my $id (@id) loop) have $alt == '-' but the patching of the full alignment (is on the row above) only updates the first and then skips to the next position.

fredyr avatar Apr 15 '20 18:04 fredyr

I think I managed to create a small(-ish) test case to verify this behaviour. See the attached file with the data (public data off of ENA/EBI)

$ snippy --version
snippy 4.3.6

All three samples have the same first deletion at position 217597 (well more specifically the del is 217598). testdata.zip

$ grep 'TYPE=del' 148/snps.vcf | head -n1
CP003351.1	217597	.	AT	A	6320.52	.	AB=0;AO=200;DP=200;QA=7080;QR=0;RO=0;TYPE=del	GT:DP:RO:QR:AO:QA:GL	1/1:200:0:0:200:7080:-635.929,-60.206,0
$ grep 'TYPE=del' 149/snps.vcf | head -n1
CP003351.1	217597	.	AT	A	5224.22	.	AB=0;AO=166;DP=166;QA=5862;QR=0;RO=0;TYPE=del	GT:DP:RO:QR:AO:QA:GL	1/1:166:0:0:166:5862:-526.617,-49.971,0
$ grep 'TYPE=del' 198/snps.vcf | head -n1
CP003351.1	217597	.	AT	A	4223.79	.	AB=0;AO=132;DP=133;QA=4757;QR=0;RO=0;TYPE=del	GT:DP:RO:QR:AO:QA:GL	1/1:133:0:0:132:4757:-427.365,-39.736,0

$ snippy-core --ref CP003351.1.fa 148 149 198

$ samtools faidx core.full.aln
$ samtools faidx core.full.aln 148:217597-217598
>148:217597-217598
A-
$ samtools faidx core.full.aln 149:217597-217598
>149:217597-217598
AT
$ samtools faidx core.full.aln 198:217597-217598
>198:217597-217598
AT

fredyr avatar Apr 16 '20 05:04 fredyr

I replicated your result with your test data on Snippy 4.6.0, and any idea to fix this? Thanks

fengyuchengdu avatar Jun 04 '20 03:06 fengyuchengdu

@fredyr there is nothing attached to your comment, but i agree, it does look like there is a problem. can you also check what is in snps.aligned.fa at those positions for those 3 samples too?

tseemann avatar Jun 04 '20 05:06 tseemann

Sure! (I was able to download the testdata.zip, from my comment, I'm not sure if it's only available for myself?). Here's the result:

$ samtools faidx 148/snps.aligned.fa
$ samtools faidx 149/snps.aligned.fa
$ samtools faidx 198/snps.aligned.fa
$ samtools faidx 148/snps.aligned.fa CP003351.1:217597-217598
>CP003351.1:217597-217598
AT
$ samtools faidx 149/snps.aligned.fa CP003351.1:217597-217598
>CP003351.1:217597-217598
AT
$ samtools faidx 198/snps.aligned.fa CP003351.1:217597-217598
>CP003351.1:217597-217598
AT

In any case, my previous comment about exiting the loop prematurely when writing the full alignment, seems to be the reason for this. For my own purposes I was able to work around it.

fredyr avatar Jun 04 '20 05:06 fredyr

I confirm file is accessable

maesaar avatar Jun 04 '20 05:06 maesaar

@fredyr sorry i didn't see the zip link in the body - for some reason i was looking at the bottom as if it was an email attachment.

ah yes, i see the link to the early loop exit. That could be the problem.

tseemann avatar Jun 04 '20 07:06 tseemann

@fredyr could please share your work-around as I suppose this bug would greatly affect the downstream recombination filtering step, which relies on gap characters, such as Gubbins. Many thanks

fengyuchengdu avatar Aug 30 '20 01:08 fengyuchengdu

any update regarding this issue?

fengyuchengdu avatar May 18 '22 02:05 fengyuchengdu