SURVIVOR filtering variants during merge as being supported by zero callers

Open oneillkza opened this issue 4 years ago • 3 comments

Hi there,

I'm trying to use SURVIVOR to merge matched tumour and normal VCFs generated by Sniffles from PromethION data. However, the merge seems to be erroneously losing variants. I've isolated one particular variant, which has extensive support in the tumour (9 reads), and which we know from previous work is a real variant in this cell line.

Minimal tumour.vcf (minus most of the header):

#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	20190816_COLO829.fastq.mm2.sorted.bam
1	207981231	1293	N	<DEL>	.	PASS	PRECISE;SVMETHOD=Snifflesv1.0.11;CHR2=1;END=208014820;STD_quant_start=2.420153;STD_quant_stop=3.000000;Kurtosis_quant_start=-1.580012;Kurtosis_quant_stop=-0.370370;SVTYPE=DEL;RNAMES=3463dfad-992e-4068-891c-22215f043d06,5cbaae2b-62dc-45ab-a993-daaacf350847,a30fb0cc-bca6-4239-a926-9d3c954e4cc2,afdfd7be-0651-4099-880c-4385ceacd6af,cffe5e9c-5783-427c-a3ad-32728023be77,ecc28152-3598-4f18-8085-3bd146b919d7,f4c7b59b-2898-401b-8b5e-4a4924cb7bcd;SUPTYPE=SR;SVLEN=-33589;STRANDS=+-;RE=7;REF_strand=9,8;AF=0.291667	GT:DR:DV	0/0:17:7

Minimal normal.vcf:

#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	20190816_COLO829_BL.fastq.mm2.sorted.bam
12	95244024	13927_0	GATCTTATAACTAGAAAAACCTAAAGACTCCACCAAAAAACTCTTAGATCTGATAAATAAATTCAGTAAAGCTTCAGGTACAAAATCAACACACAAAAATCGGTAGCATTTCTATACACCAATAATGAACTTGCTGAGAAAGAAATCAAGAAGGCAATCCCATTTACAATAGCTATAAAAAATAGAATATCTAACAATAAATTTAACCAAGGAGGTTGTCTTAGTCCATTTGTGTAGCTACATCTGAGGCTGGGTAATCTATAAAGAAAAGAGGTTTATTTGGCTAATGGTTCTACAGGCTGTACAAGAAGCACAGCACCAATATCTGCTACTGGAGAGGGCTTCCCGGCTGCTTCTACTCATGGCAGAAGGAGAACGGGAGCTGTTGTATGCAGAGATCATATGGTGAGAGAGAGGAAGCAA	N	.	PASS	PRECISE;SVMETHOD=Snifflesv1.0.11;CHR2=12;END=95244447;STD_quant_start=0.447214;STD_quant_stop=0.000000;Kurtosis_quant_start=5.871518;Kurtosis_quant_stop=6.574891;SVTYPE=DEL;RNAMES=1b631bb1-ac12-40c8-b950-34368382ed47,1c541b4f-cd09-4d46-b226-5701711f0bbc,20bca30e-af02-40cc-a723-3e0bff5bc395,21faee21-34b9-4d58-b23c-924199c39a61,24ec0c78-fd77-4933-881a-4f1bc0351678,274a6148-f834-4ee9-b763-8171535a07ee,2ea2a668-4c94-4e95-aa0e-065e2b0f9076,41038240-1471-4c93-8a0b-c9569d185e30,69c904ed-e0cc-4013-8c95-bffe93ae1d31,6c1c8cc4-83d2-4a60-98f6-4a690b77dfc2,7f6d4019-3415-4ded-8089-d792ebc79ed5,a4a2b758-3b46-4a23-8719-40650570aedd,df9b5bc1-d5fb-4d76-bcb6-22bb3053a7ca,f48b0ecc-6a5f-4bb8-8fee-36cbc2eab577;SUPTYPE=AL;SVLEN=-423;STRANDS=+-;RE=14;REF_strand=2,3;AF=0.736842	GT:DR:DV	0/1:5:14

merged vcf (survivor_test.vcf):

#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	20190816_COLO829_BL.fastq.mm2.sorted.bam	20190816_COLO829.fastq.mm2.sorted.bam
1	207981231	1293	N	<DEL>	.	PASS	SUPP=0;SUPP_VEC=00;SVLEN=-33589;SVTYPE=DEL;SVMETHOD=SURVIVOR1.0.6;CHR2=1;END=208014820;CIPOS=0,0;CIEND=0,0;STRANDS=+-	GT:PSV:LN:DR:ST:QV:TY:ID:RAL:AAL:CO	./.:NaN:0:0,0:--:NaN:NaN:NaN:NAN:NAN:NAN	0/0:NA:33589:17,7:+-:.:DEL:1293:NA:NA:1_207981231-1_208014820
12	95244024	13927_0	GATCTTATAACTAGAAAAACCTAAAGACTCCACCAAAAAACTCTTAGATCTGATAAATAAATTCAGTAAAGCTTCAGGTACAAAATCAACACACAAAAATCGGTAGCATTTCTATACACCAATAATGAACTTGCTGAGAAAGAAATCAAGAAGGCAATCCCATTTACAATAGCTATAAAAAATAGAATATCTAACAATAAATTTAACCAAGGAGGTTGTCTTAGTCCATTTGTGTAGCTACATCTGAGGCTGGGTAATCTATAAAGAAAAGAGGTTTATTTGGCTAATGGTTCTACAGGCTGTACAAGAAGCACAGCACCAATATCTGCTACTGGAGAGGGCTTCCCGGCTGCTTCTACTCATGGCAGAAGGAGAACGGGAGCTGTTGTATGCAGAGATCATATGGTGAGAGAGAGGAAGCAA	N	.	PASS	SUPP=1;SUPP_VEC=10;SVLEN=-423;SVTYPE=DEL;SVMETHOD=SURVIVOR1.0.6;CHR2=12;END=95244447;CIPOS=0,0;CIEND=0,0;STRANDS=+-	GT:PSV:LN:DR:ST:QV:TY:ID:RAL:AAL:CO	0/1:NA:423:5,14:+-:.:DEL:13927_0:GATCTTATAACTAGAAAAACCTAAAGACTCCACCAAAAAACTCTTAGATCTGATAAATAAATTCAGTAAAGCTTCAGGTACAAAATCAACACACAAAAATCGGTAGCATTTCTATACACCAATAATGAACTTGCTGAGAAAGAAATCAAGAAGGCAATCCCATTTACAATAGCTATAAAAAATAGAATATCTAACAATAAATTTAACCAAGGAGGTTGTCTTAGTCCATTTGTGTAGCTACATCTGAGGCTGGGTAATCTATAAAGAAAAGAGGTTTATTTGGCTAATGGTTCTACAGGCTGTACAAGAAGCACAGCACCAATATCTGCTACTGGAGAGGGCTTCCCGGCTGCTTCTACTCATGGCAGAAGGAGAACGGGAGCTGTTGTATGCAGAGATCATATGGTGAGAGAGAGGAAGCAA:N:12_95244024-12_95244447	./.:NaN:0:0,0:--:NaN:NaN:NaN:NAN:NAN:NAN

SURVIVOR command:

SURVIVOR merge survivor_mini_test.txt 1000 0 0 0 0 50 survivor_test.vcf
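
For readers unfamiliar with the positional arguments, here is a minimal Python sketch that labels each value in the command above. The label names are hypothetical and follow a reading of SURVIVOR's usage text, not its source code; only `num_callers` is confirmed in this thread, so check the tool's own usage output before relying on the others.

```python
import shlex

# Hypothetical labels for the eight positional arguments of `SURVIVOR merge`.
# Assumption: the order follows SURVIVOR's usage text; only num_callers is
# confirmed by this thread.
PARAM_NAMES = [
    "vcf_list_file",            # text file listing the input VCF paths
    "max_breakpoint_distance",  # max distance between breakpoints
    "num_callers",              # minimum number of supporting callers
    "use_type",                 # 1 = take SV type into account
    "use_strand",               # 1 = take strands into account
    "estimate_distance",        # 1 = estimate distance from SV size
    "min_sv_size",              # minimum SV size to consider
    "output_vcf",               # merged output file
]


def describe_merge_command(command: str) -> dict:
    """Map the positional arguments of a SURVIVOR merge command to labels."""
    tokens = shlex.split(command)
    if tokens[:2] != ["SURVIVOR", "merge"]:
        raise ValueError("not a SURVIVOR merge command")
    args = tokens[2:]
    if len(args) != len(PARAM_NAMES):
        raise ValueError(f"expected {len(PARAM_NAMES)} arguments, got {len(args)}")
    return dict(zip(PARAM_NAMES, args))


params = describe_merge_command(
    "SURVIVOR merge survivor_mini_test.txt 1000 0 0 0 0 50 survivor_test.vcf"
)
for name, value in params.items():
    print(f"{name:>24} = {value}")
```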

Note how the tumour variant has SUPP_VEC=00. I had to set the num_callers parameter to 0 to get it to be included.
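
To find other records affected in the same way, a short Python sketch like the one below can list merged records whose `SUPP` value is 0. It assumes a plain-text, tab-separated VCF with the `SUPP` and `SUPP_VEC` INFO keys shown in the merged output above; the function names are illustrative and the example rows are abbreviated copies of the two merged records.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO string into a dict; flag-only keys map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def unsupported_records(vcf_lines):
    """Return (chrom, pos, id, SUPP_VEC) for records with SUPP=0."""
    hits = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        cols = line.rstrip("\n").split("\t")
        info = parse_info(cols[7])
        if int(info.get("SUPP", "0")) == 0:
            hits.append((cols[0], int(cols[1]), cols[2], info.get("SUPP_VEC")))
    return hits


# Abbreviated copies of the two merged records (REF and sample columns trimmed).
example = [
    "1\t207981231\t1293\tN\t<DEL>\t.\tPASS\tSUPP=0;SUPP_VEC=00;SVLEN=-33589;SVTYPE=DEL",
    "12\t95244024\t13927_0\tN\tN\t.\tPASS\tSUPP=1;SUPP_VEC=10;SVLEN=-423;SVTYPE=DEL",
]
print(unsupported_records(example))
```

On the two records above, only the tumour deletion on chromosome 1 is reported.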

oneillkza, Feb 14 '20 21:02

And, now that I think about it, I see the sentence "NOTE ./. or 0/0 is not counted as supporting a variant." in the docs. Since this is a subclonal heterozygous tumour variant, the allele frequency is less than 0.5, and Sniffles called the genotype as 0/0.

It might be helpful to have something in the documentation noting that num_callers can be set to zero to include variants like these.
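
The rule quoted from the docs can be sketched in Python as follows. This is a reimplementation of the documented behaviour for illustration, not SURVIVOR's actual code; treating phased (`0|0`, `.|.`) and bare `.` genotypes the same way is an added assumption.

```python
# Genotypes that, per the quoted documentation, do not count as support.
# Assumption: phased and bare-missing forms are treated the same way.
NON_SUPPORTING = {"./.", "0/0", ".|.", "0|0", "."}


def genotype(sample_field: str) -> str:
    """Return the GT value, which is the first colon-separated sub-field."""
    return sample_field.split(":", 1)[0]


def support(sample_fields):
    """Return (SUPP, SUPP_VEC) for the per-sample columns of one record."""
    vec = "".join(
        "0" if genotype(field) in NON_SUPPORTING else "1"
        for field in sample_fields
    )
    return vec.count("1"), vec


# Sample columns abbreviated from the merged VCF above (normal first, tumour second).
tumour_del = ["./.:NaN:0:0,0", "0/0:NA:33589:17,7"]
normal_del = ["0/1:NA:423:5,14", "./.:NaN:0:0,0"]

print(support(tumour_del))  # tumour deletion called 0/0 by Sniffles
print(support(normal_del))  # germline deletion called 0/1 in the normal
```

Under this rule the tumour deletion gets no support even though 7 variant reads are recorded for it, which matches the `SUPP=0;SUPP_VEC=00` seen in the merged output.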

oneillkza, Feb 14 '20 21:02

Thanks for reaching out. This is indeed a point where I am also not sure what the smartest approach would be. For de novo calls like yours, 0/0 could be taken into account. For force calling (genotyping of known SVs), a 0/0 should not be taken into account.

I will try to highlight this better. Thanks, Fritz
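
The distinction between the two use cases can be sketched in Python as below. The `de_novo` switch is hypothetical and does not correspond to any existing SURVIVOR option; it only illustrates how the support vector would differ depending on whether a 0/0 call is counted.

```python
def supports(sample_field: str, de_novo: bool) -> bool:
    """Decide whether one sample column counts as supporting the variant.

    Hypothetical policy: a missing genotype never counts; 0/0 counts only
    in de novo mode, where the caller reported the SV itself.
    """
    gt = sample_field.split(":", 1)[0]
    if gt in {"./.", "."}:
        return False
    if gt == "0/0":
        return de_novo
    return True


def supp_vec(sample_fields, de_novo: bool = False) -> str:
    """Build a SUPP_VEC-style string, one character per sample."""
    return "".join("1" if supports(f, de_novo) else "0" for f in sample_fields)


# Tumour deletion from the merged VCF above (normal column first, tumour second).
record = ["./.:NaN:0:0,0", "0/0:NA:33589:17,7"]

print(supp_vec(record, de_novo=False))  # force-calling style: 0/0 ignored
print(supp_vec(record, de_novo=True))   # de novo style: 0/0 counted
```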

fritzsedlazeck, Feb 14 '20 21:02

Thanks for the response! (And for providing the tool in the first place).

oneillkza, Feb 14 '20 21:02