
behavior of SURVIVOR merge on multiple samples from multiple callers

Open danrlu opened this issue 4 years ago • 0 comments

Thank you for making such a useful tool!!

We have multiple samples, and each sample has multiple VCFs generated by different SV callers. Based on the discussion in #95, we should do the following:

STEP 1: merge all vcfs for the same sample into 1 vcf per sample with SURVIVOR merge

The result is a multi-column vcf, with headers copied from the individual vcfs:

Resulting vcf for Sample1:

#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	Sample1_manta	Sample1_delly	Sample1_smoove
I	20105	101	N	<DEL>	180	PASS	SUPP=2;SUPP_VEC=101;SVLEN=95;SVTYPE=DEL;SVMETHOD=SURVIVOR1.0.7;CHR2=I;END=20163;CIPOS=0,819;CIEND=0,892;STRANDS=+-	GT:PSV:LN:DR:ST:QV:TY:ID:RAL:AAL:CO	0/1:NA:131:10,0:--:180:INV:INV00000001:NA:NA:I_20924-I_21055	./.:NaN:0:0,0:--:NaN:NaN:NaN:NAN:NAN:NAN	0/0:NA:58:0,7:+-:0:DEL:101:NA:NA:I_20105-I_20163

For Sample2:

#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	Sample2_manta	Sample2_delly	Sample2_smoove
I	20105	DEL000SUR	AAATTTTTTTTCCGCAAAATCAGGAAAAATTCAGAAAAAGACAGTCAAAAAATTGTAGA	ATC	601	PASS	SUPP=3;SUPP_VEC=111;SVLEN=82;SVTYPE=DEL;SVMETHOD=SURVIVOR1.0.7;CHR2=I;END=20163;CIPOS=0,819;CIEND=0,892;STRANDS=+-	GT:PSV:LN:DR:ST:QV:TY:ID:RAL:AAL:CO	0/0:NA:131:11,0:--:180:INV:INV00000001:NA:NA:I_20924-I_21055	1/1:NA:58:0,18:+-:601:DEL:I_20105_20163_-58:AAATTTTTTTTCCGCAAAATCAGGAAAAATTCAGAAAAAGACAGTCAAAAAATTGTAGA:ATC:I_20105-I_20163	1/1:NA:58:0,7:+-:354:DEL:101:NA:NA:I_20105-I_20163
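
To make the SUPP_VEC values in the two records above easier to read, here is a minimal Python sketch. It assumes SUPP_VEC is a bit string with one character per input VCF, in the same order as the sample columns, which is consistent with the rows shown. The helper name supporting_callers is hypothetical and not part of SURVIVOR.

```python
def supporting_callers(supp_vec, columns):
    """Return the column names whose SUPP_VEC bit is '1'.

    Assumption: bit i of SUPP_VEC corresponds to sample column i.
    """
    if len(supp_vec) != len(columns):
        raise ValueError("SUPP_VEC length does not match the number of columns")
    return [name for bit, name in zip(supp_vec, columns) if bit == "1"]


sample1_cols = ["Sample1_manta", "Sample1_delly", "Sample1_smoove"]
sample2_cols = ["Sample2_manta", "Sample2_delly", "Sample2_smoove"]

# Sample1 has SUPP_VEC=101, Sample2 has SUPP_VEC=111.
print(supporting_callers("101", sample1_cols))
print(supporting_callers("111", sample2_cols))
```

Under that assumption, the Sample1 call is supported by the manta and smoove columns but not delly, which matches the ./. genotype in the delly column.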

STEP 2: combine the 1 vcf per sample for all samples with SURVIVOR merge

The column headers are taken from the first sample column of each input VCF above, but the fields seem to be recomputed by combining the columns within each input VCF.

#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	Sample1_manta	Sample2_manta
I	20105	DEL000SUR	AAATTTTTTTTCCGCAAAATCAGGAAAAATTCAGAAAAAGACAGTCAAAAAATTGTAGA	ATC	601	PASS	SUPP=2;SUPP_VEC=11;SVLEN=-89;SVTYPE=DEL;SVMETHOD=SURVIVOR1.0.7;CHR2=I;END=20163;CIPOS=0,0;CIEND=0,0;STRANDS=+-	GT:PSV:LN:DR:ST:QV:TY:ID:RAL:AAL:CO	0/1:101:95:0,0:+-:180:DEL:101:NA:NA:I_20105-I_20163	1/1:111:82:0,0:+-:601:DEL:DEL000SUR:AAATTTTTTTTCCGCAAAATCAGGAAAAATTCAGAAAAAGACAGTCAAAAAATTGTAGA:ATC:I_20105-I_20163
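
To compare the per-sample fields of the STEP 2 record with the STEP 1 records, a small Python sketch like the following can split each sample column by the FORMAT keys. The function name sample_fields is hypothetical, and the INFO column is shortened here for readability; the FORMAT and sample columns are copied from the record above.

```python
def sample_fields(vcf_line):
    """Return one dict (FORMAT key -> value) per sample column of a VCF data line."""
    cols = vcf_line.rstrip("\n").split("\t")
    keys = cols[8].split(":")  # column 9 is FORMAT
    return [dict(zip(keys, sample.split(":"))) for sample in cols[9:]]


ref = "AAATTTTTTTTCCGCAAAATCAGGAAAAATTCAGAAAAAGACAGTCAAAAAATTGTAGA"
record = "\t".join([
    "I", "20105", "DEL000SUR", ref, "ATC", "601", "PASS",
    "SUPP=2;SUPP_VEC=11;SVLEN=-89;SVTYPE=DEL",  # INFO shortened for this example
    "GT:PSV:LN:DR:ST:QV:TY:ID:RAL:AAL:CO",
    "0/1:101:95:0,0:+-:180:DEL:101:NA:NA:I_20105-I_20163",
    "1/1:111:82:0,0:+-:601:DEL:DEL000SUR:" + ref + ":ATC:I_20105-I_20163",
])

for fields in sample_fields(record):
    print(fields["GT"], fields["PSV"], fields["LN"])
```

In this record, the PSV and LN values of each sample (101 and 95, 111 and 82) appear to carry over the SUPP_VEC and SVLEN of the corresponding STEP 1 record.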

I was a bit confused by this statement in the discussion in #127:

no SURVIVOR does not take the GT into account as many tools often dont report the GT.

This was referring to STEP 1, right? In that case, the GT field was simply copied over from the input VCFs. In STEP 2, however, it looks like the GT most different from REF was kept when merging the calls for the same sample (1/1 > 1/0 > 0/0)?
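
To spell out the guess above, here is a minimal Python sketch of a "keep the most non-reference GT" rule. This is only my hypothesis about the STEP 2 behavior, not confirmed SURVIVOR logic; the ranking table GT_RANK (including the assumption that ./. ranks lowest) and the function name are made up for illustration.

```python
# Hypothetical ranking: higher means further from the reference genotype.
GT_RANK = {"./.": 0, "0/0": 1, "0/1": 2, "1/0": 2, "1/1": 3}


def most_non_ref_gt(genotypes):
    """Pick the genotype with the highest rank; unknown strings rank like ./."""
    return max(genotypes, key=lambda gt: GT_RANK.get(gt, 0))


# GTs of the three caller columns in the STEP 1 records above.
sample1_gts = ["0/1", "./.", "0/0"]  # manta, delly, smoove
sample2_gts = ["0/0", "1/1", "1/1"]

print(most_non_ref_gt(sample1_gts))
print(most_non_ref_gt(sample2_gts))
```

Applied to the STEP 1 genotypes, this rule gives 0/1 for Sample1 and 1/1 for Sample2, which is what the STEP 2 record shows. Two records are of course not enough to confirm the rule.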

The options were SURVIVOR merge ... 1000 1 0 0 1 30, and the version is 1.0.7 from bioconda.

Thanks! Dan

danrlu, Oct 19 '20 02:10